Quick Summary
If you are passionate about solving problems and developing solutions, and want to work with cutting-edge technologies in a dynamic environment, this position is for you!
We are a fast-growing scale-up with incredible results, and there is still much to structure, create, test, adapt, and expand. This is a great opportunity to put your knowledge into practice, be highly recognized, and leave your mark on our history (and a significant one on your resume). We are pursuing ambitious results for the coming years, and achieving them depends only on us, working closely together.
We are a company of innovative, creative, and diverse people, so we are very open to welcoming you and everything that makes you unique.
What we value here:
- Love for change;
- Results;
- Proximity;
- Generating fans;
- Objectivity;
- Self-responsibility.
Your daily activities will include:
- Monitoring alert channels (Google Chat, Dynatrace, Grafana, Kibana, and others), identifying anomalies, confirming whether they are genuine, and notifying the responsible teams or taking corrective action;
- Using observability tools (Dynatrace, Grafana, Kibana, OpenSearch) to investigate logs, metrics, and traces and find root causes;
- Providing technical support to development teams, investigating bugs and unexpected behavior in production and staging (homologation) environments;
- Participating in war rooms during incidents, supporting real-time triage, communication, and documentation;
- Monitoring deployments and validating service health post-release, identifying regressions or degradations;
- Collaborating with development teams to analyze problems involving both applications and infrastructure;
- Creating and maintaining runbooks, checklists, and incident documentation, contributing to the team's knowledge base;
- Helping build and refine dashboards and alerts to improve observability coverage.
Our technology stack
- Cloud: AWS (100% Cloud)
- Orchestration: Kubernetes, Docker
- Observability: Dynatrace, Grafana, Kibana, OpenSearch, Prometheus, CloudWatch
- CI/CD: GitHub Actions (preferred), Jenkins, Bitbucket CI
- Version Control: GitHub (preferred), Bitbucket
- Containerization: Docker
- Databases: PostgreSQL, MySQL, RDS
- Languages: Python, PHP, Java, Bash
- Messaging: Kafka, SQS
What do you need to know?
Observability & Troubleshooting
- Ability to read and interpret logs, metrics, and traces — understanding what the data is saying even without someone pointing it out;
- Basic or intermediate experience with tools like Dynatrace, Grafana, Kibana, OpenSearch, CloudWatch, or similar;
- Ability to correlate events: understanding that a latency spike, a 500 error, and a deploy at 2 PM can be the same story;
Development & Automation
- Knowledge of Python or Bash for automation scripts, data analysis, or investigation support;
- Familiarity with Git (commits, branches, PRs) — you will work alongside developers;
- Basic understanding of how web applications work: REST APIs, HTTP/HTTPS, status codes, authentication;
- Willingness to read code even if you are not a developer: understanding an application's flow greatly aids bug investigation;
Infrastructure
- Familiarity with Linux and day-to-day use of the command line;
- Understanding of Docker and containers;
- Basic knowledge of networking: DNS, load balancing, HTTP/HTTPS protocols;
- AWS is a plus — no need to be an expert, but knowing how to navigate the console helps;
Soft Skills (as important as technical skills)
- Investigative profile: you don't rest until you understand the

