Quick Summary
Senior Site Reliability Engineer (SRE) - Global Infrastructure & Kubernetes
At Laravel, we empower millions of developers. We are seeking a Senior Site Reliability Engineer (SRE) to ensure our global infrastructure is reliable, scalable, and elegant. If you thrive on managing multi-region Kubernetes clusters, building robust observability systems, and solving complex operational challenges through code, join us to build the foundation for Laravel Cloud, Nightwatch, Forge, and Vapor.
Role Overview:
You will be a founding member of our dedicated SRE function, reporting directly to Florian Beer, in a high-impact, autonomous role. You will design and implement critical systems and act as a bridge between development and operations, fostering a blameless culture and shared responsibility for reliability across the organization.
Your 12-Month Mission Highlights:
- First 30 Days: Stabilize incident response by creating comprehensive, actionable runbooks for core alerts.
- Day 60: Pioneer "observability as code" by migrating alert rules and dashboards into version control.
- Day 90: Establish clear, data-driven SLOs (Service Level Objectives) for all customer-facing products.
- Year One: Transform system visibility with insightful dashboards and significantly reduce manual toil through sophisticated automation.
Key Responsibilities:
- Architect Reliability: Establish SRE fundamentals and best practices from the ground up at Laravel.
- System Design: Design, build, and maintain multi-region Kubernetes infrastructure and global distributed systems.
- Automation: Solve operational challenges using software, minimizing manual intervention (toil) for product teams.
- Observability: Design and implement advanced monitoring, logging, and alerting systems using tools like Prometheus, Grafana, and Loki.
- Collaboration: Partner with product leads and SecOps to ensure reliability is a shared organizational responsibility.
- Incident Response: Lead incident reviews and postmortems in a strictly blameless environment to foster continuous learning.
Required Skills & Experience:
- Infrastructure Mastery: Deep experience with Linux system administration and cloud platforms, specifically AWS.
- Orchestration & IaC: Proficiency with Kubernetes, Docker, and managing infrastructure via Terraform.
- Programming Skills: Ability to solve problems with software and scripting using PHP, Bash, or Go.
- Systems Thinking: A passion for troubleshooting and the ability to break complex systems down into triageable components.
- Reliability Mindset: Experience defining and implementing SLOs, SLIs, and SLAs, as well as capacity planning and performance tuning.
- Soft Skills: Commitment to documentation, cross-team collaboration, and an automation-first mindset.
Bonus Skills:
- Framework Familiarity: Previous experience working with the Laravel framework or our existing product suite (Cloud, Forge, Vapor, etc.).
- Advanced Observability: Experience with Prometheus and Grafana Mimir for metrics storage and alerting.
- Cost Optimization: Specialized knowledge in managing and optimizing resource usage and cloud costs.
Benefits:
- Small, tight-knit team where every developer counts.
- Fully remote and globally distributed working environment.
- Option to attend Laracon conferences around the world.
- Health care plan (Medical, Dental & Vision).
- Paid time off (Vacation, Sick & Public holidays).
- Family leave (Maternity, Paternity).
- Pension plans (as locally applicable).
- Performance-based bonus plan.
- Company equity.

