Five9

Site Reliability Engineer (SRE)

$71,800 - $190,000 /yearly

Application ends: 2026-02-05

Remote Site Reliability Engineer role balancing 50% software development and 50% operational expertise to ensure high reliability and scalability for a cloud contact center platform. Key responsibilities include defining SLIs/SLOs, managing 24/7 on-call incident response, maintaining CI/CD, developing Infrastructure as Code (Terraform/Ansible), and optimizing cloud costs. Requires 3+ years managing large-scale production environments, strong Linux/Unix skills, proficiency in two programming languages (Python, Shell, PHP, Java, or similar), and hands-on experience with Docker, Kubernetes, and a major cloud platform (AWS, GCP, or Azure). Salary range is $71,800 - $190,000 USD annually.

Site Reliability Engineer (SRE) - Cloud Contact Center Software

Five9 is a leading provider of cloud contact center software, committed to bringing the power of cloud innovation to customers worldwide. We foster a team-first culture that celebrates diversity and empowers employees to thrive.

We are seeking a Site Reliability Engineer (SRE) to join our team and keep our systems highly reliable and scalable. The role is split roughly evenly between software development and operations, with a strong emphasis on automation, monitoring, and system reliability rather than manual operations. You will collaborate closely with platform, application, and database teams to deliver a reliable, highly available service.

Key Responsibilities: SRE Focus Areas

Observability & Monitoring

Design and implement comprehensive dashboards covering OS/platform and application-level monitoring, using RED (rate, errors, duration) as primary indicators and USE (utilization, saturation, errors) as secondary indicators.
Establish and maintain Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets.
Build alerting systems and performance monitoring to proactively identify and resolve issues.
Participate in 24/7 on-call rotations, lead incident response (including post-mortem analysis and remediation), and maintain official on-call routing.

Infrastructure Automation & Deployment

Maintain Continuous Integration/Continuous Deployment (CI/CD) pipelines, working with cloud and on-premises deployment teams.
Develop and maintain Infrastructure as Code (IaC) using tools like Terraform or Ansible.
Automate system configuration, ensure consistency across environments, and implement configuration control best practices.

Security & Compliance

Implement security automation, ensure scanning systems are in place, and review escalated vulnerabilities.
Maintain proper authentication, authorization, and audit logging systems.
Ensure systems meet regulatory requirements and industry standards through compliance reporting.
Participate in security incident response and remediation efforts.

Cost Optimization

Monitor and optimize cloud resource usage and costs, tracking both planned and unplanned resource changes.
Analyze usage patterns for capacity planning.
Provide recommendations for cost-effective architecture and implement automated scaling and resource optimization strategies (right-sizing).

Common Services & Platform Engineering

Build and maintain shared infrastructure such as notification systems, caching layers, message queues, or third-party software stacks.
Manage database reliability, performance, and scaling (where not handled by dedicated DB teams).
Implement and maintain service discovery, load balancing, and network policies (Service Mesh & Networking).
Create and maintain tools and platforms that improve developer productivity and system reliability.

Required Qualifications

Operational Experience

3+ years managing large-scale production environments.
Comfortable with 24/7 on-call responsibilities and incident response.
Strong Linux/Unix system administration skills.
Understanding of networking concepts: TCP/IP, DNS, load balancing, and network security.
Experience with SQL and NoSQL databases in production environments.

Technical Skills

Proficiency in at least two programming languages: Python, Shell, PHP, Java, or similar.
Experience with one major cloud platform infrastructure and services (AWS, GCP, or Azure).
Hands-on experience with Docker, Kubernetes, and container orchestration.
Experience with Monitoring & Observability tools (Prometheus, Grafana, ELK stack, or similar).
Proficiency with Infrastructure as Code tools (Terraform, CloudFormation, or similar).
Expert-level Git usage and collaborative development practices.

SRE-Specific Knowledge

Experience defining and maintaining SLIs and SLOs.
Understanding of error budget concepts and implementation.
Track record of identifying and eliminating repetitive manual work (toil reduction).
Experience with performance testing and capacity management.

Preferred Qualifications

Bachelor's degree in Computer Science, Engineering, or equivalent experience.
Experience with microservices architecture and distributed systems.
Knowledge of security best practices and compliance frameworks.
Experience with chaos engineering and reliability testing.
Previous experience in an SRE or DevOps role at a technology company.
Contributions to open-source projects or technical communities.

Five9 is an equal opportunity employer.

Job Information

Date Posted: 2026-01-06
Location: Remote
Industry: Custom PHP Developer Jobs
Offered Salary: $71,800 - $190,000 /yearly
