Full Time

Site Reliability Engineer - 100% Remote (posted 1 week ago)

Digistore24
Attractive
Application ends: 2026-03-25

Quick Summary

Digistore24 seeks a remote Site Reliability Engineer with at least 3 years of IT operations experience and mandatory fluency in English and German. This role focuses on enhancing system reliability through automation (IaC, CI/CD with GitHub Workflows, Helm, Kustomize), performance optimization, capacity planning, and incident response using tools like Prometheus, Grafana, or Elasticsearch. Required technical skills include Kubernetes/container technology and cloud services (Google Cloud preferred); PHP experience is a plus. The position demands strong communication, collaboration, problem-solving, and self-organization, and offers flexible full-time hours.

Are you an experienced developer or DevOps engineer? Do you want the freedom to work remotely and grow in the new field of site reliability at an internationally successful software and education company? Take our reliability to the next level as part of our Site Reliability Engineering team.

Please note: English and German language proficiency is a MUST for this position. Do not apply if you do not speak both languages.

Who is Digistore24?

We are one of the fastest-growing tech companies in Europe.

What drives us? We shape the digital future! Our mission is to empower people with our software and expertise to share their knowledge online, enabling them to fulfill their dream of running their own business. As a result, millions of people gain access to information that helps them reach their goals. To keep pace with our growth, we aim to expand our teams sustainably. We emphasize working with experts and strong personalities who share our values, regardless of their location.

Your New Dream Job: Key Responsibilities

  • Automation and Infrastructure as Code (IaC): Automate repetitive tasks, deployments, and system management to reduce human error and improve efficiency. This includes creating scripts, CI/CD pipelines, or automating infrastructure provisioning.
  • Reliability and Performance Optimization: Continuously improve system uptime by identifying bottlenecks and optimizing system architecture.
  • Capacity Planning and Scaling: Assess and predict system resource requirements (CPU, memory, storage) to ensure infrastructure scales with increasing demand. Implement auto-scaling solutions to handle load spikes without human intervention, ensuring systems remain performant under various conditions.
  • System Monitoring and Incident Response: Continuously monitor system performance, uptime, and reliability using tools like Prometheus, Grafana, or Elasticsearch. Detect and respond to issues before they impact users. Respond quickly to incidents, outages, and failures to minimize downtime. This includes managing incident documentation, communication, and post-incident analysis.
  • Incident Postmortems and Continuous Improvement: Conduct root cause analysis (RCA) after incidents to identify what went wrong and how to prevent similar issues. Implement fixes, improvements, and best practices based on learnings from postmortems to increase system reliability and reduce future incidents.

Your Benefits at Digistore24

  • Play a crucial role in shaping cutting-edge projects in a collaborative work environment, with flexibility in working time and location.
  • Work in our partner's coworking spaces or from your home office, provided you have uninterrupted internet access.
  • Regular opportunities for further education.
  • Enjoy the stability of an extremely successful German high-tech company, funded by its product, not by investors.
  • Join outcome-focused teams with a culture of direct feedback.
  • Receive modern equipment: ThinkPad or MacBook.
  • Be part of an international, collaborative team with strong cohesion.
  • Participate in spectacular team events in various European countries.
  • Experience autonomy from day one.
  • Receive a contribution to the retirement scheme.
  • Work in your team on a first-name basis, without a dress code, and as equals.
  • Flexible working hours from Monday to Friday (core working hours from 10 AM to 4 PM).

Key Skills & Qualities for this Role at Digistore24

  • Communication Mastery: Communicate precisely and in a way that suits your audience. Defuse potential conflicts with sensitivity and a solution-oriented approach. Strike the right tone with stakeholders, developers, and your team, even under time pressure, and switch seamlessly between German and English when necessary.
  • Collaboration Wizardry: Collaborate effectively with developers, stakeholders, and operations to align everyone. Understand challenges across different teams and find company-wide solutions.
  • Automation Sorcery: Promote automation to save time and reduce errors, implementing tools that improve team productivity.
  • Problem-Solving Genius: Dive deep into problems, identify root causes, and develop solutions that prevent future incidents.
  • Self-organization: Thrive on autonomy and excel at organizing and structuring complex projects while working remotely.

Tech Stack:

  • Kubernetes / Container Technology
  • CI/CD (GitHub Workflows, Helm, Kustomize)
  • Cloud Services (preferably Google, but others are also acceptable)
  • Excellent spelling and grammar in German
  • PHP language experience (a plus)

A Typical Day at Digistore24

Start your day with a morning video call to discuss yesterday's progress and today's plans with your team.

You prefer a structured approach, outlining your daily routine and goals. You allocate sufficient time for the continuous development of our SRE processes, supported by your team.

During the daily team call, you report on priorities and blockers, receiving tangible tips to overcome challenges.

For several hours, you focus on developing ideas for improving auto-scaling, monitoring, and alerting, with messaging apps turned off for uninterrupted concentration. You then test these ideas in practice and document the approaches that work so you can present them to the Head of IT Operations in a one-on-one call.

After your lunch break, you assist a developer with a new CI/CD workflow, discussing requirements and providing an initial prototype.

You address a ticket to check an application's resource allocation, reviewing current utilization and adjusting the deployment as needed.

Upon discovering an endpoint not yet included in monitoring, you create a ticket and immediately write the necessary code in the Terraform project to add it.

This Position is Not For You If:

  • You do not identify with our company values.
  • You have less than 3 years of experience in IT operations.
  • You cannot take ownership and require detailed discussions with supervisors or colleagues for every decision.
  • You have difficulty planning and prioritizing your tasks.
  • You do not enjoy finding solutions for complex problems.
  • You are not confident speaking both German AND English.

Our Values

Review our values here: https://careers.digistore24.com/kultur-und-werte

Please take a really close look at these values. Are you ready to live them?

Digistore24

  • Address
    London, England