Quick Summary
Dropsolid is hiring a full-time Infrastructure Engineer to build and run its multi-tenant Digital Experience Platform on Google Cloud. Working fully remotely within the European Economic Area, you will manage infrastructure-as-code using Ansible, maintain and patch the Linux fleet, handle production incidents during office hours, and improve observability. The role requires at least 4 years of experience operating Linux in production (Debian/Ubuntu), strong Ansible skills, and solid networking and security fundamentals.
Dropsolid runs a multi-tenant Digital Experience Platform on Google Cloud — a PaaS that hosts and operates web projects for our clients. The platform team builds and runs the infrastructure that everything else depends on. We're hiring an Infrastructure Engineer to join that team.
What you'll be doing
- Own infrastructure-as-code. Extend and refactor a sizeable Ansible codebase that provisions our fleet end-to-end, from GCP network and instance provisioning down to per-service configuration on each host.
- Keep the fleet patched and hardened. OS, packages, and dependencies get patched on a rhythm you help define. You run security audits and apply hardening across the fleet, and you push back when a quick fix is going to bite us in six months.
- Operate production. Investigate incidents, tune the web/database/cache stack for performance, and roll out fixes. You're the first responder for infra alerts during office hours. Out-of-hours, weekends, and nights are covered by our 24/7 partner, so we don't expect you to work odd hours.
- Run incidents properly. Lead the response, write the post-mortem, and follow through on the underlying cause so we don't see the same ticket twice. Keep the runbooks current so the 24/7 partner can act on them without needing to wake you up.
- Ship across layers. Platform features cross infrastructure, application config, and the Ansible playbooks that tie them together. You're comfortable working across those boundaries.
- Improve observability. Build dashboards, queries, and alerts on our Grafana / Loki / Prometheus stack so we hear about problems before our clients do. Track uptime, resource usage, and the recurring noise that usually points to a real underlying problem.
- Make the platform safer and faster. Traffic shielding, WAF rules, capacity tuning, network hardening — you have opinions and the autonomy to act on them.
- Be there for go-lives. Hands-on during client launches, not just on standby.
We're looking for
- 4+ years operating Linux in production (Debian/Ubuntu family). You've debugged a real outage at 2am and you know what to grep for. Comfortable across the web stack layers: reverse proxy, application server, database, cache.
- Strong infrastructure-as-code discipline, Ansible preferred. Not "I've written a playbook" but "I've designed role hierarchies others can build on, and I know when not to reach for automation."
- Networking fundamentals you can reason from first principles: VPCs, subnets, CIDR, DNS, TLS, reverse proxies.
- A working sense of production security — patch latency, attack surface, what a misconfigured WAF rule actually does when traffic hits it.
- An observability mindset. You reach for metrics and logs before SSH.
- Plain-language communication. We document what we learn and expect engineers to update docs when the system changes.
Nice to have
- GCP experience or certification (Associate Cloud Engineer / Professional Cloud Architect). Our platform runs on GCP, and fluent operators are force multipliers.
- Working knowledge of PHP/Drupal or Python — our application tier is Drupal/PHP with a Python worker layer, so being able to read and patch either is a real plus.
- Experience operating PHP/Drupal hosting at scale (PHP-FPM tuning, Varnish, MySQL ops).
- Message queue operational experience (RabbitMQ or comparable AMQP broker).
- Secrets management (HashiCorp Vault or equivalent).
- A real opinion about CI/CD pipeline design.
How we work
- Small, senior, autonomous team. We commit often, deploy often, document what we learn, and we expect engineers to investigate root causes rather than work around symptoms. We push back on bad patterns and welcome having ours pushed back on. AI-assisted workflows are a plus, never a substitute for engineering judgment.
- We don't do nightly pager duty. Out-of-hours coverage is handled by a 24/7 partner, so what you take on is what fits inside a normal working week.
Where you can work from
Fully remote within the European Economic Area. We can also hire from countries outside the EEA that have a current Adequacy Decision from the European Commission — if you're unsure whether yours qualifies, ask and we'll check.


