Quick Summary
Staff Software Engineer - Full Stack / Site Reliability Engineer (SRE) - Remote
Ad Hoc is a technology company dedicated to delivering scalable, impactful digital services and transforming government technology using modern, agile methods. This is a remote position, built for flexibility and collaboration, enabling us to hire top talent nationwide.
Work on things that matter: Join us in shaping defining moments in public-sector service delivery, including products connecting Veterans to tailored services, helping millions access affordable health care, and supporting critical programs like Head Start.
Ad Hoc values acceptance, accountability, and humility. We build small, inclusive teams to collaborate closely with partners, solve the right problems, and deliver functional software.
The Veterans Affairs business unit focuses on transforming the VA into a modern digital services organization, centered on Veteran outcomes. We partner with the VA to design and deliver seamless user experiences for Veterans, their families, caregivers, and VA employees.
The Watchtower Team (SRE Focus)
The Watchtower Team is a specialized SRE group supporting va.gov. We function as both consultants and hands-on problem solvers, working across multiple teams to enhance the reliability, performance, and overall health of the va.gov platform. This high-impact role directly contributes to improving the experience of Veterans accessing critical services.
We seek a highly skilled and adaptable engineer to diagnose and resolve complex issues in a large web application, utilizing expertise in observability, incident response, and software development.
Primary Responsibilities:
As an emerging subject matter expert and individual contributor, you will operate with a high level of trust and autonomy, requiring minimal oversight. You will lead and monitor delivery requirements (scope, schedule) and may support adjacent programs within the business unit.
- Plans and executes roadmaps for new projects without explicit guidance from technical supervisors.
- Actively participates in conversations and planning sessions with partners and key stakeholders.
- Periodically travels to work with and present to clients, partners, and stakeholders.
- Elaborates on and evolves complex and ambiguous products to uncover constraints and new opportunities.
- Reduces ambiguity in systems through documentation, refactoring, and automated testing.
- Communicates effectively about existing systems, design decisions, past performance, and project history for bid-writing, tech demos, and client-facing communications.
- Participates in technical depth interviews with new candidates.
- Presents on technical topics effectively, articulating implementation complexity and costs to inform business decisions.
- Serves as the primary lead and proactively communicates with stakeholders.
- Uses strong influencing skills to drive improvements in software engineering processes and practices.
Key Responsibilities:
- Troubleshoot and Resolve Production Issues: Diagnose and fix performance bottlenecks and errors within the va.gov application (primarily Ruby on Rails monolith, including Sidekiq background jobs). Familiarity with similar frameworks is valuable.
- Observability & Monitoring: Use Datadog (and potentially Dynatrace) to monitor application performance, identify anomalies, and proactively address potential problems. Develop and maintain relevant dashboards and alerts.
- Incident Response and On-Call Rotation ("The Watch"): Participate in the on-call rotation approximately once per month. This involves reviewing the previous day's alerts and ensuring no silent failures occurred. Expect to work 2-4 hours each day on the weekend during your on-call week to maintain system reliability.
- Code Contributions: Write and review code to improve observability, fix bugs, and implement improvements (Ruby on Rails), and maintain internal tools (JavaScript/SvelteKit and Python).
- Consulting & Collaboration: Work closely with other engineering teams, providing guidance on best practices for observability, reliability, and performance. Communicate technical issues clearly to both technical and non-technical audiences.
- Process Improvement: Identify and implement improvements to monitoring, alerting, and incident response processes. Contribute to documentation and runbooks.
- Maintain Internal Tools: Contribute to the development and maintenance of a small SvelteKit application used for tracking team metrics and success.
Basic Qualifications:
- Bachelor's Degree and 9+ years of relevant experience.
- 5+ years of experience as a Software Engineer or Site Reliability Engineer.
- 3+ years of experience with backend web application development in a production environment.
- Strong preference for Ruby on Rails experience, but candidates with demonstrable experience in other dynamic languages (e.g., Python/Django/Flask, Node.js/Express, PHP/Laravel) or compiled languages with web frameworks (e.g., Java/Spring, C#/.NET) will be considered.
- Experience with Sidekiq or another background job processing framework (e.g., Celery).
- Proven experience with Application Performance Monitoring (APM) tools, specifically Datadog and/or Dynatrace. Ability to interpret metrics and identify root causes.
- Demonstrated experience in incident response and troubleshooting complex production issues.
- Experience with at least one modern JavaScript framework (React, Angular, Vue, Svelte, etc.).
- Excellent communication, collaboration, and consulting skills.
- Ability to work effectively in a fast-paced, dynamic Agile environment.
Preferred Qualifications:
- Experience with vets-api.
- Prior experience working within the VA/OCTO environment or any large government software deployment that integrates with multiple legacy services.
- Experience with Python for scripting, API interactions, and ETL/data engineering tasks.
- General understanding of DevOps concepts (containerization, virtualization, networking).
- Familiarity with GitHub Actions.
- Experience with the U.S. Web Design System (USWDS).
Benefits:
- Company-subsidized health, dental, and vision insurance.
- Flexible PTO.
- 401(k) with employer match.
- Paid parental leave after one year of service.
- Employee Assistance Program.
The starting pay range for this role is $140,000 - $150,000.
To learn more about working at Ad Hoc, please visit: https://adhocteam.us/join

