Site Reliability Engineer
Overview:
Two Barrels is looking for a Site Reliability Engineer who can help keep our systems steady, secure, and running like a well-oiled machine (except without actual oil). You’ll work closely with our DevOps engineers to build out tools and automation that make things faster, easier, and less painful for everyone.
Your main job? Stop problems before they start. And when something does break (because let’s be real—it will), help us fix it quickly and learn from it so we don’t do the same dumb thing twice. We’re big on taking ownership here. You won’t get blamed for something going wrong—but you will be expected to help make it right.
If you like digging into weird errors, thinking ahead, and making things just work—even when no one notices—this might be your kind of thing.
Location:
Remote | Spokane, WA | Salt Lake City, UT | Austin, TX
Duration:
Full Time
Wage:
Up to $175,000/year
Minimum Qualifications:
- Bachelor's degree in Computer Science, Software Engineering, or equivalent practical experience.
- 5+ years of experience in software engineering.
- 2+ years of experience in site reliability engineering, DevOps, or infrastructure engineering roles.
- Deep experience with cloud platforms (AWS, Azure, or GCP) and infrastructure as code tools such as Terraform, CloudFormation, or Pulumi.
- Strong proficiency with Kubernetes, Docker, and container orchestration in production environments.
- Hands-on experience with observability and monitoring tools like Prometheus, Grafana, OpenTelemetry, Sentry, or New Relic.
- Proven ability to design and implement highly available, fault-tolerant systems and lead proactive incident response efforts.
- Experience with performance tuning, database optimization, and caching strategies (e.g., PostgreSQL, Redis, Memcached).
- Demonstrated ability to drive reliability improvements, reduce operational toil, and foster a culture of resilience and continuous improvement.
Preferred Qualifications:
- Experience leading reliability-focused initiatives such as post-incident reviews, capacity planning, and root cause analysis.
- Experience in site reliability engineering within Ruby on Rails environments.
- Familiarity with the Grafana observability stack and related tools (e.g., Alloy, Loki, Tempo, Prometheus).
- In-depth experience with AWS services such as ECS, EKS, and Route 53.
- Proven ability to collaborate across teams to improve service reliability, reduce incident frequency, and drive operational excellence.
- Experience troubleshooting and resolving complex production issues, applying SRE best practices to minimize impact and prevent recurrence.
- A track record of continuously improving operational efficiency and system resilience.
Why you might like this job:
You like when things work—and you’re the kind of person who quietly fixes things while everyone else is still yelling “It’s broken!” You think alerts should be useful, not just annoying background noise, and you enjoy building systems that mostly run themselves (because babysitting servers isn’t your idea of fun).
You probably have a bit of a tinkerer’s soul. Maybe you’ve automated your coffee maker or set up a Raspberry Pi just to turn your lights purple. You appreciate clean logs, quiet dashboards, and sleep that isn’t interrupted by 3 a.m. calls.
You want to work somewhere that’s weird in a good way—where you’re trusted to do your job, encouraged to ask “why?”, and no one makes you sit through a meeting about synergy.
If that all sounds oddly satisfying, this might be the job for you.
Benefits:
- Great Wage & Success Meetings with your manager
- Work From Home comfort package & company-provided equipment
- 22 days paid time off annually, PLUS 4 paid holidays
- Up to 5% 401(k) employer matching through Fidelity
- 100% employer-paid medical, dental and vision for employees
- Maternity and Paternity Leave
- Flexible hours
- Coffee shop next door
- Crappy parking? Oh, I mean a cool downtown location with easy public transportation options…