Staff Site Reliability Engineer

201 - 500 employees

Fighting for the underdog – We started Dave for one reason: banks weren’t built for people like us, and we knew we deserved better. Like David slaying Goliath, we set out to take on banks and their predatory ways. Our first fight? Making overdraft fees a thing of the past by spotting members the money they needed, without charging them $38. Why? Because it’s the right thing to do. Since then, we’ve continued to bring our members the products traditional banks won’t: $500 advances, fee-free goal tracking, and simple ways to find Side Hustles when you’re behind on your budget. We’ve grown a lot since we started, but one thing has never changed: We’re building products that level the financial playing field.

🕒 March 9

🇺🇸 United States – Remote

💵 $208k - $330k / year

⏰ Full Time

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)

🦅 H1B Visa Sponsor

Cloud

DNS

Google Cloud Platform

JavaScript

Kubernetes

MySQL

Python

Redis

SQL

Terraform

TypeScript

Dave

📋 Description

• Lead architecture and automation across our GCP environment, ensuring reliability, scalability, security, and thoughtful cost management.
• Define and improve SLIs, SLOs, and error budgets using Cloud Monitoring and Datadog, connecting reliability goals to real business outcomes.
• Shape our multi-region, disaster recovery, and capacity planning strategies so the platform holds up as we grow.
• Design and optimize cloud networking, including VPC architecture, ingress/egress, Cloud Armor, VPN, and DNS to support internal systems, partner integrations, and member-facing services.
• Drive infrastructure-as-code and GitOps practices using Terraform, Kubernetes, Helm, and ArgoCD to make deployments predictable and repeatable.
• Mentor SREs and infrastructure engineers through design reviews, incident retros, and hands-on collaboration, strengthening technical depth across the team.
• Explore practical LLM-driven automation where it meaningfully reduces operational toil and shortens incident resolution time.

🎯 Requirements

• 8+ years in software, infrastructure, or site reliability engineering.
• 5+ years of hands-on experience operating production systems in GCP (compute, networking, storage, IAM, observability).
• Deep experience with Kubernetes (GKE), Helm, containerization, Terraform (IaC), and ArgoCD.
• Strong programming skills in Python, Go, or TypeScript/JavaScript for automation and internal tooling.
• Experience defining and operating against SLIs, SLOs, and error budgets.
• Strong knowledge of relational and distributed databases (e.g., MySQL, Cloud SQL, Cloud Spanner, Redis), including performance tuning and HA strategies.
• Experience leading incident response, root cause analysis, and systemic remediation.

🏖️ Benefits

• Opportunity to tackle tough challenges, learn and grow from fellow top talent, and help millions of people reach their personal financial goals
• Flexible hours and virtual-first work culture with a home office stipend
• Premium Medical, Dental, and Vision Insurance plans
• Generous paid parental and caregiver leave
• 401(k) savings plan with matching contributions
• Financial advisor and financial wellness support
• Flexible PTO and generous company holidays, including Juneteenth and Winter Break
• All-company in-person events once or twice a year and virtual events throughout to connect with your team members and leadership team

Similar Jobs

Expert DevOps, DevSecOps, GenAI

🕒 March 7

Inetum

10,000+ employees

🤝 B2B

🏢 Enterprise

☁️ SaaS

Expert DevOps / DevSecOps supporting Generative AI initiatives at Inetum for digital transformation in the United States. Designing high-value GenAI use cases and integrating new tools and practices.

🇺🇸 United States – Remote

💰 Post-IPO Equity on 2007-03

⏰ Full Time

🟠 Senior

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)

🗣️🇫🇷 French Required

Cloud

Open Source

DevSecOps Engineer III

🕒 March 3

Kapitus

201 - 500

💸 Finance

💳 Fintech

🤝 B2B

Cloud DevSecOps Engineer III enhancing security for Kapitus through AWS solutions. Responsibilities include monitoring, programming, testing, and collaboration with developers.

🇺🇸 United States – Remote

💵 $117.8k - $189k / year

⏰ Full Time

🟠 Senior

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)

AWS

Azure

Cloud

Distributed Systems

DynamoDB

Staff DevSecOps Engineer

🕒 February 27

Fuze Health

1001 - 5000

☁️ SaaS

🤝 B2B

💊 Pharmaceuticals

Staff DevSecOps Engineer shaping security architecture in complex healthcare systems. Joining Fuze Health's Engineering organization to enhance security posture across platforms.

🇺🇸 United States – Remote

💵 $166k - $200k / year

⏰ Full Time

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)

AWS

Cloud

Google Cloud Platform

Jenkins

Kubernetes

Python

Ruby

Terraform

Software Architect, Reliability Engineering

🕒 February 26

Twilio

5001 - 10000

Reliability Architect at Twilio defining and leading solutions for reliable products. Collaborating with teams to ensure operational excellence and scalability in high-scale systems design.

🇺🇸 United States – Remote

💵 $227.8k - $335k / year

⏰ Full Time

🟠 Senior

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)

🦅 H1B Visa Sponsor

AWS

Cloud

Distributed Systems

Grafana

Java

Kubernetes

Microservices

Prometheus

Python

Terraform

SRE – Platform Engineer

🕒 February 25

DroneUp

51 - 200

🚀 Aerospace

☁️ SaaS

🤝 B2B