Staff Site Reliability Engineer

🕒 March 9

Dave

201 - 500 employees

Fighting for the underdog – We started Dave for one reason: banks weren’t built for people like us, and we knew we deserved better.

Like David slaying Goliath, we set out to take on banks and their predatory ways. Our first fight? Making overdraft fees a thing of the past by spotting members the money they needed, without charging them $38. Why? Because it’s the right thing to do.

Since then, we’ve continued to bring our members the products traditional banks won’t: $500 advances, fee-free goal tracking, and simple ways to find Side Hustles when you’re behind on your budget. We’ve grown a lot since we started, but one thing has never changed: We’re building products that level the financial playing field.

📋 Description

• Lead architecture and automation across our GCP environment, ensuring reliability, scalability, security, and thoughtful cost management.
• Define and improve SLIs, SLOs, and error budgets using Cloud Monitoring and Datadog, connecting reliability goals to real business outcomes.
• Shape our multi-region, disaster recovery, and capacity planning strategies so the platform holds up as we grow.
• Design and optimize cloud networking, including VPC architecture, ingress/egress, Cloud Armor, VPN, and DNS, to support internal systems, partner integrations, and member-facing services.
• Drive infrastructure-as-code and GitOps practices using Terraform, Kubernetes, Helm, and ArgoCD to make deployments predictable and repeatable.
• Mentor SREs and infrastructure engineers through design reviews, incident retros, and hands-on collaboration, strengthening technical depth across the team.
• Explore practical LLM-driven automation where it meaningfully reduces operational toil and shortens incident resolution time.

🎯 Requirements

• 8+ years in software, infrastructure, or site reliability engineering.
• 5+ years of hands-on experience operating production systems in GCP (compute, networking, storage, IAM, observability).
• Deep experience with Kubernetes (GKE), Helm, containerization, Terraform (IaC), and ArgoCD.
• Strong programming skills in Python, Go, or TypeScript/JavaScript for automation and internal tooling.
• Experience defining and operating against SLIs, SLOs, and error budgets.
• Strong knowledge of relational and distributed databases (e.g., MySQL, Cloud SQL, Cloud Spanner, Redis), including performance tuning and HA strategies.
• Experience leading incident response, root cause analysis, and systemic remediation.

🏖️ Benefits

• Opportunity to tackle tough challenges, learn and grow from fellow top talent, and help millions of people reach their personal financial goals
• Flexible hours and virtual-first work culture with a home office stipend
• Premium Medical, Dental, and Vision Insurance plans
• Generous paid parental and caregiver leave
• 401(k) savings plan with matching contributions
• Financial advisor and financial wellness support
• Flexible PTO and generous company holidays, including Juneteenth and Winter Break
• All-company in-person events once or twice a year and virtual events throughout to connect with your team members and leadership team
