Site Reliability Engineer

🔥 9 minutes ago

Supabase

51 - 200 employees

Founded 2020

☁️ SaaS

🔌 API

🤖 Artificial Intelligence

💰 $80M Series B on 2022-05

Supabase is an open-source alternative to Firebase, providing a range of backend tools designed to help developers start and scale their applications effectively. It offers features such as a full Postgres database, authentication with Row Level Security, instant APIs, Edge Functions for custom code, real-time data synchronization, and storage for large files. Developers can integrate machine learning models, use RESTful APIs, and take advantage of platform-integrated, best-of-breed products. Supabase is designed to be highly portable, extensible, and user-friendly, making it a strong choice for startups and enterprises looking to innovate quickly and efficiently.

📋 Description

• Partner with service teams to define meaningful SLIs and SLOs grounded in customer experience, and build the error budget policies that turn them into engineering decisions
• Own and evolve the Operational Readiness Review (ORR) process, conducting reviews for new services and major changes across observability, alerting, runbooks, capacity, and graceful degradation
• Strengthen the incident-to-improvement pipeline: connecting postmortem findings to operational readiness gaps, identifying repeat failure patterns, and driving systemic fixes
• Act as the reliability expert teams pull in for architecture reviews, failure mode analysis, dependency mapping, and resilience design
• Identify and quantify operational toil across the org, and build or advocate for automation that eliminates it
• Help teams design sustainable on-call practices: alert quality, escalation paths, runbook coverage, and noise reduction
• Track and report on org-wide operational maturity, surfacing systemic gaps and driving remediation

🎯 Requirements

• Have 7+ years of experience in SRE, production engineering, or reliability-focused roles, including experience shaping SRE practices and driving adoption across engineering teams
• Have a software engineering mindset: you write code and build tools, not just configure them
• Have hands-on experience defining and operationalizing SLOs/SLIs at scale, including error budget policies that actually influenced engineering decisions
• Have deep experience with incident response, postmortem facilitation, and turning incident learnings into systemic improvements
• Have worked with large-scale multi-tenant systems (bonus: managed database platforms or Postgres)
• Are proficient with cloud infrastructure (AWS preferred) and infrastructure-as-code (Pulumi preferred, Terraform/CDK also acceptable)
• Communicate clearly and persuasively; this role requires influencing without authority across a distributed org
• Have experience in async or globally distributed teams
• Are energized by making other teams more effective rather than being the one who fixes everything

🏖️ Benefits

• Fully Remote
• ESOP
• Tech Allowance
• Health Benefits
• Annual Off-Sites
• Flexible Work
• Professional Development
