Senior Site Reliability Engineer

🔥 0 minutes ago

Remote

501 - 1000 employees

👥 HR Tech

☁️ SaaS

🏢 Enterprise

Remote is a global HR platform that simplifies the process of hiring, onboarding, managing, and paying employees and contractors worldwide. It offers comprehensive solutions for recruitment, payroll management, contractor management, and compliance. The platform supports businesses in handling HR tasks seamlessly and efficiently, ensuring fast and compliant payouts, providing employer of record services, and facilitating employee benefits and equity offerings. Additionally, Remote integrates with various HR systems, allowing for a flexible, scalable, and reliable solution for businesses looking to expand globally.

📋 Description

• As a Senior SRE at Remote, you'll work with a high degree of autonomy on complex reliability and platform problems, owning the plan and execution of features and projects within our SRE/Platform domain.
• You'll contribute to the platform's architecture and reliability strategy, translate ambiguous requirements into robust, maintainable solutions, and raise the technical bar of the engineers around you while collaborating closely with product and security teams in an async-first, fully remote environment.
• You'll work AI-natively day to day and build reusable AI workflows that make the whole team faster and more reliable, not just yourself.
• Lead solution discovery and delivery for reliability and infrastructure problems with real ambiguity, complexity, or scope, working autonomously and coordinating with other contributors where needed.
• Contribute to the platform's architecture, tooling, and roadmap. Influence team priorities and advocate for technical initiatives.
• Help define and operate reliability practices for our platform: SLOs/SLIs, error budgets, alerting, observability. Take responsibility for the team's operational stance, using support/incident metrics to shape technical strategy.
• Resolve cross-team requests, identify systemic issues, and turn recurring ones into reusable fixes and runbooks rather than one-off answers.
• Work AI-natively and operationalise it for the team: use agentic workflows by default; build reusable prompts, skills, and tooling embedded in the codebase so others ship faster and safely; design agent-ready systems (clean interfaces, good observability) that make AI-assisted changes easy to review. Establish shared standards and domain-level guardrails (secure-by-default patterns, CI protections, AI-assisted review practices).
• Mentor and give timely, actionable feedback to less-senior engineers; participate in hiring, onboarding, and RFC discussions.
• Collaborate with Security on platform hardening and threat mitigation; contribute to capacity and cost-efficiency of the infrastructure.
• Participate in incident response and on-call rotations to rapidly resolve issues and maintain system reliability.

🎯 Requirements

• Solid professional experience in SRE, DevOps, or Platform Engineering.
• Solid hands-on Kubernetes experience: operating and scaling production clusters, plus container tooling (Docker) and its ecosystem.
• Experience building and managing cloud infrastructure on AWS (or similar).
• Strong infrastructure-as-code practice with Terraform.
• Experience with reliability frameworks: SLOs, SLIs, error budgets, alerting strategies.
• Solid observability background: OpenTelemetry, Grafana/Prometheus or similar.
• Proficiency with CI/CD (GitLab CI, GitHub Actions, or similar) and deployment automation.
• Comfortable with Go and Bash/scripting; broader programming experience is a plus.
• Practical, embedded use of AI in infra/ops/dev work: agentic workflows with concrete, observable results, not just familiarity with the tools.
• Clear and thoughtful communication, especially in an async-first, global setting.
• Proactive, curious, and comfortable taking ownership of challenges.
• Collaborative and respectful across cultures, time zones, and backgrounds.

🏖️ Benefits

• work from anywhere
• flexible paid time off
• flexible working hours (we are async)
• 16 weeks paid parental leave
• mental health support services
• stock options
• learning budget
• home office budget & IT equipment
• budget for local in-person social events or co-working spaces
