Senior Site Reliability Engineer – Build

🔥 0 minutes ago

Remote

501 - 1000 employees

👥 HR Tech

☁️ SaaS

🏢 Enterprise

Remote is a global HR platform that simplifies the process of hiring, onboarding, managing, and paying employees and contractors worldwide. It offers comprehensive solutions for recruitment, payroll management, contractor management, and compliance. The platform supports businesses in handling HR tasks seamlessly and efficiently, ensuring fast and compliant payouts, providing employer of record services, and facilitating employee benefits and equity offerings. Additionally, Remote integrates with various HR systems, allowing for a flexible, scalable, and reliable solution for businesses looking to expand globally.

📋 Description

• Infrastructure as code at scale. Design, implement, and maintain infrastructure-as-code patterns using Terraform and Kubernetes that support both standard connectors and custom builds. Make it easy for engineers to deploy and operate with confidence.
• Observability and incident response. Build and maintain comprehensive monitoring, logging, and alerting systems. Lead incident response efforts, conduct post-mortems, and drive continuous improvement in system reliability.
• Security and compliance in motion. Work with our Security team to embed security into every layer of Build infrastructure. Ensure we meet compliance requirements across 100+ jurisdictions without creating friction for developers or customers.
• Performance and cost optimisation. Continuously optimize system performance, resource utilization, and cloud costs. Make recommendations that improve both reliability and unit economics.
• Automation and operational leverage. Identify manual operational toil and systematically eliminate it. Build tools and processes that let teams operate efficiently without scaling headcount.
• Platform reliability and developer experience. Partner with platform teams to ensure APIs, MCP, and CLI are resilient and observable. Give infrastructure feedback that shapes how the platform evolves.

🎯 Requirements

• Senior-level SRE experience: demonstrated experience in a Site Reliability Engineering, DevOps Engineering, or SysOps role. You have stood up and operated production systems at scale.
• Kubernetes and AWS: deep, hands-on experience running Kubernetes in production. Solid AWS fundamentals across compute, networking, storage, and managed services.
• Infrastructure-as-code: Proficiency with Terraform or similar IaC tools. You write code to define infrastructure; you don't click buttons in the console.
• CI/CD and deployment automation: real experience setting up and operating GitLab, GitHub Actions, Jenkins, or similar. You understand deployment strategies, rollback mechanisms, and safety nets.
• Scripting and systems knowledge: strong bash scripting. Comfortable debugging system-level issues, reading logs, and understanding Linux kernel basics.
• Great communication: you explain complex infrastructure decisions clearly to both engineers and non-technical stakeholders. You write clear runbooks and documentation.
• Nice to have: Experience with 1+ backend programming language (Elixir, Python, Go, Java, Node.js, etc.).
• Nice to have: Experience in consultancy settings.
• Nice to have: Container registry and artifact management (ECR, Docker Hub, etc.).
• Nice to have: Observability stack depth (Datadog, Prometheus, ELK, Grafana, or similar).
• Nice to have: Experience working with or scaling multi-tenant platforms.

🏖️ Benefits

• work from anywhere
• flexible paid time off
• flexible working hours (we are async)
• 16 weeks paid parental leave
• mental health support services
• stock options
• learning budget
• home office budget & IT equipment
• budget for local in-person social events or co-working spaces
