Senior Site Reliability Engineer

11 - 50 employees

Founded 2021

🏢 Enterprise

☁️ SaaS

Enterprise • SaaS • Cloud

Akuity provides an end-to-end GitOps platform for Kubernetes, covering deployment, promotion, and monitoring with tools such as Argo CD, Kargo, and KubeVision. The platform extends Kubernetes APIs to enhance continuous delivery, container orchestration, and event automation. Akuity's solutions aim to simplify infrastructure management, improve security, and increase deployment efficiency, with significant time savings and improved shipping velocity for developers. Akuity also offers enterprise support for Argo, with an emphasis on security, compliance, and scalability for Kubernetes deployments.

🔥 0 minutes ago

🇺🇸 United States – Remote

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

AWS

EC2

Grafana

Kubernetes

Prometheus

Python

Akuity

📋 Description

• Own SLI/SLO/SLA definitions for the Akuity SaaS platform and drive continuous improvement against them
• Design, instrument, and maintain observability systems (metrics, logs, traces) across multi-region AWS infrastructure
• Identify reliability gaps, lead blameless post-mortems, and close the loop with permanent fixes
• Partner with engineering teams to build reliability into new features before they ship to production
• Participate in an on-call rotation and act as incident commander for high-severity production events
• Build and maintain runbooks, escalation paths, and incident playbooks that keep mean time to resolution low
• Drive improvements to alerting fidelity; reduce noise, increase signal, eliminate toil
• Lead post-incident reviews with clear timelines, root cause analysis, and follow-through on action items

🎯 Requirements

• 5+ years of SRE, platform engineering, or production operations experience in a SaaS environment
• Deep hands-on Kubernetes expertise; you understand the scheduler, networking, storage, and autoscaling at a level where you can debug anything
• Strong AWS fundamentals across compute (EC2, EKS), networking (VPC, NLB, Route53), storage (S3, RDS), and IAM
• Experience defining and operating against SLOs in production; you've written error budgets, not just read about them
• Proficiency with observability tooling (Prometheus, Grafana, OpenTelemetry, Datadog, or equivalent)
• Solid scripting and automation skills in Go, Python, Bash, or similar; you automate what you touch
• Strong written communication: clear runbooks, sharp incident reports, thoughtful post-mortems
• Based within US time zones (Pacific through Eastern), which includes Canada and other regions in those time zones

🏖️ Benefits

• Competitive compensation, commensurate with experience
• Equity participation in a well-funded, growing company
• Fully remote: work from anywhere within US time zones (Pacific through Eastern), including Canada and other regions
• Home office stipend and equipment budget
• Flexible time off and a culture that respects it
• Work directly with the engineers who built Argo CD and Kargo; you'll learn a lot here
• US-based employees receive full benefits, including comprehensive health, dental, and vision coverage. Candidates based outside the US will be engaged as contractors.
