Director of SRE

🕒 April 30

Intus Care

11 - 50 employees

⚕️ Healthcare Insurance

☁️ SaaS

🤖 Artificial Intelligence

💰 $13.1M Venture Round on 2023-01

Intus Care is a healthcare analytics platform that synthesizes healthcare data to identify risks, visualize trends, and optimize care. The company empowers long-term care providers to deliver more effective care to older adults by predicting high-risk patients, reducing expenditures through early risk detection, and improving organizational performance using data-driven insights. Intus Care is particularly beneficial for PACE (Programs of All-Inclusive Care for the Elderly) organizations, providing tools that enable care providers and executives to make informed decisions and proactively manage patient care based on real-time analytics.

📋 Description

• Own and execute the SRE strategy and multi-quarter roadmap across reliability, observability, incident management, QA maturity, and release engineering.
• Define, measure, and continuously improve SLAs, SLOs, error budgets, uptime, performance, and operational health metrics across all products and services.
• Lead production reliability for the full platform, including monitoring, alerting, on-call operations, incident response, root cause analysis, and MTTR reduction.
• Establish release readiness standards, deployment safety controls, and quality gates to ensure stable and predictable product releases.
• Manage external SRE vendors and partners, including service delivery, SLA governance, escalations, performance reviews, and compliance expectations.
• Lead QA engineering strategy with a focus on automation, regression prevention, test coverage, and reducing escaped defects in production.
• Partner with Security and Engineering leaders to ensure cloud infrastructure, CI/CD pipelines, and operational tooling meet HIPAA, SOC 2, and internal security standards.
• Oversee core platform operations, including Azure AKS environments, Kubernetes, GitOps workflows, CI/CD pipelines, GitHub Actions, secrets management, access controls, and audit readiness.
• Drive observability maturity using tools such as Grafana, Prometheus, logging platforms, tracing tools, and automated alerting frameworks.
• Collaborate with Product, Platform, and Engineering teams to embed reliability and quality best practices throughout the software development lifecycle.
• Build, mentor, and scale high-performing SRE and QA teams while fostering a culture of ownership, accountability, learning, and continuous improvement.
• Drive adoption of AI-enabled automation and intelligent tooling to reduce manual toil, improve productivity, and strengthen operational excellence.

🎯 Requirements

• 12+ years of SRE, infrastructure, or platform engineering experience, including 5+ years in engineering leadership roles.
• Proven track record owning site reliability for complex, multi-tenant SaaS platforms with demanding availability requirements.
• Demonstrated experience defining SLA and SLO frameworks, error budgets, and incident management processes at scale.
• Experience managing vendor relationships for managed infrastructure or SRE services, including SLA governance and performance management.
• Track record leading QA or quality engineering functions, including test automation maturity and release gate ownership.
• Strong communication and cross-functional influence skills, with the ability to represent reliability to both technical and non-technical audiences.

🏖️ Benefits

• Own and build, from the ground up, the SRE function for a modern healthcare EMR platform serving PACE populations.
• Lead a blended team model combining managed services, internal QA, and internal SRE in a high-growth engineering organization.
• Work on systems where reliability directly affects clinical care delivery for vulnerable patient populations.
• Shape engineering culture at a company that actively embraces AI-assisted software development with Claude Code.
• Fully remote, collaborative engineering environment with direct access to executive leadership.

Similar Jobs

🕒 April 21

Mistral AI

11 - 50

Join Mistral AI as a Site Reliability Engineer focusing on optimization and reliability. Collaborate with teams to enhance platform performance and ensure system availability.

Cloud

Distributed Systems

Docker

Flux

Grafana

Kubernetes

Prometheus

Python

Terraform

Go

🕒 April 20

Ad Hoc LLC

501 - 1000

🏛️ Government

🤖 Artificial Intelligence

🔌 API

Staff Software Engineer (Full Stack) supporting the VA's digital services, with a focus on software reliability and performance. Join a team committed to transforming the experience of Veterans.

Angular

JavaScript

React

Ruby

Ruby on Rails

Svelte

Vue.js

🕒 April 14

Creyos (formerly Cambridge Brain Sciences)

51 - 200

⚕️ Healthcare Insurance

☁️ SaaS

🔬 Science

DevOps Engineer focusing on enhancing the efficiency and reliability of software deployment processes at Creyos. Work on automating configuration management and implementing CI/CD pipelines.

AWS

Cloud

Python

Ruby

Ruby on Rails

Terraform

🕒 April 13

A Place for Mom

501 - 1000

🏪 Marketplace

👥 B2C

Staff DevOps Engineer focusing on Site Reliability Engineering and security practices for A Place for Mom. Responsible for enhancing the developer platform and ensuring robust security measures.

AWS

Firewalls

Linux

TCP/IP

Terraform

🕒 April 9

Las Vegas Sands Corp.

10,000+ employees

🎮 Gaming

Executive Director overseeing global DevSecOps functions, including infrastructure and application security for Sands. Leading teams to ensure compliance and optimize solutions while supporting IT initiatives.

AWS

Azure

Cloud

Cyber Security

Docker

JavaScript

Kubernetes

Python

Ruby

SDLC

Go