Team Lead, Site Reliability Engineering

Artificial Intelligence • Cloud • Data Analytics

Pythian is a global data and analytics services company that specializes in helping organizations transform by leveraging data, analytics, AI, and the cloud. Pythian offers services in database management, cloud solutions, digital workplaces, and enterprise applications, working with partners like AWS, Google, and Microsoft. The company enables clients to optimize their data estates, secure their data, and drive better business outcomes through advanced analytics and artificial intelligence. Pythian serves a variety of industries, including financial, healthcare, manufacturing, retail, and education, providing tailored solutions that enhance operational efficiency, security, and innovation.

201 - 500 employees

Founded 1997

🤖 Artificial Intelligence

💰 $15M Venture Round on 2017-05

August 26

🏄 California – Remote

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

AWS

Cloud

Distributed Systems

Docker

Grafana

Kubernetes

Linux

Microservices

Oracle

Prometheus

Python

Shell Scripting

Terraform

📋 Description

• Pythian: strategic database and analytics services, driving digital transformation and operational excellence
• Lead and mentor a team of Site Reliability Engineers to ensure technical excellence and professional growth
• Oversee queue management, ticket prioritization, and workload distribution to meet SLAs and utilization targets
• Act as the primary point of contact for critical escalations and severity-1 incidents
• Design, deploy, and operate large-scale distributed systems across compute, storage, networking, and AI/ML environments
• Lead projects from architecture through automation to intelligent monitoring
• Collaborate with clients and internal teams to build resilient, high-performing infrastructure

🎯 Requirements

• A minimum of 3 years of previous experience leading a team
• Experience with Google Cloud and IaC tools (Terraform)
• Strong knowledge of microservices, containers (Kubernetes, Docker), and networking
• Hands-on experience with PKI, service mesh (Istio), and Linux systems administration
• SRE mindset focused on automation, scalability, and reliability
• Operate and optimize Kubernetes clusters, Istio service mesh, and Linux-based systems
• Automate workflows using Go, Python, and Shell scripting
• Build monitoring and observability solutions with Prometheus, Grafana, and Loki
• Troubleshoot complex networking, storage, and system performance issues
• Partner with AI/ML teams to ensure infrastructure readiness for model training and data pipelines

🏖️ Benefits

• Competitive total rewards package
• Blog during work hours
• Substantial training allowance and professional development days
• Flexible remote work: work from home with no daily travel requirement
• Home office equipment provided (laptop with choice of OS and annual personalization budget)
• Annual wellness budget (gym memberships, massages, fitness and more)
• Generous paid vacation and sick days
• Paid day off to volunteer for a charity
