
Artificial Intelligence • Cloud • Data Analytics
Pythian is a global data and analytics services company that specializes in helping organizations transform by leveraging data, analytics, AI, and the cloud. Pythian offers services in database management, cloud solutions, digital workplaces, and enterprise applications, working with partners like AWS, Google, and Microsoft. The company enables clients to optimize their data estates, secure their data, and drive better business outcomes through advanced analytics and artificial intelligence. Pythian serves a variety of industries, including financial, healthcare, manufacturing, retail, and education, providing tailored solutions that enhance operational efficiency, security, and innovation.
August 26
AWS
Cloud
Distributed Systems
Docker
Grafana
Kubernetes
Linux
Microservices
Oracle
Prometheus
Python
Shell Scripting
Terraform
Go

• Pythian: strategic database and analytics services, driving digital transformation and operational excellence • Lead and mentor a team of Site Reliability Engineers to ensure technical excellence and professional growth • Oversee queue management, ticket prioritization, and workload distribution to meet SLAs and utilization targets • Act as the primary point of contact for critical escalations and severity-1 incidents • Design, deploy, and operate large-scale distributed systems across compute, storage, networking, and AI/ML environments • Lead projects from architecture through automation to intelligent monitoring • Collaborate with clients and internal teams to build resilient, high-performing infrastructure
• A minimum of 3 years of experience leading a team • Experience with Google Cloud and IaC tools (Terraform) • Strong knowledge of microservices, containers (Kubernetes, Docker), and networking • Hands-on experience with PKI, service mesh (Istio), and Linux systems administration • SRE mindset focused on automation, scalability, and reliability • Operate and optimize Kubernetes clusters, Istio service mesh, and Linux-based systems • Automate workflows using Go, Python, and shell scripting • Build monitoring and observability solutions with Prometheus, Grafana, and Loki • Troubleshoot complex networking, storage, and system performance issues • Partner with AI/ML teams to ensure infrastructure readiness for model training and data pipelines
• Competitive total rewards package • Blog during work hours • Substantial training allowance and professional development days • Flexible remote work: work from home with no daily travel requirement • Home office equipment provided (laptop with choice of OS and an annual personalization budget) • Annual wellness budget (gym memberships, massages, fitness, and more) • Generous paid vacation and sick days • Paid day off to volunteer for a charity
August 25
Senior SRE building AI-driven observability and self-healing systems for Virta Health. Focus on reliability, automation, and developer tooling.
🇺🇸 United States – Remote
💵 $167.2k - $216k / year
⏰ Full Time
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)
🦅 H-1B Visa Sponsor
Python
Terraform
Go
August 22
Designs, builds, and maintains large-scale observability and telemetry platforms at NVIDIA. Drives reliability, automation, and incident response.
🇺🇸 United States – Remote
💵 $168k - $333.5k / year
⏰ Full Time
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)
🦅 H-1B Visa Sponsor
Cloud
Distributed Systems
Docker
Grafana
Kubernetes
Linux
Open Source
OpenStack
Perl
Prometheus
Python
Ruby
Go
August 20
Salesforce DevOps Architect providing leadership for multiple Salesforce teams, managing CI/CD pipelines and enforcing development standards in a remote role.
Cloud
August 20
Senior SRE building scalable, secure infrastructure for AI compute at TensorWave. Designs low-level systems and automates infrastructure.
Cloud
JavaScript
Kubernetes
Linux
Rust
Spring
Terraform
Go
August 20
Deployment Engineer at Atolio: ensure secure, scalable deployments of enterprise search across environments; build automation and collaborate with success teams.
AWS
Azure
Cloud
Distributed Systems
Google Cloud Platform
Grafana
Kubernetes
Python
ServiceNow
Splunk
Terraform
Go