
SaaS • B2B • Marketplace
ServiceTitan is a comprehensive software platform designed for the trades industry, providing solutions to enhance productivity and profitability for businesses. It offers a variety of features including dispatching, scheduling, marketing, reporting, and customer experience tools, tailored for trades like plumbing, HVAC, electrical services, and more. ServiceTitan seeks to empower businesses by optimizing operations, improving cash flow, and delivering superior customer experiences through an all-in-one platform. The software includes real-time data analytics, financing options, and mobile capabilities to support the operational needs of contractors and increase their revenue streams. By consolidating multiple business functions into a single platform, ServiceTitan aims to help contractors grow profitably and efficiently.
6 hours ago
🇺🇸 United States – Remote
💵 $183.4k - $245.4k / year
⏰ Full Time
🔴 Lead
⛑ DevOps & Site Reliability Engineer (SRE)
🦅 H1B Visa Sponsor

• Lead the design, implementation, and optimization of scalable, resilient infrastructure for cloud-native AI services on Azure.
• Establish true continuous delivery (CD) pipelines supporting blue-green deployments, automatic rollbacks, and progressive delivery patterns.
• Champion observability excellence: define best practices for metrics, tracing, and logging, and help the product team design meaningful SLIs, SLOs, and error budgets.
• Drive automation across the entire lifecycle: infrastructure provisioning, testing, deployment, and recovery.
• Partner with the engineering team to design reliable, fault-tolerant services and perform resilience and capacity reviews.
• Establish observability best practices that not only monitor service health but also track the end-to-end success or failure of complex, automated agent workflows and their business impact (SLIs/SLOs).
• Leverage Infrastructure as Code (IaC) using Terraform, Kubernetes, and Docker to standardize environments and reduce manual intervention.
• Contribute to and maintain CI/CD pipelines using GitHub Actions, Azure DevOps, or TeamCity.
• Implement and improve service health dashboards with Mimir, Grafana, Prometheus, or the ELK stack to ensure system visibility and reliability.
• Mentor engineers and foster a reliability culture across teams, enabling others to build self-healing, observable systems.
• Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field.
• Solid experience in SRE, DevOps, or infrastructure engineering, with strong hands-on expertise in Azure.
• Proven experience designing and operating distributed systems at scale, with a strong understanding of reliability engineering principles (SLIs/SLOs/SLAs).
• Deep proficiency with Terraform, Kubernetes, Docker, and modern IaC and container orchestration best practices.
• Expertise in CI/CD automation and release engineering, including the ability to implement blue-green, canary, and rollback mechanisms.
• Knowledge of SQL Server and PostgreSQL performance tuning and management in cloud environments is a plus.
• Advanced use of observability tools such as Mimir, Grafana, Prometheus, and the ELK stack.
• Experience promoting GitOps workflows and tools such as Argo CD or Flux.
• Excellent troubleshooting, systems thinking, and mentoring skills.
• Flextime, recognition, and support for autonomous work: Flexible time off with ample learning and development opportunities to continue growing your career.
• Comprehensive onboarding program, leadership training for Titans at all levels, and other programs and events.
• Great work is rewarded through Bonusly, peer-nominated awards, and more.
• Company-paid medical, dental, and vision (with 100% employer-paid options and 90% coverage for dependents), FSA and HSA, 401k match, and telehealth options including memberships to One Medical.
• Parental leave and support, up to $20k in fertility services (such as IUI and IVF), surrogacy, and adoption reimbursement, on-demand maternity support through Maven Maternity, free breast milk shipping through Maven Milk, pet insurance, legal advisory services, financial planning tools, and more.
19 hours ago
Director of Site Reliability Engineering leading a globally-distributed team for Akamai's cloud network. Ensuring reliability, performance, and operational excellence in a fast-paced environment.
🇺🇸 United States – Remote
💵 $183.6k - $381.4k / year
💰 Post-IPO Equity on 2001-07
⏰ Full Time
🔴 Lead
⛑ DevOps & Site Reliability Engineer (SRE)
🦅 H1B Visa Sponsor
Ansible
Cloud
Distributed Systems
Grafana
Linux
Prometheus
SaltStack
19 hours ago
DevOps Engineer at Alaska Northstar Federal joining a long-term project. Collaborating with stakeholders to advance user-centric design and accessibility best practices in cloud environments.
🇺🇸 United States – Remote
💵 $140k - $170k / year
⏰ Full Time
🟠 Senior
🔴 Lead
⛑ DevOps & Site Reliability Engineer (SRE)
Cloud
Oracle
SDLC
Terraform
19 hours ago
DevOps Engineer supporting U.S. government cloud services with compliance and infrastructure coding. Collaborating within Agile teams to enhance security and system functionality.
🇺🇸 United States – Remote
💵 $140k - $170k / year
⏰ Full Time
🟠 Senior
🔴 Lead
⛑ DevOps & Site Reliability Engineer (SRE)
Cloud
Oracle
SDLC
Terraform
Yesterday
Staff Site Reliability Engineer working with developers to ensure infrastructure reliability and performance. Collaborating with engineering teams on cloud infrastructure and deployment pipelines for a clean tech company.
🇺🇸 United States – Remote
💵 $173.5k - $204k / year
⏰ Full Time
🔴 Lead
⛑ DevOps & Site Reliability Engineer (SRE)
🦅 H1B Visa Sponsor
AWS
Cloud
DNS
Docker
Google Cloud Platform
JavaScript
Kubernetes
Linux
Python
TypeScript
Yarn
2 days ago
Senior Cloud DevOps Administrator leading the design, automation, and optimization of enterprise-scale Azure operations for Leidos. Managing cloud resources while mentoring junior staff and supporting AWS resources as needed.
🇺🇸 United States – Remote
💵 $89.7k - $162.2k / year
⏰ Full Time
🟠 Senior
🔴 Lead
⛑ DevOps & Site Reliability Engineer (SRE)
🦅 H1B Visa Sponsor
Ansible
Azure
Cloud
Linux
Python
Terraform
Go