
5001 - 10000 employees
Millions of developers around the world have used Twilio to unlock the magic of communications to improve any human experience. Twilio has democratized communications channels like voice, text, chat, video, and email by virtualizing the world's communications infrastructure through APIs that are simple enough for any developer to use, yet robust enough to power the world's most demanding applications. By making communications a part of every software developer's toolkit, Twilio is enabling innovators across every industry, from emerging leaders to the world's largest organizations, to reinvent how companies engage with their customers. Founded in 2008, Twilio has over 5,000 employees in 26 offices in 17 countries and counting, with headquarters in San Francisco and other offices in Atlanta, Bangalore, Berlin, Bogotá, Denver, Dublin, Paris, Prague, Hong Kong, Irvine, London, Madrid, Munich, Malmö, Mountain View, Redwood City, New York City, São Paulo, Sydney, Melbourne, Singapore, Tallinn, and Tokyo.
February 26
California, Colorado, +8 more states – Remote
$227.8k - $335k / year
Full Time
Senior
Lead
DevOps & Site Reliability Engineer (SRE)
H1B Visa Sponsor

⢠Partner with senior technical leaders across Twilio to set and communicate the reliability strategy, translating business goals into measurable outcomes. ⢠Influence company-wide architectural decisions while balancing long-term vision with near-term and compliance needs. ⢠Lead the design, implementation, and operation of scalable solutions and paved roads that enable reliable, high-traffic services; ⢠Influence company-wide architectural decisions to focus on availability, performance, resilience, and cost efficiency using Kubernetes, AWS, Terraform, and modern observability. ⢠Ensure integrity and quality across the service lifecycle; design fault-tolerant architectures, incident response, disaster recovery, and capacity/cost management. ⢠Collaborate with product and cross-functional teams to identify reliability risks and convert them into actionable designs, programs, and tooling. ⢠Establish and champion reliability practices and drive systemic improvements. ⢠Mentor and grow engineers and technical leaders ⢠Track and apply emerging SRE, cloud, and large-scale systems best practices; introduce pragmatic innovations that improve reliability at scale.
⢠15+ years of experience in Reliability Engineering, Software Engineering, DevOps roles with a focus on infrastructure, backend systems, and reliability, including as a principal/architect. ⢠Strong experience in driving strategic technical decisions and defining long-term technical vision. ⢠In-depth understanding of the role of Reliability Engineering in a large and diverse SaaS organization. ⢠Experience driving cross-org technical architecture outcomes. ⢠Knowledge of cloud architecture, devops practices, and large-scale systems design with microservices. ⢠Bachelor's or Master's degree in Computer Science, Engineering, or a related field (or equivalent experience). ⢠Strong production experience, including operational management, scaling, partitioning strategies, and tuning for performance and reliability in high-scale environments. ⢠Hands-on experience with Kubernetes (e.g., EKS), deploying and managing stateful services, and cloud services like AWS. ⢠Proficiency in infrastructure-as-code tools such as Terraform or CloudFormation for automating infrastructure. ⢠Expertise in observability tools (e.g., Prometheus, Grafana, Datadog) for monitoring distributed systems and setting up alerting. ⢠Proficient in at least one programming language (e.g., Go, Python, Java) for building automation and tooling. ⢠Experience designing incident response processes, SLOs/SLIs, runbooks, and participating in on-call rotations. ⢠Experience running cross-functional post-incident reviews and driving improvements. ⢠Strong understanding of distributed systems principles, including consensus, durability, throughput, and availability tradeoffs. ⢠Proven track record of leading reliability improvements in data-intensive or mission-critical systems and collaborating with engineering teams. ⢠Excellent problem-solving, analytical, verbal, and written communication skills, with the ability to work in cross-functional and distributed environments. 
⢠Demonstrated leadership in mentoring teams, influencing decisions, and balancing long-term objectives with short-term needs. ⢠Ability to influence and build effective working relationships with all levels of the organization.
⢠health care insurance ⢠401(k) retirement account ⢠paid sick time ⢠paid personal time off ⢠paid parental leave
February 26
Site Reliability Engineer at EngFlow ensuring highly available cloud infrastructure for build acceleration. Collaborating closely with engineering teams to automate systems and manage reliability.
United States – Remote
$18M Series A - EngFlow on 2022-11
Full Time
Mid-level
Senior
DevOps & Site Reliability Engineer (SRE)
AWS
Cloud
Distributed Systems
Google Cloud Platform
Kubernetes
Terraform
February 26
DevOps Security Engineer at Knox securing cloud-native environments for U.S. government missions. Focus on preventative security, automation, and continuous compliance within FedRAMP frameworks.
United States – Remote
$110k - $140k / year
Funding within the last year
$6.5M Seed on 2025-08
Full Time
Mid-level
Senior
DevOps & Site Reliability Engineer (SRE)
AWS
Azure
Cloud
Google Cloud Platform
Kubernetes
Terraform
February 26
Senior Professional Services DevOps Engineer designing CI/CD pipelines at JFrog. Collaborating with clients and teams to enhance DevOps experience.
United States – Remote
$160k - $175k / year
Full Time
Senior
DevOps & Site Reliability Engineer (SRE)
H1B Visa Sponsor
Ansible
AWS
Azure
Chef
Cloud
Docker
Google Cloud Platform
Java
Jenkins
Kubernetes
Linux
Maven
Open Source
Puppet
February 26
Senior DevOps engineer driving evolution of Risk Labs operations and development processes. Work closely with platform engineers on internal tooling and vital protocol operations.
United States – Remote
$100k - $200k / year
Full Time
Senior
DevOps & Site Reliability Engineer (SRE)
Cloud
Google Cloud Platform
Python
Terraform
Web3
February 25
Backend/DevOps Engineer managing deployments and infrastructure for AI trading platform. Responsible for security, reliability, and scaling of systems across multiple venues.
United States – Remote
Full Time
Mid-level
Senior
DevOps & Site Reliability Engineer (SRE)
AWS
Cloud
Docker
Google Cloud Platform
Grafana
Kubernetes
Prometheus
Python
Web3