
11 - 50 employees
Founded 2015
eCommerce • Artificial Intelligence • B2B
InfiniteChoice is a platform company that builds and scales startups and high-growth consumer businesses by combining capital, operational expertise, and intellectual property. The firm emphasizes automation-first execution and an AI-led ecosystem to accelerate time-to-scale for businesses with clear product-market fit, focusing on launching and optimizing eCommerce brands and customer platforms. Backed by private equity and led by experienced operators, InfiniteChoice deploys strategic capital, technology, and operational talent to drive profitable, high-margin growth across its portfolio.
February 24
California, Texas - Remote
$180k - $210k / year
Full Time
Lead
DevOps & Site Reliability Engineer (SRE)

⢠Build SRE practices from scratch - define SLIs, SLOs, error budgets, and reliability metrics ⢠Establish incident response procedures, on-call rotations, and post-mortem processes ⢠Create reliability engineering standards and best practices across all engineering teams ⢠Develop disaster recovery and business continuity strategies ⢠Design and implement capacity planning and performance optimization frameworks ⢠Drive architecture decisions for comprehensive application and infrastructure monitoring solutions ⢠Design and develop custom SRE tools for automated monitoring, alerting, and remediation ⢠Build observability platforms that provide deep insights into system performance and user experience ⢠Create automation frameworks for deployment, scaling, and incident response ⢠Architect logging, metrics, and tracing systems for distributed microservices environments ⢠Leverage Google Cloud Platform services to build resilient, scalable infrastructure ⢠Implement cloud-native monitoring using Stackdriver, Cloud Monitoring, and Cloud Logging ⢠Design auto-scaling and self-healing systems using GKE, Cloud Functions, and managed services
⢠12+ years of experience in Site Reliability Engineering or Infrastructure Engineering ⢠5+ years in lead SRE roles building and scaling SRE teams and processes ⢠Proven track record designing and implementing monitoring and observability solutions at scale ⢠Deep understanding of distributed systems, microservices architectures, and cloud-native patterns ⢠Experience with infrastructure as code, configuration management, and deployment automation ⢠Hands-on experience with Google Cloud Platform is required ⢠Expertise with GCP monitoring and observability stack (Cloud Monitoring, Cloud Logging, Cloud Trace) ⢠Experience with GKE, Compute Engine, Cloud Functions, and other core GCP services ⢠Bachelor's degree in Computer Science, Engineering, or equivalent professional experience ⢠Industry certifications (Google Cloud Professional, SRE or related certifications preferred)
⢠Ground-floor opportunity to build SRE practices and culture from scratch ⢠Full autonomy to define processes, select technologies, and establish best practices ⢠Direct impact on platform reliability serving millions of users ⢠Opportunity to create lasting engineering culture and operational excellence ⢠Remote-first culture with in-person meeting in Dallas, TX on need basis ⢠Collaborative environment with smart, passionate engineers and cross-functional teams ⢠Access to cutting-edge technologies and AI-driven development tools ⢠Competitive compensation, equity participation, and comprehensive benefits

February 19
Director of Site Reliability Engineering at Affirm owning execution for reliability and operational excellence. Leading a diverse global team and bridging collaboration across multiple departments.
United States - Remote
$267k - $360k / year
Post-IPO Equity on 2021-01
Full Time
Lead
DevOps & Site Reliability Engineer (SRE)
H1B Visa Sponsor

February 12
1001 - 5000
Principal Software Engineer on the SRE team at Upstart, advocating for reliability and scalability. Leading cross-functional collaboration and shaping technical roadmaps for SRE initiatives.
United States - Remote
$195.3k - $270.4k / year
Full Time
Lead
DevOps & Site Reliability Engineer (SRE)
H1B Visa Sponsor
JavaScript
Prometheus
Python
Terraform
TypeScript
Go

January 27
Senior DevSecOps Engineer improving cybersecurity posture and supporting compliance for federal requirements in the U.S. Working remotely with less than 10% travel.
United States - Remote
Full Time
Senior
Lead
DevOps & Site Reliability Engineer (SRE)
Ansible
AWS
Azure
Cloud
Docker
Google Cloud Platform
Kubernetes
OpenShift
Python
Terraform

January 9
Staff Site Reliability Engineer designing and operating a hybrid cloud environment at PathAI. Focused on implementing SRE best practices and enhancing infrastructure reliability.
United States - Remote
$165.8k - $224.4k / year
$165M Series C on 2021-05
Full Time
Lead
DevOps & Site Reliability Engineer (SRE)
H1B Visa Sponsor
Ansible
AWS
Cloud
Grafana
Prometheus
Python
Terraform

December 24, 2025
SRE / DevOps Manager at Upshop leading reliability and operations engineering team. Responsible for scalability, security, and performance of infrastructure.
United States - Remote
Full Time
Senior
Lead
DevOps & Site Reliability Engineer (SRE)
AWS
Azure
Cloud
Docker
Google Cloud Platform
Grafana
Kubernetes
MongoDB
Prometheus
Python
Shell Scripting
Terraform
Go