
201 - 500 employees
Founded 2013
🤖 Artificial Intelligence
🏢 Enterprise
☁️ SaaS
💰 Series F on 2022-06
Artificial Intelligence • Enterprise • SaaS
Domino Data Lab helps AI-driven enterprises build and manage AI at scale through its Enterprise AI Platform. The platform integrates model development, MLOps, collaboration, and governance in a single experience for global enterprises across many sectors. Customers use Domino in areas such as developing medicines, making agriculture more productive, and creating competitive products. Founded in 2013, Domino is backed by investors including Sequoia Capital and NVIDIA.
🔥 0 minutes ago
🏄 California – Remote
💵 $200k - $230k / year
⏰ Full Time
🔴 Lead
⛑ DevOps & Site Reliability Engineer (SRE)
🦅 H1B Visa Sponsor

• Lead the development of Domino's internal AI-assisted reliability tooling, including systems that analyze tickets, logs, traces, and documentation to help teams resolve outages faster with less recurring toil
• Improve the observability coverage and signal quality for our most critical customer-facing systems, so engineers have more to work with throughout the development and support lifecycle
• Own incident response end-to-end, from detection to remediation, and leave each problem space better documented, better understood, and less likely to recur
• Guide the development of customer and user-facing observability tools within our products
• Define and mature SLO/SLI frameworks for priority services, turning abstract reliability goals into measurable, actionable standards
• Scale cloud operations practices for Domino’s single-tenant SaaS offering, and work with engineering teams to improve the reliability and repeatability of customer deployments and upgrades
• Mentor other engineers and shape how SRE is practiced at Domino, including incident response workflows, operational readiness expectations, and post-incident learning culture
• Deep experience in Site Reliability Engineering, platform engineering, or a software engineering role with genuine, hands-on operational ownership
• Fluency with Kubernetes, Linux, cloud platforms, and observability tooling, and the ability to use them to investigate complex, real-world production problems
• A strong ability to perceive and close reliability gaps in technical products, tools and processes
• Strong software engineering skills in Python or Go, with a track record of building internal tools or services that people actually rely on
• Comfort leading technically ambiguous work and influencing direction across teams without needing direct authority to get things done
• A history of improving reliability through engineering and automation, not just putting out fires manually
• Strong communication skills and real experience mentoring engineers or shaping technical decision-making on your team
• Sound judgment about AI/LLM tooling: you know where it genuinely helps in operational workflows and where it adds noise instead of signal
• Bonus: Experience with LLM-based systems, retrieval workflows, SaaS platform operations, or building tooling for support or developer teams
• equity
• company bonus or sales commissions/bonuses
• 401(k) plan
• medical, dental, and vision benefits
• wellness stipends
🔥 14 hours ago
DevSecOps Software Developer SME designing and maintaining automation and integration capabilities for cloud and software delivery environments. Enhance software delivery and reduce manual work for mission-focused solutions.
🇺🇸 United States – Remote
💵 $149.5k - $201.3k / year
⏰ Full Time
🟠 Senior
🔴 Lead
⛑ DevOps & Site Reliability Engineer (SRE)
🦅 H1B Visa Sponsor
🔥 16 hours ago
Sr. Security Engineer leading integration of security across the software development lifecycle at TrueML. Engaging in security automation, cloud security, and innovative AI solutions.
🇺🇸 United States – Remote
💵 $122.1k - $160k / month
⏰ Full Time
🟠 Senior
🔴 Lead
⛑ DevOps & Site Reliability Engineer (SRE)
🔥 23 hours ago
DevOps Engineering Manager at Vannevar, focusing on platform reliability and team leadership in a defense tech environment. Leading secure, scalable infrastructure and tooling processes.
🇺🇸 United States – Remote
💰 $12M Series A on 2021-08
⏰ Full Time
🟠 Senior
🔴 Lead
⛑ DevOps & Site Reliability Engineer (SRE)
🕒 2 days ago
DevSecOps AWS Engineer focused on securing and managing applications hosted on AWS. Collaborating with clients to design, implement, maintain, and test cloud technical solutions.
🇺🇸 United States – Remote
💵 $98.5k - $206.8k / year
⏰ Full Time
🟠 Senior
🔴 Lead
⛑ DevOps & Site Reliability Engineer (SRE)
🕒 3 days ago
DevSecOps Engineer delivering tailored solutions for clients at GDIT. Providing architectural guidance and leading a dedicated DevSecOps team.
🇺🇸 United States – Remote
💵 $170.1k - $207k / year
⏰ Full Time
🟠 Senior
🔴 Lead
⛑ DevOps & Site Reliability Engineer (SRE)
🦅 H1B Visa Sponsor