Engineering Leader, AI & Machine Learning Operations, AIOps


August 12


CloudBees

Software • B2B • DevOps

CloudBees is a leading provider of DevOps solutions, specializing in optimizing and managing the software development lifecycle through its advanced Continuous Integration and Continuous Delivery (CI/CD) capabilities. By leveraging Jenkins, CloudBees enables developers to streamline workflows, automate processes, and enhance software delivery across hybrid and multi-cloud environments. CloudBees empowers organizations to optimize developer experiences through features like security and compliance management, feature management, and smart testing, ensuring seamless integration of tools within the DevOps ecosystem.

501 - 1000 employees

Founded 2010

🤝 B2B

💰 $95M Debt Financing on 2021-12

📋 Description

• Lead and scale a team responsible for AIOps, including model deployment, monitoring, and lifecycle management.
• Architect and implement AI/ML pipelines that are scalable, observable, and reproducible.
• Collaborate with cross-functional teams (data science, DevOps, product) to integrate AI/ML systems into our SaaS platform.
• Establish best practices for AI/ML experimentation, CI/CD for models, data versioning, and model governance.
• Own the full stack of AIOps infrastructure, from data ingestion to real-time inference systems.
• Drive the technical vision and roadmap for ML platform development.
• Act as a mentor and coach, helping engineers grow in a fast-paced startup environment.
• Manage a team of 5+.
• Launch new platforms from 0 to 1 and drive adoption internally and externally with partner teams.

🎯 Requirements

• 7+ years of engineering experience, including platform engineering, system development, or related roles, with at least 3 years in leadership roles.
• 3 years of experience with large-scale systems, with a focus on reliability, scalability, and maintainability, and 1 year of experience with AI/ML systems.
• Strong hands-on experience with MLOps tools (e.g., MLflow, Kubeflow, SageMaker, Airflow, Metaflow).
• Proven track record of building ML pipelines in production environments.
• Experience with cloud infrastructure (AWS, GCP, or Azure) and container orchestration (Kubernetes).
• Deep knowledge of CI/CD practices as they relate to the ML lifecycle.
• Prior experience in a startup or fast-paced SaaS environment.
• Strong collaboration and communication skills.
• Experience deploying and managing LLM services such as Amazon Bedrock or Vertex AI.

🏖️ Benefits

• Competitive compensation, stock options, and benefits.
• A flexible remote work culture with global teammates.


Similar Jobs

August 10

Stitch Fix builds AI-powered personal styling. You’ll develop production ML systems for next-gen recommendations, collaborating across teams.

Prometheus

Python

PyTorch

Spark

SQL

TensorFlow

August 9

Work on large scale machine learning projects at Zencastr to improve podcast creation. Collaborate with a talented team to drive innovation in audio and text capabilities.

AWS

Cloud

Docker

Kubernetes

Microservices

MongoDB

Python

PyTorch

Scikit-Learn

SQL

TensorFlow

August 8

10a Labs seeks an ML Engineer to design robust systems for abuse detection at scale.

AWS

Cloud

Google Cloud Platform

Python

August 8

Become a Machine Learning Engineer at Voxel51, innovating AI capabilities on their platform.

Elasticsearch

MongoDB

NoSQL

NumPy

Open Source

Python

PyTorch
