ML Ops Infrastructure Engineer

🕒 April 6


Deepgram

51 - 200 employees

Founded 2015

🤖 Artificial Intelligence

☁️ SaaS

🔌 API

💰 $47M Series B (November 2022)


Deepgram is a voice AI company that provides APIs for speech-to-text, text-to-speech, and language understanding. Its platform lets developers build voice AI solutions for use cases such as contact centers, medical transcription, and conversational AI. The company positions its technology on accuracy, speed, and cost-effectiveness, and counts both enterprises and startups among its customers. With real-time, highly accurate transcription, Deepgram helps businesses gain insights from voice data.

📋 Description

• Design and build CI/CD pipelines specifically tailored for ML model development, validation, and deployment
• Architect and maintain model deployment pipelines that move models from research environments through staging to production with confidence
• Build A/B testing infrastructure that enables controlled rollouts of new models and measures real-world performance impact
• Implement comprehensive monitoring for model performance in production: accuracy metrics, latency, drift detection, and regression alerts
• Develop automated retraining pipelines that trigger on data changes, performance degradation, or scheduled cadences
• Create and maintain build and test environments that mirror production, giving researchers high-fidelity feedback before deployment
• Establish model versioning, artifact management, and rollback capabilities to ensure safe and reproducible deployments
• Collaborate with research engineers to define and enforce model quality gates before production promotion
• Build observability dashboards that give the team real-time insight into model health across all environments
• Optimize model serving infrastructure for latency, throughput, and cost efficiency

🎯 Requirements

• 4+ years of experience in MLOps, DevOps, or infrastructure engineering with a focus on ML systems
• Strong proficiency in Python and experience building automation and tooling for ML workflows
• Deep experience with CI/CD systems and building pipelines for software and model delivery
• Hands-on experience with Docker and Kubernetes for containerized workload management
• Practical experience deploying and serving ML models in production environments
• Familiarity with model evaluation, validation, and quality assurance processes
• Understanding of monitoring and observability principles as applied to ML systems
• Strong problem-solving skills and a bias toward automation over manual processes

🏖️ Benefits

• Medical, dental, and vision benefits
• Annual wellness stipend
• Mental health support
• Life, short-term disability (STD), and long-term disability (LTD) income insurance plans
• Unlimited PTO
• Generous paid parental leave
• Flexible schedule
• 12 paid US company holidays
• Quarterly personal productivity stipend
• One-time stipend for home office upgrades
• 401(k) plan with company match
• Tax savings programs
• Learning / education stipend
• Participation in talks and conferences
• Employee Resource Groups
• AI enablement workshops / sessions

