Senior MLOps Engineer

October 2

NVIDIA

Artificial Intelligence • Gaming • Automotive

NVIDIA is a leading technology company specializing in accelerated computing and artificial intelligence. It pioneers advances in graphics processing units (GPUs), cloud computing, data centers, and virtual reality, with a focus on the gaming, automotive, healthcare, and robotics industries. The company's innovations, such as NVIDIA Omniverse, transform traditional digital workflows by enabling high-fidelity simulation and rendering. Its applications span many industries, from autonomous vehicles using NVIDIA DRIVE to healthcare solutions with NVIDIA Clara, as well as AI-driven analytics and workflows.

10,000+ employees

Founded 1993

🤖 Artificial Intelligence

🎮 Gaming

📋 Description

• Identify infrastructure and software bottlenecks to improve ML job startup time, data load/write time, resiliency, and failure recovery
• Translate research workflows into automated, scalable, and reproducible systems that accelerate experimentation
• Build CI/CD workflows tailored for ML to support data preparation, model training, validation, deployment, and monitoring
• Develop observability frameworks to monitor the performance, utilization, and health of large-scale training clusters
• Collaborate with hardware and platform teams to optimize models for emerging GPU architectures, interconnects, and storage technologies
• Develop guidelines for dataset versioning, experiment tracking, and model governance to ensure reliability and compliance
• Mentor and guide engineering and research partners on MLOps patterns, scaling NVIDIA's impact from research to production
• Collaborate with NVIDIA Research teams and the DGX Cloud Customer Success team to continuously improve MLOps automation

🎯 Requirements

• BS in Computer Science, Information Systems, Computer Engineering, or equivalent experience
• 8+ years of experience in large-scale software or infrastructure systems, with 5+ years dedicated to ML platforms or MLOps
• Proven track record designing and operating ML infrastructure for production training workloads
• Expert knowledge of distributed training frameworks (PyTorch, TensorFlow, JAX) and orchestration systems (Kubernetes, Slurm, Kubeflow, Airflow, MLflow)
• Strong programming experience in Python plus at least one systems language (Go, C++, Rust)
• Deep understanding of GPU scheduling, container orchestration, and cloud-native environments
• Experience integrating observability stacks (Prometheus, Grafana, ELK) with ML workloads
• Familiarity with storage and data platforms that support large-scale training (object stores, feature stores, versioned datasets)
• Strong communication skills and the ability to collaborate effectively with research teams, translating their requirements into scalable engineering solutions

🏖️ Benefits

• Equity
• Benefits

Similar Jobs

October 2

Experian

10001

🤖 Artificial Intelligence

🤝 B2B

☁️ SaaS

Machine Learning Engineer developing intelligent automation and fraud detection for Experian. Building workflows and integrating LLMs for enhanced client engagement and analytics.

🇺🇸 United States – Remote

💵 $64k - $110.9k / year

⏰ Full Time

🟡 Mid-level

🟠 Senior

🤖 Machine Learning Engineer

🦅 H1B Visa Sponsor

September 30

dv01

51 - 200

💸 Finance

💳 Fintech

☁️ SaaS

Build and deploy AI/ML document parsers and classifiers for structured finance. Collaborate across product, engineering, and design at dv01.

🇺🇸 United States – Remote

💵 $145k - $160k / year

⏰ Full Time

🟡 Mid-level

🟠 Senior

🤖 Machine Learning Engineer

🦅 H1B Visa Sponsor

September 29

Lime

501 - 1000

🚗 Transport

🛍️ eCommerce

☁️ SaaS

Senior ML Engineer building demand forecasts and vehicle positioning models for Lime's shared e-bikes and scooters. Scale ML systems and collaborate with cross-functional teams.

🇺🇸 United States – Remote

💵 $165k - $227k / year

💰 $418M Convertible Note on 2021-11

⏰ Full Time

🟠 Senior

🤖 Machine Learning Engineer

🦅 H1B Visa Sponsor

September 28

Samsara

1001 - 5000

🏢 Enterprise

🚗 Transport

🔐 Security

Senior ML Engineer building scalable Ray/Kubernetes ML infrastructure and deployment for Samsara's Connected Operations Cloud, optimizing models and supporting ML platform reliability.

🇺🇸 United States – Remote

💵 $135.5k - $227.7k / year

💰 Seed Round on 2014-08

⏰ Full Time

🟠 Senior

🤖 Machine Learning Engineer

🦅 H1B Visa Sponsor

September 24

OneSix

51 - 200

🤖 Artificial Intelligence

Senior ML consultant at OneSix leading design, training, and production deployment of ML models. Mentors teams and shapes project scopes for enterprise AI initiatives.

🇺🇸 United States – Remote

⏰ Full Time

🟠 Senior

🤖 Machine Learning Engineer