Senior ML Platform Engineer


November 4


NVIDIA

Artificial Intelligence • Gaming • Automotive

NVIDIA is a leading technology company specializing in accelerated computing and artificial intelligence. It pioneers advancements in graphics processing units (GPUs), cloud computing, data centers, and virtual reality, with a focus on the gaming, automotive, healthcare, and robotics industries. The company's innovations, such as NVIDIA Omniverse, transform traditional digital processes by enabling high-fidelity simulation and rendering. Its applications span many industries, from autonomous vehicles using NVIDIA DRIVE to healthcare solutions with NVIDIA Clara and AI-driven analytics and workflows.

10,000+ employees

Founded 1993


📋 Description

• Design, build, and maintain our core ML platform infrastructure as code, primarily using Ansible and Terraform
• Apply SRE principles to diagnose, troubleshoot, and resolve complex system issues across the entire stack
• Develop robust internal automation and tooling for ML workflow orchestration, resource scheduling, and platform operations
• Collaborate with ML researchers and applied scientists to understand infrastructure needs
• Evolve and operate our multi-cloud and hybrid (on-prem + cloud) environments
• Participate in on-call rotation to provide support for platform services and infrastructure
• Write high-quality, maintainable code (Python, Go) to contribute to the core orchestration platform
• Drive the adoption of modern GPU technologies and ensure smooth integration of next-generation hardware into ML pipelines

🎯 Requirements

• BS/MS in Computer Science, Engineering, or equivalent experience
• 8+ years in software/platform engineering or SRE roles, including 3+ years focused on ML infrastructure or distributed compute systems
• Strong proficiency in Infrastructure-as-Code (IaC) tools, specifically Ansible and Terraform
• SRE-oriented mindset with extensive experience in diagnosing system-level issues, performance tuning, and ensuring platform reliability
• Solid understanding of ML workflows and lifecycle, from data preprocessing to deployment
• Proficiency in operating containerized workloads with Kubernetes and Docker
• Strong software engineering skills in languages such as Python or Go
• Experience with Linux systems internals, networking, and performance tuning at scale

🏖️ Benefits

• Equity
• Benefits


