Senior Deep Learning Engineer – Autonomous Vehicles

October 2

NVIDIA

Artificial Intelligence • Gaming • Automotive

NVIDIA is a technology company specializing in accelerated computing and artificial intelligence. It pioneers advancements in graphics processing units (GPUs), cloud computing, data centers, and virtual reality, with a focus on the gaming, automotive, healthcare, and robotics industries. The company's innovations, such as NVIDIA Omniverse, transform traditional digital processes by enabling high-fidelity simulation and rendering. Its applications span various industries, from autonomous vehicles using NVIDIA DRIVE to healthcare solutions with NVIDIA Clara, as well as AI-driven analytics and workflows.

10,000+ employees

Founded 1993

🤖 Artificial Intelligence

🎮 Gaming

📋 Description

• Crafting, scaling, and hardening deep learning infrastructure libraries and frameworks for training on multi-thousand GPU clusters.
• Improving efficiency throughout the training stack: data loaders, distributed training, scheduling, and performance monitoring.
• Building robust training pipelines and libraries to handle massive video datasets and enable rapid experimentation.
• Collaborating with researchers, model engineers, and internal platform teams to enhance efficiency, minimize stalls, and improve training availability.
• Owning core infrastructure components such as orchestration libraries, distributed training frameworks, and fault-resilient training systems.
• Partnering with leadership to ensure infrastructure scales with growing GPU capacity and dataset size while maintaining developer efficiency and stability.

🎯 Requirements

• BS, MS, or PhD in Computer Science, Electrical/Computer Engineering, or a related field, or equivalent experience.
• 12+ years of professional experience building and scaling high-performance distributed systems, ideally in ML, HPC, or large-scale data infrastructure.
• Extensive knowledge of deep learning frameworks (PyTorch preferred), large-scale training (DDP/FSDP, NCCL, tensor/pipeline parallelism), and performance profiling.
• Strong systems background: datacenter networking (RoCE, IB), parallel filesystems (Lustre), storage systems, and schedulers (Slurm, Kubernetes, etc.).
• Proficiency in Python and C++, with experience writing production-grade libraries, orchestration layers, and automation tools.
• Ability to work closely with multi-functional teams (ML researchers, infra engineers, product leads) and translate requirements into robust systems.

🏖️ Benefits

• Equity
• Benefits

