Senior Machine Learning Infrastructure Engineer - DGX Cloud

July 10


NVIDIA

Artificial Intelligence • Gaming • Automotive

NVIDIA is a leading technology company specializing in accelerated computing and artificial intelligence. It pioneers advancements in graphics processing units (GPUs), cloud computing, data centers, and virtual reality, with a focus on the gaming, automotive, healthcare, and robotics industries. Its innovations, such as NVIDIA Omniverse, transform traditional digital processes by enabling high-fidelity simulation and rendering. Its applications span many industries, from autonomous vehicles using NVIDIA DRIVE to healthcare solutions with NVIDIA Clara, as well as AI-driven analytics and workflows.

10,000+ employees

Founded 1993

🤖 Artificial Intelligence

🎮 Gaming

📋 Description

• Scale up AI Infrastructure at NVIDIA
• Contribute to automation of datacenter operations
• Implement monitoring and health management capabilities
• Build automated test infrastructure
• Ensure seamless software integration across engineering teams
• Constantly innovate and discover new solutions

🎯 Requirements

• 5+ years of software engineering experience on large-scale production systems
• BS in Computer Science, Engineering, Physics, Mathematics, or another comparable degree, or equivalent experience
• Expert-level knowledge of a systems programming language (Go, Python)
• Strong background in Linux system administration and management
• Background with cluster management systems (Kubernetes, SLURM)
• Understanding of performance, security, and reliability in complex distributed systems
• Familiarity with system-level architecture, data synchronization, fault tolerance, and state management

🏖️ Benefits

• Equity
• Benefits


