Senior MLOps Engineer, GenAI Framework

November 14

NVIDIA

Artificial Intelligence • Gaming • Automotive

NVIDIA is a technology company specializing in accelerated computing and artificial intelligence. It pioneers advancements in graphics processing units (GPUs), cloud computing, data centers, and virtual reality, with a focus on the gaming, automotive, healthcare, and robotics industries. The company's innovations, such as NVIDIA Omniverse, transform traditional digital processes by enabling high-fidelity simulation and rendering. Its applications span many industries, from autonomous vehicles using NVIDIA DRIVE to healthcare solutions with NVIDIA Clara, as well as AI-driven analytics and workflows.

10,000+ employees

Founded 1993

🤖 Artificial Intelligence

🎮 Gaming

📋 Description

• Architect and manage the continuous integration pipelines and release processes for our Generative AI framework and libraries related to Megatron-LM and NeMo Framework.
• Design and implement efficient, scalable DevOps solutions that allow our fast-growing team to release software more frequently while maintaining high quality and maximum performance.
• Work with industry-standard tools (Kubernetes, Docker, Slurm, Ansible, GitLab, GitHub Actions, Jenkins, Artifactory, Jira) in hybrid on-premises and cloud environments.
• Assist with cluster operations and system administration (managing servers, team accounts, and clusters).
• Accelerate research and development cycles by automating recurring tasks such as accuracy and performance regression detection.
• Develop new quality control measures, e.g. code analysis, backward compatibility, and regression testing, while employing and advancing best practices.
• Work closely with the DL framework and library teams (CUDA, cuDNN, cuBLAS, and PyTorch) and with other engineering teams within NVIDIA that provide software, testing, and release-related infrastructure.

🎯 Requirements

• BS or MS degree in Computer Science, Computer Architecture, or a related technical field (or equivalent experience) and 3+ years of industry experience in DevOps and infrastructure engineering.
• Strong system-level programming skills in languages like Python and shell scripting.
• Extensive understanding of build/release systems and CI/CD, and experience with solutions like GitLab, GitHub, Jenkins, etc.
• Experience with Linux system administration.
• Proficiency with containerization and cluster management technologies like Docker and Kubernetes.
• Experience with build tools, including Make and CMake.
• A strong background in source code management (SCM) solutions such as GitLab, GitHub, Perforce, etc.
• Strong problem-solving and debugging skills.
• Great teammate who can collaborate with and influence others in a dynamic environment.
• Excellent interpersonal and written communication skills.

🏖️ Benefits

• Equity
• Benefits
