Senior Software Developer, HPC Cluster Management

September 30

NVIDIA

Artificial Intelligence • Gaming • Automotive

NVIDIA is a technology company specializing in accelerated computing and artificial intelligence. It pioneers advancements in graphics processing units (GPUs), cloud computing, data centers, and virtual reality, with a focus on the gaming, automotive, healthcare, and robotics industries. The company's innovations, such as NVIDIA Omniverse, transform traditional digital processes by enabling high-fidelity simulation and rendering. Its applications span many industries, from autonomous vehicles using NVIDIA DRIVE to healthcare solutions with NVIDIA Clara, as well as AI-driven analytics and workflows.

10,000+ employees

Founded 1993

📋 Description

• Develop the head node and compute node installation and provisioning processes
• Work on functionality for edge site deployment
• Integrate the product with the latest hardware (GPUs, DPUs, accelerators, and high-speed interconnects such as InfiniBand)
• Develop new features in firmware management and network configuration for existing and next-generation NVIDIA platforms
• Develop functionality that makes Bright clusters usable for a wider range of workloads and increases scalability to very large numbers of nodes
• Add support for new Linux distributions
• Improve support for alternative CPU architectures such as ARM
• Add features to the Ansible collections for cluster installation and management
• Assist the support team with customer support requests and help customers use the product more efficiently
• Work within NVIDIA's Base Command Manager (BCM) environment, which powers thousands of Linux clusters on-premises, in the cloud, or in hybrid deployments

🎯 Requirements

• Degree in Computer Science or a related field (or equivalent experience)
• 7+ years of experience in software development and/or related roles
• Very familiar with the Linux operating system
• Strong knowledge of networking concepts in Linux
• Practical knowledge of common software installed in typical Linux installations
• Proficient in Python
• Intimately familiar with object-oriented software design, design patterns, and concurrent programming techniques
• Emphasis on high-quality work and clean code
• Eager to learn and use new technologies
• (Nice to have) Experience with Ansible
• (Nice to have) Experience with high-performance computing and system administration
• (Nice to have) Knowledge of Kubernetes, AWS, Azure, GCE, OpenStack, Jenkins, and distributed programming
• (Nice to have) Proficiency in C++
