Senior Systems Software Engineer – GPU Performance

🕒 April 22

NVIDIA

10,000+ employees

Founded 1993

Artificial Intelligence • Gaming • Automotive

NVIDIA is a technology company specializing in accelerated computing and artificial intelligence. It develops graphics processing units (GPUs) and related technology for cloud computing, data centers, and virtual reality, with a focus on the gaming, automotive, healthcare, and robotics industries. Products such as NVIDIA Omniverse support high-fidelity simulation and rendering, and the company's platforms span several industries, from autonomous vehicles using NVIDIA DRIVE to healthcare solutions with NVIDIA Clara, as well as AI-driven analytics and workflows.

📋 Description

• Lead the implementation of performance practices in large-scale GPU infrastructure, delivering powerful tools, methodologies, and flows to validate and improve multiple datacenter products concurrently.
• Align next-generation AI workloads with next-generation datacenter builds for NVIDIA GPUs, CPUs, and networking hardware.
• Engage early with HW/FW/SW/platform internal and customer teams.
• Develop engineering solutions that provide continuous insights into the performance of AI workloads in evolving environments, generating swift insights into improvements and regressions.
• Decompose high-complexity performance or stability issues into minimal reproduction cases, working towards identifying the root cause.
• Participate in collaborations with various SW and FW teams (BMC/SBIOS/OS/drivers, etc.) to develop outstanding methods and tools.
• Analyze, debug, and resolve critical firmware and software issues to achieve the highest AI workload performance at scale.

🎯 Requirements

• Proven understanding of accelerated computing software stacks (CUDA).
• Experience with modern cloud and container-based enterprise computing architectures, with Slurm preferred.
• Strong programming and scripting experience in C/C++/Python/Bash.
• Deep expertise in systems architecture and the impact of various components on performance.
• Experience with container technology and Linux-based OSes, with Docker preferred.
• Experience supporting high-performance computing or deep learning in engineering or academic research communities.
• Strong teamwork and communication skills, coupled with results-focused analytical abilities.
• BS in Engineering, Mathematics, Physics, or Computer Science (or equivalent experience); MS or PhD desirable with 8+ years of applicable experience.

🏖️ Benefits

• Equity
• Benefits
