Senior Systems Software Engineer – GPU Performance

🕒 April 22

NVIDIA

10,000+ employees

Founded 1993

🤖 Artificial Intelligence

🎮 Gaming

Artificial Intelligence • Gaming • Automotive

NVIDIA is a leading technology company specializing in accelerated computing and artificial intelligence. NVIDIA pioneers advancements in graphics processing units (GPUs), cloud computing, data centers, and virtual reality, with a focus on the gaming, automotive, healthcare, and robotics industries. The company's innovations, such as NVIDIA Omniverse, transform traditional digital processes by enabling high-fidelity simulation and rendering. Its applications span various industries, from autonomous vehicles using NVIDIA DRIVE to healthcare solutions with NVIDIA Clara, as well as AI-driven analytics and workflows.

📋 Description

• Lead the implementation of performance practices in large-scale GPU infrastructure, delivering powerful tools, methodologies, and flows to validate and improve multiple datacenter products concurrently.
• Align next-generation AI workloads with next-generation datacenter builds for NVIDIA GPUs, CPUs, and networking hardware.
• Engage early with HW/FW/SW/platform internal and customer teams.
• Develop engineering solutions that provide continuous insights into the performance of AI workloads in evolving environments, generating swift insights into improvements and regressions.
• Decompose high-complexity performance or stability issues into minimal reproduction cases, working towards identifying the root cause.
• Participate in collaborations with various SW and FW teams (BMC/SBIOS/OS/drivers, etc.) to develop outstanding methods and tools.
• Analyze, debug, and resolve critical firmware and software issues to achieve the highest AI workload performance at scale.

🎯 Requirements

• Proven understanding of accelerated computing software stacks (CUDA).
• Experience with modern cloud and container-based enterprise computing architectures, with Slurm preferred.
• Strong programming and scripting experience in C/C++/Python/Bash.
• Deep expertise in systems architecture and the impact of various components on performance.
• Experience with container technology and Linux-based OSes, with Docker preferred.
• Experience supporting high-performance computing or deep learning in engineering or academic research communities.
• Strong teamwork and communication skills, coupled with results-focused analytical abilities.
• BS in Engineering, Mathematics, Physics, or Computer Science (or equivalent experience); MS or PhD desirable with 8+ years of applicable experience.

🏖️ Benefits

• Equity
• Benefits
