Principal Developer, AI Networking

🔥 0 minutes ago

NVIDIA

10,000+ employees

Founded 1993

🤖 Artificial Intelligence

🎮 Gaming

Artificial Intelligence • Gaming • Automotive

NVIDIA is a leading technology company specializing in accelerated computing and artificial intelligence. NVIDIA pioneers advancements in graphics processing units (GPUs), cloud computing, data centers, and virtual reality, with a focus on the gaming, automotive, healthcare, and robotics industries. The company's innovations, such as NVIDIA Omniverse, transform traditional digital processes by enabling high-fidelity simulation and rendering. Its applications span many industries, from autonomous vehicles using NVIDIA DRIVE to healthcare solutions with NVIDIA Clara, as well as AI-driven analytics and workflows.

📋 Description

• Characterizing AI workloads and deep learning models aimed at large-scale LLM training and inference on NVIDIA supercomputers.
• The role centers on distributed systems with a focus on high-performance networking and NVIDIA communication libraries.
• Benchmarking, profiling, and analyzing the performance to find bottlenecks and identify areas for improvement and optimizations, with a strong emphasis on networking aspects.
• Developing PyTorch trace-based profiling, analysis, and replaying toolset to aid in benchmarking, debugging, and co-designing network systems for LLM workloads.
• Collaborating with multiple teams from hardware to software to provide performance analysis insights.
• Defining performance test plans, setting performance expectations for new technologies and solutions, and working to achieve performance targets.

🎯 Requirements

• B.Sc. in Computer Science or Software Engineering, or equivalent experience.
• 15+ years of experience with high-performance networking (RDMA, MPI, NCCL, SHARP).
• Demonstrated ability in performance evaluation techniques and approaches.
• Experience with NVIDIA GPUs and CUDA.
• Knowledge of deep learning frameworks such as TensorFlow or PyTorch.
• Expertise in collective communication libraries such as NCCL and networking protocols such as RoCE and RDMA.
• Fast, self-directed learner with strong analytical and problem-solving skills.
• Proficiency in Python, Bash, and C++.
• Experience with container-based development environments.
• Great teammate who communicates clearly and works well with others.

🏖️ Benefits

• Equity
• Benefits

Similar Jobs

🔥 6 hours ago

Fullsteam

1001 - 5000

💳 Fintech

☁️ SaaS

🤝 B2B

Director of Engineering managing high-quality SaaS delivery at Fullsteam, leveraging AI and engineering excellence for growth and modernization. Leading and developing engineering teams with a focus on predictable outcomes.

🔥 7 hours ago

Jane

1 - 10

🤝 B2B

🚗 Transport

Staff Developer overseeing onboarding experiences at a SaaS company, focusing on activation and retention strategies. Collaborating across domains and utilizing AI for improved workflows.

🔥 7 hours ago

Accenture Federal Services

10,000+ employees

🤖 Artificial Intelligence

🔒 Cybersecurity

🏛️ Government

SAP Fiori Developer leading the design and implementation of user-centric SAP Fiori applications. Collaborating with various teams to ensure business goals and user needs are met.

🔥 22 hours ago

Gainwell Technologies

10,000+ employees

⚕️ Healthcare Insurance

Advisor Batch Developer (UNIX/C/SQL) at Gainwell Technologies focusing on innovative healthcare solutions. Collaborate with teams to enhance technology-driven health services.

🕒 Yesterday

Shield AI

501 - 1000

🤖 Artificial Intelligence

🚀 Aerospace

Directing aerostructures strategy and development for autonomous aircraft at Shield AI. Leading cross-functional teams to innovate in aerospace engineering and robotics.