Senior Systems Engineer – High-Performance AI, Networking Applications

November 10

NVIDIA

Artificial Intelligence • Gaming • Automotive

NVIDIA is a leading technology company specializing in accelerated computing and artificial intelligence. NVIDIA pioneers advancements in graphics processing units (GPUs), cloud computing, data centers, and virtual reality, with a focus on the gaming, automotive, healthcare, and robotics industries. The company's innovations, such as NVIDIA Omniverse, transform traditional digital processes by enabling high-fidelity simulation and rendering. Its applications span many industries, from autonomous vehicles using NVIDIA DRIVE to healthcare solutions with NVIDIA Clara and AI-driven analytics and workflows.

10,000+ employees

Founded 1993

📋 Description

• Collaborate with networking teams to plan, implement, and evaluate performance benchmarks on NVLink-, NVSwitch-, and InfiniBand-powered infrastructure.
• Assess findings and work closely with framework, hardware, and support teams to improve system performance across various deep learning workloads.
• Act as a primary resource for fixing networking and hardware integration issues, focusing on scalable multi-node systems.
• Maintain high communication standards across multiple engineering, support, and R&D teams, ensuring technical and performance goals are met.
• Offer technical mentorship and documentation for internal teams and external partners on standard methodologies in HPC networking deployments.
• Share insights on improving networking strategies for large-scale AI and deep learning infrastructure.

🎯 Requirements

• BS, MS, or PhD in Computer Science, Engineering, or a related field, or equivalent experience.
• 8+ years of proven experience in AI/HPC infrastructure.
• Familiarity with AI/HPC job schedulers and orchestrators such as Slurm, Kubernetes (K8s), or LSF.
• Practical exposure to AI/HPC workflows that use MPI and NCCL.
• Familiarity with high-speed networking for HPC, including InfiniBand, RDMA, RoCE, and Amazon EFA.
• An understanding of PyTorch, Megatron-LM, and deep learning inference frameworks such as vLLM and SGLang is essential.
• Proven experience with InfiniBand, NVLink, and high-speed networking technologies in HPC or large-scale datacenter environments.
• Experience investigating and evaluating performance in multi-node systems, especially for deep learning or scientific computing tasks.
• Strong analytical, debugging, and technical communication skills.
• Comfortable working in collaborative, multi-faceted teams.

🏖️ Benefits

• Equity
• Benefits
