Senior Systems Engineer – High-Performance AI, Networking Applications

November 10

NVIDIA

Artificial Intelligence • Gaming • Automotive

NVIDIA is a leading technology company specializing in accelerated computing and artificial intelligence. NVIDIA pioneers advancements in graphics processing units (GPUs), cloud computing, data centers, and virtual reality, with a focus on the gaming, automotive, healthcare, and robotics industries. The company's innovations, such as NVIDIA Omniverse, transform traditional digital processes by enabling high-fidelity simulation and rendering. Its applications span many industries, from autonomous vehicles using NVIDIA DRIVE to healthcare solutions with NVIDIA Clara, as well as AI-driven analytics and workflows.

10,000+ employees

Founded 1993

🤖 Artificial Intelligence

🎮 Gaming

📋 Description

• Collaborate with networking teams to plan, implement, and evaluate performance benchmarks on NVLink-, NVSwitch-, and InfiniBand-powered infrastructures.
• Assess findings and work closely with framework, hardware, and support teams to improve system performance across various deep learning workloads.
• Act as a primary resource for fixing networking and hardware integration issues, focusing on scalable multi-node systems.
• Maintain high communication standards across multiple engineering, support, and R&D teams, ensuring technical and performance goals are met.
• Offer technical mentorship and documentation for internal teams and external partners on standard methodologies in HPC networking deployments.
• Share insights on improving networking strategies for large-scale AI and deep learning infrastructure.

🎯 Requirements

• BS, MS, or PhD in Computer Science, Engineering, or a related field, or equivalent experience.
• 8+ years of proven experience in AI/HPC infrastructure.
• Familiarity with AI/HPC job schedulers and orchestrators such as Slurm, Kubernetes (K8s), or LSF.
• Practical exposure to AI/HPC workflows that use MPI and NCCL.
• Familiarity with high-speed networking for HPC, including InfiniBand, RDMA, RoCE, and Amazon EFA.
• Understanding of PyTorch, Megatron-LM, and deep learning inference frameworks such as vLLM and SGLang.
• Proven experience with InfiniBand, NVLink, and high-speed networking technologies in HPC or large-scale datacenter environments.
• Experience investigating and evaluating performance in multi-node systems, especially for deep learning or scientific computing tasks.
• Strong analytical, debugging, and technical communication skills.
• Comfortable working in collaborative, multi-faceted teams.

🏖️ Benefits

• Equity
• Benefits
