Senior HPC Solutions Architect

🕒 April 10

NVIDIA

10,000+ employees

Founded 1993

Artificial Intelligence • Gaming • Automotive

NVIDIA is a leading technology company specializing in accelerated computing and artificial intelligence. NVIDIA pioneers advancements in graphics processing units (GPUs), cloud computing, data centers, and virtual reality, with a focus on the gaming, automotive, healthcare, and robotics industries. The company's innovations, such as NVIDIA Omniverse, transform traditional digital processes by enabling high-fidelity simulation and rendering. Its applications span many industries, from autonomous vehicles using NVIDIA DRIVE to healthcare solutions with NVIDIA Clara, as well as AI-driven analytics and workflows.

📋 Description

• Assist with deploying, debugging, and improving the efficiency of AI workloads on large-scale NVIDIA platforms.
• Identify hardware issues, track them through bug reports, and keep customers updated on progress.
• Benchmark new framework features, analyze performance, and share actionable insights with both customers and internal teams.
• Work directly with external customers and partners to solve cluster performance and stability issues, identify bottlenecks, and implement effective solutions.
• Build expertise and guide customers in scaling workloads efficiently and reliably on the latest generation of NVIDIA GPUs.
• Collaborate with AI factory deployment teams to ensure that RAs/Blueprints are accurately followed and implemented.

🎯 Requirements

• BS/MS/PhD in Electrical/Computer Engineering, Computer Science, Physics, or another engineering field, or equivalent experience.
• 10+ years of experience designing, managing, and supporting large-scale hybrid networks.
• Experience with scripting is helpful.
• Strong programming skills in at least one of the following languages: C, C++, or Python.
• Practical experience identifying and resolving bottlenecks in large-scale training workloads or parallel applications.
• Proven understanding of CPU and GPU architectures, CUDA, parallel filesystems, and high-speed interconnects.
• Experience working with large compute clusters, with an understanding of their internal scheduling and resource management mechanisms (e.g. SLURM or cloud-based clusters).
• System-level understanding of server/rack-level architecture, BMC, PCIe devices, network adapters, Linux OS, and kernel drivers.
• Excellent communication and liaison skills for working with customers, partners, and internal functions.

🏖️ Benefits

• Equity and benefits
