Cloud Performance Engineer

October 22

NVIDIA

Artificial Intelligence • Gaming • Automotive

NVIDIA is a technology company specializing in accelerated computing and artificial intelligence. It pioneers advancements in graphics processing units (GPUs), cloud computing, data centers, and virtual reality, with a focus on the gaming, automotive, healthcare, and robotics industries. The company's innovations, such as NVIDIA Omniverse, transform traditional digital processes by enabling high-fidelity simulation and rendering. Its applications span many industries, from autonomous vehicles using NVIDIA DRIVE to healthcare solutions built on NVIDIA Clara, as well as AI-driven analytics and workflows.

10,000+ employees

Founded 1993

🤖 Artificial Intelligence

🎮 Gaming

📋 Description

• Develop benchmarks and end-to-end customer applications that run at scale and are instrumented for performance measurement, tracking, and sampling, in order to measure and optimize the performance of important applications and services.
• Construct carefully designed experiments to analyze, study, and develop critical insights into performance bottlenecks and dependencies from an end-to-end perspective.
• Develop ideas for improving end-to-end system performance and usability by driving changes in hardware, software, or both.
• Collaborate with AI researchers, developers, and application service providers to understand internal developer and external customer pain points and requirements, project future needs, and share best practices.
• Develop the modeling framework and TCO (total cost of ownership) analysis needed to enable efficient exploration and sweeps of the architecture and design space.
• Develop the methodology needed to drive the engineering analysis that informs the architecture, design, and roadmap of DGX Cloud.

🎯 Requirements

• Expertise in working with large-scale parallel and distributed accelerator-based systems
• Expertise in optimizing performance and AI workloads on large-scale systems
• Experience with performance modeling and benchmarking at scale
• Strong background in computer architecture, networking, storage systems, and accelerators
• Familiarity with popular AI frameworks (PyTorch, TensorFlow, JAX, Megatron-LM, TensorRT-LLM, vLLM), among others
• Experience with AI/ML models and workloads, in particular LLMs
• Understanding of DNNs and their use in emerging AI/ML applications and services
• Bachelor's or Master's degree in Engineering (preferably Electrical Engineering, Computer Engineering, or Computer Science) or equivalent experience
• 10 years of experience in the above areas
• Proficiency in Python and C/C++
• Expertise with at least one public CSP infrastructure (GCP, AWS, Azure, OCI, …)

🏖️ Benefits

• Equity
• Benefits
