Principal Software Engineer – Large-Scale LLM Memory and Storage Systems

10,000+ employees

Founded 1993

🤖 Artificial Intelligence

🎮 Gaming

Artificial Intelligence • Gaming • Automotive

NVIDIA is a leading technology company specializing in accelerated computing and artificial intelligence. NVIDIA pioneers advancements in graphics processing units (GPUs), cloud computing, data centers, and virtual reality, with a focus on the gaming, automotive, healthcare, and robotics industries. The company's innovations, such as NVIDIA Omniverse, transform traditional digital processes by enabling high-fidelity simulation and rendering. Its applications span autonomous vehicles (NVIDIA DRIVE), healthcare (NVIDIA Clara), and AI-driven analytics and workflows.

🕒 December 22, 2025

🏄 California, Massachusetts, +1 more states – Remote

💵 $272k - $425.5k / year

⏰ Full Time

🔴 Lead

🧑‍💻 Full-stack Engineer

🦅 H1B Visa Sponsor

Cloud

Distributed Systems

Open Source

Python

📋 Description

• Design and evolve a unified memory layer that spans GPU memory, pinned host memory, RDMA-accessible memory, SSD tiers, and remote file/object/cloud storage to support large-scale LLM inference
• Architect and implement deep integrations with leading LLM serving engines (such as vLLM, SGLang, TensorRT-LLM), with a focus on KV-cache offload, reuse, and remote sharing across heterogeneous and disaggregated clusters
• Co-design interfaces and protocols that enable disaggregated prefill, peer-to-peer KV-cache sharing, and multi-tier KV-cache storage (GPU, CPU, local disk, and remote memory) for high-throughput, low-latency inference
• Partner closely with GPU architecture, networking, and platform teams to exploit GPUDirect, RDMA, NVLink, and similar technologies for low-latency KV-cache access and sharing across heterogeneous accelerators and memory pools
• Mentor senior and junior engineers, set technical direction for memory and storage subsystems, and represent the team in internal reviews and external forums (open source, conferences, and customer-facing technical deep dives)

🎯 Requirements

• Master's or PhD, or equivalent experience
• 15+ years of experience building large-scale distributed systems, high-performance storage, or ML systems infrastructure in C/C++ and Python, with a track record of delivering production services
• Deep understanding of memory hierarchies (GPU HBM, host DRAM, SSD, and remote/object storage) and experience designing systems that span multiple tiers for performance and cost efficiency
• Experience with distributed caching or key-value systems, especially designs optimized for low latency and high concurrency
• Hands-on experience with networked I/O and RDMA/NVMe-oF/NVLink-style technologies, and familiarity with concepts like disaggregated and aggregated deployments for AI clusters
• Strong skills in profiling and optimizing systems across CPU, GPU, memory, and network, using metrics to drive architectural decisions and validate improvements in time to first token (TTFT) and throughput
• Excellent communication skills and prior experience leading cross-functional efforts with research, product, and customer teams

🏖️ Benefits

• Equity
• Benefits
