
10,000+ employees
Founded 1993
🤖 Artificial Intelligence
🎮 Gaming
Artificial Intelligence • Gaming • Automotive
NVIDIA is a leading technology company specializing in accelerated computing and artificial intelligence. NVIDIA pioneers advancements in graphics processing units (GPUs), cloud computing, data centers, and virtual reality, with a focus on the gaming, automotive, healthcare, and robotics industries. The company's innovations, such as NVIDIA Omniverse, transform traditional digital processes by enabling high-fidelity simulation and rendering. Its applications span many industries, from autonomous vehicles using NVIDIA DRIVE to healthcare solutions with NVIDIA Clara, as well as AI-driven analytics and workflows.
🕒 3 days ago
🏄 California, Colorado, +2 more states – Remote
💵 $152k - $241.5k / year
⏰ Full Time
🟠 Senior
🏗️ Platform Engineer
🦅 H1B Visa Sponsor

• Design, build, and maintain our core ML platform infrastructure as code, primarily using Ansible and Terraform, ensuring reproducibility and scalability across large-scale, distributed GPU clusters.
• Apply SRE principles to diagnose, troubleshoot, and resolve complex system issues across the entire stack, ensuring high availability and performance for critical AI workloads.
• Develop robust internal automation and tooling for ML workflow orchestration, resource scheduling, and platform operations, with a strong focus on software engineering best practices.
• Collaborate with ML researchers and applied scientists to understand infrastructure needs and build solutions that streamline their end-to-end experimentation.
• Evolve and operate our multi-cloud and hybrid (on-prem + cloud) environments, implementing monitoring, alerting, and incident response protocols.
• Participate in on-call rotation to provide support for platform services and infrastructure running critical ML jobs, driving root cause analysis and implementing preventative measures.
• Write high-quality, maintainable code (Python, Go) to contribute to the core orchestration platform and automate manual processes.
• Drive the adoption of modern GPU technologies and ensure smooth integration of next-generation hardware into ML pipelines (e.g., GB200, NVLink, etc.).
• BS/MS in Computer Science, Engineering, or equivalent experience.
• 5+ years in software/platform engineering or SRE roles, including 3+ years focused on ML infrastructure or distributed compute systems.
• Strong proficiency in Infrastructure-as-Code (IaC) tools, specifically Ansible and Terraform, with a proven track record of building and managing production infrastructure.
• SRE-oriented mindset with extensive experience in diagnosing system-level issues, performance tuning, and ensuring platform reliability.
• Solid understanding of ML workflows and lifecycle, from data preprocessing to deployment.
• Proficiency in operating containerized workloads with Kubernetes and Docker.
• Strong software engineering skills in languages such as Python or Go, with a focus on automation, tooling, and writing production-grade code.
• Experience with Linux systems internals, networking, and performance tuning at scale.
• Equity
• Benefits
🕒 3 days ago
1001 - 5000
Lead Data Platform Engineer handling the technical architecture for the Enterprise Data Analytics Platform team. Driving large-scale engineering initiatives across the organization while mentoring engineers.
Amazon Redshift
Apache
BigQuery
Cloud
Distributed Systems
Java
Kafka
Python
Scala
Spark
SQL
🕒 3 days ago
Senior Platform Engineer focused on architecting and maintaining Bridgeway's cloud infrastructure. Driving DevOps practices and delivering efficient platform solutions across teams.
AWS
Azure
Cloud
Docker
Firewalls
Python
SDLC
Terraform
🕒 3 days ago
Senior Platform Engineer at neoBIM transforming the construction industry with AI-powered BIM solutions. Focused on infrastructure, system reliability, and CI/CD workflows in a collaborative environment.
AWS
Azure
Cloud
DynamoDB
Google Cloud Platform
Grafana
Linux
MongoDB
MySQL
Postgres
Prometheus
Shell Scripting
Terraform
🕒 4 days ago
Senior Systems & Platform Engineer at MANSCAPED shaping Azure-based platform architecture and enterprise application integrations. Collaborating on cloud strategy and driving critical engineering initiatives.
Azure
Cloud
Python
Terraform
TypeScript
🕒 4 days ago
Platform Engineer building and maintaining infrastructure for engineering teams at Strivacity. Focusing on Kubernetes, automation, and operational excellence in a remote role.
AWS
Flux
Grafana
Kubernetes
Prometheus
Python
Terraform