
10,000+ employees
Founded 1993
🤖 Artificial Intelligence
🎮 Gaming
Artificial Intelligence • Gaming • Automotive
NVIDIA is a leading technology company specializing in accelerated computing and artificial intelligence. NVIDIA pioneers advancements in graphics processing units (GPUs), cloud computing, data centers, and virtual reality, with a focus on the gaming, automotive, healthcare, and robotics industries. The company's innovations, such as NVIDIA Omniverse, transform traditional digital processes by enabling high-fidelity simulation and rendering. Its applications span many industries, from autonomous vehicles using NVIDIA DRIVE to healthcare solutions with NVIDIA Clara, as well as AI-driven analytics and workflows.
🕒 May 14
🏄 California – Remote
💵 $168k - $270.3k / year
⏰ Full Time
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)
🦅 H1B Visa Sponsor

• Design, implement, and support the operational and reliability aspects of a large-scale Observability & Telemetry collection platform, with a focus on performance at scale, real-time monitoring, logging, and alerting
• Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement
• Support services before they go live through activities such as system design consulting, developing software tools, platforms, and frameworks, capacity management, and launch reviews
• Maintain services once they are live by measuring and monitoring availability, latency, and overall system health
• Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity
• Practice sustainable incident response and blameless postmortems
• Be part of an on-call rotation to support production systems

• BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent experience
• 8+ years of experience with infrastructure automation and distributed systems design, including designing and developing tools for running large-scale private or public cloud systems in production
• 5+ years of experience delivering foundational infrastructure and observability platforms
• Experience in one or more of the following: Python, Go, Perl, or Ruby
• In-depth knowledge of Linux, networking, and containers

• Equity
• Benefits
🕒 May 14
Senior DevOps Engineer joining NetBox Labs Cloud Delivery team to enhance AWS infrastructure. Leading projects and mentorship within a fast-paced DevOps environment.
🇺🇸 United States – Remote
💵 $165k - $185k / year
⏰ Full Time
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)
AWS
Cloud
Grafana
Kubernetes
Prometheus
Python
Shell Scripting
Terraform
Go
🕒 May 14
Lead Engineer overseeing Launch Potato's cloud infrastructure and SRE function. Evolving CI/CD platform, compliance posture, and leading AWS multi-account migration.
AWS
Cloud
Microservices
Terraform
🕒 May 14
Lead DevOps/SRE Engineer evolving cloud infrastructure at Launch Potato. Building an SRE function to enable faster shipping of products while maintaining reliability and cost control.
AWS
Cloud
Grafana
Microservices
Terraform
🕒 May 14
Lead SRE/DevOps Engineer at Launch Potato evolving cloud infrastructure and CI/CD platform. Owning SRE function development for faster product team performance without compromising reliability or security.
AWS
Cloud
Grafana
Microservices
Terraform
🕒 May 14
Senior DevOps/Observability Engineer leading unified observability platform design for Fortune 500 clients. Focused on architecting observability pipeline using AWS and modern open-source tools.
🇺🇸 United States – Remote
💰 Series A on 2019-12
⏰ Full Time
🟠 Senior
🔴 Lead
⛑ DevOps & Site Reliability Engineer (SRE)
🦅 H1B Visa Sponsor
AWS
Grafana
Kubernetes
Prometheus
Splunk
Terraform