
Artificial Intelligence • Gaming • Automotive
NVIDIA is a leading technology company specializing in accelerated computing and artificial intelligence. It pioneers advancements in graphics processing units (GPUs), cloud computing, data centers, and virtual reality, with a focus on the gaming, automotive, healthcare, and robotics industries. Its innovations, such as NVIDIA Omniverse, transform traditional digital processes by enabling high-fidelity simulation and rendering. Applications span many industries, from autonomous vehicles using NVIDIA DRIVE to healthcare solutions with NVIDIA Clara and AI-driven analytics and workflows.
September 2
Ansible
AWS
Azure
Chef
Cloud
Google Cloud Platform
Grafana
Kubernetes
Linux
Microservices
Prometheus
Puppet
Python
Splunk
TCP/IP
Terraform
Go

• NVIDIA DGX Cloud delivering a fully managed AI platform on major cloud providers
• Build, implement and support operational and reliability aspects of large-scale Kubernetes clusters with focus on performance at scale, real-time monitoring, logging and alerting
• Define SLOs/SLIs, monitor error budgets, and streamline reporting
• Support services before launch through system creation consulting, developing software tools, platforms and frameworks, capacity management, and launch reviews
• Maintain services once live by measuring and monitoring availability, latency and overall system health
• Operate and optimize GPU workloads across AWS, GCP, Azure, OCI, and private clouds
• Scale systems sustainably through automation and evolve systems to improve reliability and velocity
• Lead triage and root-cause analysis of high-severity incidents, perform blameless postmortems
• Participate in on-call rotation to support production services
• BS in Computer Science or related technical field, or equivalent experience
• 10+ years of experience operating production services
• Expert-level knowledge of Kubernetes administration, containerization, and microservices architecture
• Experience with infrastructure automation tools (e.g., Terraform, Ansible, Chef, Puppet)
• Proficiency in at least one high-level programming language (e.g., Python, Go)
• In-depth knowledge of Linux operating systems, networking fundamentals (TCP/IP), and cloud security standards
• Proficient knowledge of SRE principles, encompassing SLOs, SLIs, error budgets, and incident handling
• Experience building and operating comprehensive observability stacks (OpenTelemetry, Prometheus, Grafana, ELK Stack, Lightstep, Splunk, etc.)
• Experience operating GPU workloads and GPU-accelerated clusters (KubeVirt experience is a plus)
August 28
DevOps Engineer at Saaf Finance builds AI-driven mortgage infrastructure. Designs and maintains AWS-based platforms and CI/CD pipelines.
Airflow
AWS
Cloud
ETL
JavaScript
Kubernetes
Node.js
Prometheus
Python
Terraform
August 27
DevOps Engineer supporting a company building scalable 3D AEC applications. Manage Azure infrastructure, CI/CD, containers, monitoring, and deployment automation.
Azure
Cloud
Docker
Grafana
Kubernetes
Linux
MongoDB
NGINX
Prometheus
Python
RabbitMQ
August 26
DevOps Engineer responsible for CI/CD automation, container orchestration, and cloud tasks.
Ansible
AWS
Docker
Groovy
Java
Jenkins
Kubernetes
Microservices
Node.js
PHP
Python
August 25
Senior Platform Engineer at Zimperium building cloud infrastructure, CI/CD and automation to support mobile security products.
🇮🇳 India – Remote
💰 $12M Venture Round on 2018-11
⏰ Full Time
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)
Android
Ansible
AWS
Azure
Cloud
DNS
Docker
ElasticSearch
Google Cloud Platform
iOS
Kubernetes
Linux
Microservices
Oracle
Postgres
Python
Redis
SQL
Terraform
August 20
Sr. Manager leads SRE/DevOps teams at Endpoint, an IRT solutions provider; oversees cloud infrastructure, deployment pipelines, and 24x7 operations.
🇮🇳 India – Remote
💵 ₹2M - ₹4M / year
💰 $1.7M Debt Financing on 2010-03
⏰ Full Time
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)
AWS
Azure
Cloud
Java
Linux
Perl
Python
SQL
VMware