Senior Systems Software Engineer, Kubernetes Scale




NVIDIA

10,000+ employees

Founded 1993

🤖 Artificial Intelligence

🎮 Gaming

Artificial Intelligence • Gaming • Automotive

NVIDIA is a leading technology company specializing in accelerated computing and artificial intelligence. It develops graphics processing units (GPUs) and technology for cloud computing, data centers, and virtual reality, with a focus on the gaming, automotive, healthcare, and robotics industries. Products such as NVIDIA Omniverse enable high-fidelity simulation and rendering. Its applications range from autonomous vehicles using NVIDIA DRIVE to healthcare solutions with NVIDIA Clara and AI-driven analytics and workflows.

📋 Description

• Drive end-to-end performance and scale characterization for the NVIDIA DGX Cloud software stack
• Collaborate with AI researchers to develop automated tests simulating real user workloads
• Investigate and resolve root causes of performance issues in distributed systems
• Design monitoring and analysis tools for performance testing across software, GPU, and CPU resources
• Triage, debug and root cause issues related to operating Kubernetes clusters at ultra-large scale
• Build and maintain CI/CD framework for continuous performance testing
• Document research methodologies and present findings at conferences
• Engage with upstream communities to validate performance of AI workloads

🎯 Requirements

• 8+ years of experience
• Expertise in Kubernetes and familiarity with related CNCF projects
• Background in Computer Architecture, Networking, Storage systems, Accelerators
• Bachelor's/Master's in Engineering (Electrical Engineering, Computer Engineering, or Computer Science)
• Experience optimizing performance and AI workloads on large-scale systems
• Proficiency in Golang/Python
• Expertise with at least one public CSP infrastructure (GCP, AWS, Azure, OCI)

🏖️ Benefits

• Health insurance
• Retirement plans
• Paid time off
• Flexible work arrangements
• Professional development

