Senior Systems Engineer, Storage – DGX Cloud

🔥 8 minutes ago

NVIDIA

10,000+ employees

Founded 1993

🤖 Artificial Intelligence

🎮 Gaming

Artificial Intelligence • Gaming • Automotive

NVIDIA is a technology company specializing in accelerated computing and artificial intelligence. It develops graphics processing units (GPUs) and technology for cloud computing, data centers, and virtual reality, with a focus on the gaming, automotive, healthcare, and robotics industries. Products such as NVIDIA Omniverse support high-fidelity simulation and rendering. Its applications span multiple industries, from autonomous vehicles using NVIDIA DRIVE to healthcare solutions built on NVIDIA Clara, as well as AI-driven analytics and workflows.

📋 Description

• Design, deploy, and operate solutions on Kubernetes for large-scale storage and data platforms, including the manifests, Helm charts, and operators that run them.
• Build tools, services, and automation that improve the lifecycle of storage and data systems – from provisioning and configuration through deployment, scaling, and day-2 operations.
• Develop and operate telemetry and observability for production systems – metrics, logging, tracing, dashboards, and alerting – so that system health, availability, and latency are measurable and actionable.
• Apply strong analytical troubleshooting skills to diagnose and resolve complex issues across distributed, containerized infrastructure.
• Work closely with peers and partner teams to improve the lifecycle of services, from inception and design through deployment, operation, and refinement.
• Scale systems sustainably through automation, infrastructure-as-code, and CI/CD, and evolve systems by pushing for changes that improve reliability and velocity.
• Support services before they go live through activities such as deployment automation, capacity planning, and launch and readiness reviews.
• Practice sustainable incident response and postmortems, and participate in an on-call rotation to support production systems.

🎯 Requirements

• BS degree (or equivalent experience) in Computer Science or related technical field involving coding.
• 12+ years of practical experience.
• Hands-on experience with Kubernetes – deploying, configuring, and operating workloads and solutions on Kubernetes in production.
• Experience building tools and services for storage, data, or platform infrastructure, with solid software design fundamentals (algorithms, data structures, complexity analysis) on large-scale Linux-based systems.
• Experience building and operating telemetry and observability using tools such as Prometheus, InfluxDB, Grafana, and the Elastic stack.
• Strong analytical troubleshooting skills with a systematic, root-cause-driven approach to identifying and resolving complex problems.
• Proficiency in one or more of the following: Python, Go, or Java.
• Good knowledge of infrastructure configuration management and infrastructure-as-code tools such as Ansible, Chef, Puppet, ArgoCD, Git Pipelines, and Terraform.

🏖️ Benefits

• Equity
• Health insurance
• Retirement plans
• Paid time off
• Professional development opportunities

🇺🇸 United States – Remote

💵 $143k - $214.5k / year

💰 $200k Grant on 2022-09

⏰ Full Time

🟠 Senior

🔴 Lead

⚙️ Systems Engineer
