Senior Systems Engineer, Storage – DGX Cloud

NVIDIA

10,000+ employees

Founded 1993

Artificial Intelligence • Gaming • Automotive

NVIDIA is a technology company specializing in accelerated computing and artificial intelligence. It develops graphics processing units (GPUs) and technology for cloud computing, data centers, and virtual reality, with a focus on the gaming, automotive, healthcare, and robotics industries. Products such as NVIDIA Omniverse support high-fidelity simulation and rendering. Its applications span several industries, from autonomous vehicles using NVIDIA DRIVE to healthcare solutions with NVIDIA Clara, as well as AI-driven analytics and workflows.

📋 Description

• Design, deploy, and operate solutions on Kubernetes for large-scale storage and data platforms, including the manifests, Helm charts, and operators that run them.
• Build tools, services, and automation that improve the lifecycle of storage and data systems – from provisioning and configuration through deployment, scaling, and day-2 operations.
• Develop and operate telemetry and observability for production systems – metrics, logging, tracing, dashboards, and alerting – so that system health, availability, and latency are measurable and actionable.
• Apply strong analytical troubleshooting skills to diagnose and resolve complex issues across distributed, containerized infrastructure.
• Work closely with peers and partner teams to improve the lifecycle of services, from inception and design through deployment, operation, and refinement.
• Scale systems sustainably through automation, infrastructure-as-code, and CI/CD, and evolve systems by pushing for changes that improve reliability and velocity.
• Support services before they go live through activities such as deployment automation, capacity planning, and launch and readiness reviews.
• Practice sustainable incident response and postmortems, and participate in an on-call rotation to support production systems.

🎯 Requirements

• BS degree (or equivalent experience) in Computer Science or related technical field involving coding.
• 12+ years of practical experience.
• Hands-on experience with Kubernetes – deploying, configuring, and operating workloads and solutions on Kubernetes in production.
• Experience building tools and services for storage, data, or platform infrastructure, with solid software design fundamentals (algorithms, data structures, complexity analysis) on large-scale Linux-based systems.
• Experience building and operating telemetry and observability using tools such as Prometheus, InfluxDB, Grafana, and the Elastic stack.
• Strong analytical troubleshooting skills with a systematic, root-cause-driven approach to identifying and resolving complex problems.
• Proficiency in one or more of the following: Python, Go, or Java.
• Good knowledge of infrastructure configuration management and infrastructure-as-code tools such as Ansible, Chef, Puppet, ArgoCD, Git Pipelines, and Terraform.

🏖️ Benefits

• Equity
• Health insurance
• Retirement plans
• Paid time off
• Professional development opportunities
