Senior Storage Production Engineer – DGX Cloud

🔥 9 minutes ago

🏄 California – Remote

💵 $176k - $333.5k / year

⏰ Full Time

🟠 Senior

🏭 Production Engineer

🦅 H1B Visa Sponsor

NVIDIA

10,000+ employees

Founded 1993

Artificial Intelligence • Gaming • Automotive

NVIDIA is a leading technology company specializing in accelerated computing and artificial intelligence. NVIDIA pioneers advancements in graphics processing units (GPUs), cloud computing, data centers, and virtual reality, with a focus on the gaming, automotive, healthcare, and robotics industries. The company's innovations, such as NVIDIA Omniverse, transform traditional digital processes by enabling high-fidelity simulations and rendering tasks. Its applications span various industries, from autonomous vehicles using NVIDIA DRIVE to healthcare solutions with NVIDIA Clara, as well as AI-driven analytics and workflows.

📋 Description

• Design, implement, and support large-scale storage clusters, ensuring scalability, high availability, and data integrity.
• Develop and maintain storage monitoring, logging, and alerting systems to ensure proactive detection and resolution of performance issues.
• Work with AI/ML workloads to improve storage architectures for low-latency access, efficient caching, and high-throughput performance.
• Improve the lifecycle of storage services – from inception and design to deployment, operation, and continuous optimization.
• Support storage services before they become available through activities such as system build consulting, developing automation frameworks, capacity management, and launch reviews.
• Maintain production storage infrastructure by supervising availability, latency, and system health, leveraging predictive analytics and AI-driven automation.
• Optimize storage efficiency through compression, deduplication, tiering strategies, and intelligent workload placement.
• Scale storage systems sustainably using AI/ML-driven automation, policy-based tiering, and dynamic data migration techniques.
• Ensure data security and compliance by implementing encryption, access controls, and auditing mechanisms for storage systems.
• Practice sustainable incident response and blameless root cause analysis.
• Be part of an on-call rotation to support storage and production systems.

🎯 Requirements

• BS degree or equivalent experience in Computer Science, Storage Systems, or a related technical field, with 8+ years of practical experience.
• Experience with distributed and high-performance storage solutions, including clustered and parallel file systems, distributed object storage, and enterprise-grade storage systems.
• Solid understanding of block, file, and object storage technologies, including their scalability, reliability, and performance characteristics, as well as standard processes.
• Experience with storage networking protocols such as NFS, SMB, iSCSI, S3, Fibre Channel, RDMA, and NVMe over Fabrics.
• Expertise in algorithms, data structures, complexity analysis, software design, and automating maintenance of large-scale Linux-based storage systems.
• Experience in one or more of the following for storage automation, monitoring, and performance tuning: C/C++, Java, Python, Go, Node.js, or Bash.
• Hands-on experience with infrastructure configuration management tools like Ansible, Chef, Puppet, and Terraform for automating storage deployments.
• Experience with observability and tracing tools like InfluxDB, Prometheus, Grafana, and the Elastic stack for monitoring storage system health.

🏖️ Benefits

• Equity
• Benefits
