Capacity Operations Manager

🕒 March 20

NVIDIA

10,000+ employees

Founded 1993

Artificial Intelligence • Gaming • Automotive

NVIDIA is a leading technology company specializing in accelerated computing and artificial intelligence. NVIDIA pioneers advancements in graphics processing units (GPUs), cloud computing, data centers, and virtual reality, with a focus on the gaming, automotive, healthcare, and robotics industries. The company's innovations, such as NVIDIA Omniverse, transform traditional digital processes by enabling high-fidelity simulations and rendering tasks. Its applications span various industries, from autonomous vehicles using NVIDIA DRIVE to healthcare solutions with NVIDIA Clara, as well as AI-driven analytics and workflows.

📋 Description

• Coordinate the development of High Performance Computing (HPC) clusters, collaborating closely with internal and external engineering teams.
• Direct and improve GPU capacity and additional compute resources across diverse cloud service platforms to satisfy rising needs and secure efficient deployment.
• Design, improve, and manage data models, reporting platforms, data automation solutions, dashboards, and performance measures that back NVIDIA Infrastructure governance programs and strategic capacity decisions.
• Assess the technical and business requirements for GPU capacity and other compute resources from different internal and external groups.
• Identify performance bottlenecks in day-to-day usage of compute resources and collaborate with relevant infrastructure teams to resolve them.
• Drive infrastructure resource efficiency initiatives in partnership with engineering, finance, and product teams.
• Develop and enhance tooling for our cloud infrastructure and analytics platform to optimize resource usage and performance for NVIDIA and its customers. This includes crafting and developing tools for automating workflows and potentially bringing to bear AI techniques to extract useful signals and insights from generated data.
• Partner and cross-collaborate with Finance, Product, Service Owners, and Infrastructure Engineering teams to align cloud capacity management with company goals and develop Infrastructure and Service Level benchmarks to match Customer satisfaction.

🎯 Requirements

• Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field, or equivalent experience.
• 8+ years of overall experience in cloud computing, specifically in managing or using GPU capacity for high performance computing.
• A proven record of large-scale computing operations and planning is a plus.
• Strong technical proficiency in cloud architecture, development and deployment, and managing large data sets.
• Experience with command line interfaces and shell scripting languages.
• Comprehensive knowledge of cloud service models (IaaS, PaaS, SaaS) and cloud infrastructure technologies.
• Practical experience with Cloud Service Providers including AWS, Azure, GCP, and OCI is essential.
• Demonstrated experience in bringing to bear AI tools and techniques to extract useful signals and insights from data, specifically to improve resource usage and automation.
• Deep knowledge and active use of statistical modeling and machine learning approaches for boosting operational efficiency and supporting strategic capacity decisions.
• Understanding of analytics, statistical modeling, and machine learning methodologies.
• Strong communication and relationship-building skills, with the ability to work well across different departments and contribute to strategic decisions.
• Self-starter, self-motivated, focused, and self-sufficient, with a willingness to take on new challenges and adapt quickly in a dynamic environment.
• Ability to operate effectively amidst uncertainty and rapidly changing business conditions, with an agile approach and a commitment to ongoing improvement.

🏖️ Benefits

• equity
• benefits

🇺🇸 United States – Remote

💵 $5 / hour

🔥 Funding within the last year

💰 $100M Series unknown on 2025-10

⏰ Full Time

🟡 Mid-level

🟠 Senior

⚙️ Operations