Solutions Architect, DevOps

NVIDIA

10,000+ employees

Founded 1993

Artificial Intelligence • Gaming • Automotive

NVIDIA is a leading technology company specializing in accelerated computing and artificial intelligence. NVIDIA pioneers advances in graphics processing units (GPUs), cloud computing, data centers, and virtual reality, with a focus on the gaming, automotive, healthcare, and robotics industries. The company's innovations, such as NVIDIA Omniverse, transform traditional digital processes by enabling high-fidelity simulation and rendering. Its applications span many industries, from autonomous vehicles built on NVIDIA DRIVE to healthcare solutions with NVIDIA Clara, as well as AI-driven analytics and workflows.

📋 Description

• Advise on and help maintain large-scale computational and AI infrastructure, including monitoring, logging, and workload orchestration (Kubernetes and Linux job schedulers).
• Provide consultative guidance and perform hands-on troubleshooting across the full stack: from bare metal and the operating system, through the software stack and container platform, to networking and storage.
• Assess customer environments and recommend optimized, production-ready Kubernetes-based container platforms integrated with enterprise-grade networking and storage solutions.
• Serve as a key technical resource: develop, refine, and document standard methodologies and operational guidelines to share with internal teams and customer stakeholders.
• Support development activities and engage in proofs of concept and proofs of value (POCs/POVs) to validate new features, architectures, and upgrade approaches.
• Create and deliver high-quality documentation, including runbooks, onboarding materials, and best-practice guides for customers and internal teams.
• Act as the technical lead for assigned customer accounts, providing strategic guidance on DevOps and platform architecture and influencing long-term infrastructure and operations decisions.

🎯 Requirements

• BS/MS/PhD in Computer Science, Electrical/Computer Engineering, Physics, Mathematics, or a related field.
• 5+ years of professional experience in roles focused on managing scalable cloud environments and automation engineering.
• Proven understanding of networking fundamentals and data center architectures, plus hands-on experience managing HPC/AI clusters, including deployment, optimization, and troubleshooting.
• Demonstrated hands-on experience deploying, configuring, and optimizing NVIDIA GPU-accelerated infrastructure, including driver management, CUDA Toolkit integration, and GPU workload profiling.
• Extensive experience with Kubernetes for container orchestration, resource scheduling, scaling, and integration with GPU-accelerated and HPC environments.
• Strong familiarity with HPC and AI technologies (CPUs, GPUs, high-speed interconnects) and their supporting software stacks.
• Deep knowledge of Linux (Red Hat, Ubuntu), OS-level security, and protocols.
• Proficiency in Python and Bash scripting, configuration management, and Infrastructure-as-Code tools (e.g., Ansible, Terraform).
• Experience with observability stacks (Grafana, Loki, Prometheus) for monitoring and logging, and with building fault-tolerant systems.
• Strong background in designing scalable solutions and providing consultative support to customers, including leading architectural reviews and presenting to executive stakeholders.
