Senior Solutions Architect – InfiniBand, Networking, Ethernet

🔥 0 minutes ago

NVIDIA

10,000+ employees

Founded 1993

Artificial Intelligence • Gaming • Automotive

NVIDIA is a leading technology company specializing in accelerated computing and artificial intelligence. NVIDIA pioneers advancements in graphics processing units (GPUs), cloud computing, data centers, and virtual reality, with a focus on the gaming, automotive, healthcare, and robotics industries. The company's innovations, such as NVIDIA Omniverse, transform traditional digital processes by enabling high-fidelity simulation and rendering. Its applications span various industries, from autonomous vehicles using NVIDIA DRIVE to healthcare solutions with NVIDIA Clara, as well as AI-driven analytics and workflows.

📋 Description

• Build AI/HPC infrastructure for new and existing customers.
• Support operational and reliability aspects of large-scale AI clusters, focusing on performance at scale, real-time monitoring, logging, and alerting.
• Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
• Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
• Provide feedback to internal teams by opening bugs, documenting workarounds, and suggesting improvements.

🎯 Requirements

• BS/MS/PhD or equivalent experience in Computer Science, Electrical/Computer Engineering, Physics, Mathematics, or related fields.
• At least 5 years of professional experience in networking fundamentals, in Ethernet or InfiniBand environments.
• Hands-on experience with network switch/router platforms such as Cumulus Linux, SONiC, IOS, Junos OS, and EOS.
• Solid working knowledge of Ethernet/InfiniBand/RDMA core principles.
• Proficiency in end-to-end IB/Eth cluster deployment, adapter configuration, and firmware maintenance, and the ability to conduct professional performance benchmarking with mainstream RDMA testing tools.
• Ability to independently diagnose and troubleshoot typical IB/Eth network anomalies, including link flapping, connection failures, and bandwidth and latency jitter.
• Mastery of practical RDMA network optimization strategies such as QP tuning, MTU configuration, and congestion control optimization.
• Hands-on working experience in RDMA-accelerated business scenarios, including distributed storage and high-performance computing clusters.
• Extensive experience delivering automated network provisioning solutions using tools like Ansible, Salt, and Python.
• Ability to develop CI/CD pipelines for network operations.
• Strong written, verbal, and listening skills in English.

🏖️ Benefits

• NVIDIA pioneered accelerated computing.
• Our AI infrastructure powers global intelligence, transforming every industry.
