Solutions Engineer, AI/HPC Infrastructure

August 29

DriveNets

DriveNets is a rapidly growing software company that has created a new way for service providers and hyperscalers to build their networking infrastructure. DriveNets Network Cloud and DriveNets Network Cloud-AI are networking solutions that apply the cloud architectural approach to high-scale networking. They combine the scalability of standard Ethernet Clos architecture with the high performance and reliability of service provider networking, delivering optimal networking performance, scale, and cost structure for service providers and hyperscalers.

201 - 500 employees

📋 Description

• Build robust AI/HPC infrastructure for new and existing customers.
• Work hands-on in a technical role, building and supporting NVIDIA- and AMD-based platforms.
• Support the operational and reliability aspects of large-scale AI clusters, focusing on performance at scale, training stability, real-time monitoring, logging, and alerting.
• Administer Linux systems, ranging from powerful GPU-enabled servers to general-purpose compute systems.
• Design and plan rack layouts and network topologies to support customer requirements.
• Design and evaluate automation scripts for network operations, configuring server and switch fabrics.
• Perform NCCL, RCCL, LLM, and RDMA performance benchmarks as part of the design and evaluation process of the deployment.
• Benchmark the latest GPU compute and NIC solutions from all major compute vendors over the DriveNets networking fabric.
• Install and configure DriveNets products, ensuring optimal performance and customer satisfaction.
• Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
• Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
• Provide feedback to internal teams by opening bugs, documenting workarounds, and suggesting improvements.
• Introduce new products to the DriveNets sales and support teams and to DriveNets customers.
• Deliver technical trainings and TOIs for support/sales engineers, partners, and customers.
• Collaborate on product definition through customer requirement gathering and roadmap planning.

🎯 Requirements

• 5+ years of experience deploying and administering AI/HPC clusters or general-purpose compute systems.
• 5+ years of hands-on Linux experience (e.g., RHEL, CentOS, Ubuntu) and production infrastructure support (e.g., networking, storage, monitoring, compute, installation, configuration, maintenance, upgrade, retirement).
• Proficiency in cloud, virtualization, and container technologies.
• Deep understanding of operating systems, computer networks, and high-performance applications.
• Hands-on experience with Bash, Python, and configuration management tools (e.g., Ansible).
• Established record of leading technical initiatives and delivering results.
• Ability to write extensive technical content (white papers, technical briefs, test reports, etc.) for external audiences, balancing technical accuracy, strategy, and clear messaging.
• Ability to travel domestically and internationally.

🏖️ Benefits

• No explicit benefits listed
