Super Intelligence HPC Support Engineer

October 13


Lambda

Artificial Intelligence • SaaS • Hardware

Lambda is a company that provides cloud-based solutions and hardware for AI development. It offers on-demand GPU clusters for multi-node training and fine-tuning, as well as inference endpoints and APIs. Its products include the Lambda GPU Cloud, which runs on NVIDIA's latest generation of enterprise AI infrastructure, and customizable GPU workstations and desktops designed for AI and deep learning. Lambda also offers one-line installation and a managed upgrade path for machine learning tools such as PyTorch, TensorFlow, and NVIDIA CUDA. Focused on enabling AI developers, Lambda provides both public and private cloud services with access to NVIDIA Tensor Core GPUs.

51 - 200 employees

🤖 Artificial Intelligence

☁️ SaaS

🔧 Hardware

💰 $39.7M Venture Round on 2022-11

📋 Description

• Act as the primary technical point of escalation for Super Intelligence (SI) customers running hyperscale GPU clusters.
• Lead incident response for complex issues, ensuring rapid triage, clear communication, and timely resolution.
• Proactively identify risks in large environments (firmware, performance bottlenecks, orchestration issues) and drive preventive improvements.
• Partner closely with Lambda Engineering and Product teams to influence roadmap decisions based on real customer needs.
• Contribute to runbooks, best practices, and operational guides tailored for hyperscale environments.
• Train and mentor other support engineers, raising the bar across Lambda’s support organization.
• Participate in a rotating on-call schedule, owning critical incidents and high-priority alerts for SI customers.

🎯 Requirements

• 7+ years of experience in HPC or cloud support engineering, with customer-facing responsibilities.
• Proven experience managing large-scale Linux clusters and distributed HPC/AI workloads.
• Deep expertise in orchestration tools such as Kubernetes and/or Slurm.
• Strong knowledge of GPU technologies (CUDA, NCCL, MIG, NVLink, GPUDirect RDMA).
• Skilled in high-throughput networking (InfiniBand, RoCE) and cluster storage solutions.
• Familiarity with monitoring/logging platforms (Prometheus, Grafana, Datadog).
• Experience leading incident management and communicating directly with enterprise or hyperscale customers.
• Ability to balance deep technical troubleshooting with clear, concise communication to executives and stakeholders.

🏖️ Benefits

• Health, dental, and vision coverage for you and your dependents
• Wellness and commuter stipends for select roles
• 401(k) plan with 2% company match (USA employees)
• Flexible Paid Time Off plan that we all actually use

Similar Jobs

October 13

Snowflake

5001 - 10000

☁️ SaaS

Designated Support Engineer providing expert advice for optimal use of Snowflake's AI Data Cloud. Collaborating with customers to enhance their experience in the Snowflake platform.

🇺🇸 United States – Remote

💵 $135k - $189k / year

⏰ Full Time

🟠 Senior

🔴 Lead

📞 Support Engineer

🦅 H1B Visa Sponsor

AWS

Azure

Cloud

ETL

Informatica

Matillion

RDBMS

SQL

Tableau

October 11

Sole Hire

1 - 10

🎯 Recruiter

🤝 B2B

Help Desk Technician III for a Managed Service Provider. Resolving complex technical issues and ensuring optimal system performance in remote technical support.

🇺🇸 United States – Remote

💵 $70k - $80k / year

⏰ Full Time

🟠 Senior

🔴 Lead

📞 Support Engineer

October 10

Imply

51 - 200

Senior Technical Support Engineer resolving customer issues on the Imply Druid platform, with a focus on observability and innovative solutions. Collaborating with multiple teams to provide excellent customer support.

🇺🇸 United States – Remote

💵 $115k - $155k / year

⏰ Full Time

🟠 Senior

📞 Support Engineer

🦅 H1B Visa Sponsor

October 10

STCR

51 - 200

🛒 Retail

🤝 B2B

🛍️ eCommerce

Technical Support Agent L1 serving as the first point of contact for customer technical issues and troubleshooting POS systems. Managing support tickets and ensuring customer satisfaction through strong problem-solving skills.

🇺🇸 United States – Remote

💵 $21 - $23 / hour

⏰ Full Time

🟡 Mid-level

🟠 Senior

📞 Support Engineer

🗣️🇪🇸 Spanish Required

October 10

IDEX Corporation

5001 - 10000

🔬 Science

⚕️ Healthcare Insurance

🚗 Transport

Technical Support Representative assisting customers with troubleshooting IDEX Fire and Safety products via phone, email, and in person. Collaborating with engineering and manufacturing teams to ensure customer satisfaction.

🇺🇸 United States – Remote

💵 $44.5k - $66.7k / year

⏰ Full Time

🟡 Mid-level

🟠 Senior

📞 Support Engineer

ERP