AI Infrastructure Deployment Lead

Artificial Intelligence • SaaS • Hardware

Lambda provides cloud services and hardware for AI development. It offers on-demand GPU clusters for multi-node training and fine-tuning, as well as inference endpoints and APIs. Its products include the Lambda GPU Cloud, which features NVIDIA's latest generation of infrastructure for enterprise AI, and customizable GPU workstations and desktops designed for AI and deep learning. Lambda also offers a one-line installation and managed upgrade path for machine learning tools like PyTorch, TensorFlow, and NVIDIA CUDA. With its focus on AI developers, Lambda offers both public and private cloud services with access to powerful NVIDIA Tensor Core GPUs.

51 - 200 employees

🤖 Artificial Intelligence

☁️ SaaS

🔧 Hardware

💰 $39.7M Venture Round on 2022-11

November 5

🇺🇸 United States – Remote

💵 $128k - $149k / year

⏰ Full Time

🟠 Senior

🗣️ LLM Engineer

🦅 H1B Visa Sponsor

Cloud

PMP

Lambda

📋 Description

• Lead end-to-end deployment of GPU clusters, storage systems, and networking fabric across Lambda’s data centers.
• Design and implement data center network topologies optimized for AI and HPC workloads, including high-speed Ethernet and InfiniBand environments.
• Oversee rack implementation, cabling, and power/cooling validation for optimal efficiency and scalability.
• Collaborate with supply chain, logistics, and operations teams to ensure smooth delivery and installation timelines.
• Implement Layer 2/Layer 3 networks, including VLANs, spine-leaf architecture, and InfiniBand interconnects.
• Partner with network architects to ensure redundancy, scalability, and low-latency interconnects for distributed AI workloads.
• Monitor network health, identify bottlenecks, and implement optimizations to maintain peak performance.
• Oversee server hardware troubleshooting, including GPUs, NICs, CPUs, and storage components.
• Lead root-cause analysis for system issues and drive corrective actions in collaboration with vendors and internal hardware teams.
• Develop standard operating procedures (SOPs) for hardware validation, deployment, and maintenance.
• Serve as technical project lead for infrastructure rollouts and cluster expansion projects.
• Coordinate cross-functional teams (networking, facilities, cloud operations, and hardware engineering) to execute deployments on schedule.
• Manage project scope, budgets, risk assessments, and post-deployment reviews.
• Communicate status, challenges, and milestones to leadership with clarity and precision.
• Maintain detailed network topology diagrams, deployment runbooks, and hardware inventories.
• Identify opportunities for process automation and infrastructure standardization across deployments.
• Contribute to Lambda’s internal knowledge base and mentor junior engineers on data center best practices.

🎯 Requirements

• Bachelor’s degree in Computer Engineering, Information Technology, or a related field.
• CCNA (Cisco Certified Network Associate) certification (CCNP or equivalent a plus).
• PMP (Project Management Professional) certification (PMP or equivalent a plus).
• 5+ years of experience in data center infrastructure deployment or network operations, preferably in AI, HPC, or cloud environments.
• Proven ability to lead complex technical projects and manage multidisciplinary teams.
• Strong understanding of data center network design (Layer 2/3, VLANs, rack elevations, port mapping, InfiniBand technologies).
• Hands-on expertise in server hardware troubleshooting and rack-level integration.

🏖️ Benefits

• Health, dental, and vision coverage for you and your dependents
• Wellness and commuter stipends for select roles
• 401(k) plan with 2% company match (USA employees)
• Flexible paid time off plan that we all actually use

Similar Jobs

Senior Generative AI Engineer

October 21

Liftoff Mobile

501 - 1000

Senior Generative AI Engineer at Liftoff architecting AI-powered solutions for advertising technology. Pioneering intelligent agents to transform workflows across various core functions.

🇺🇸 United States – Remote

💵 $135k - $227k / year

⏰ Full Time

🟠 Senior

🗣️ LLM Engineer

🦅 H1B Visa Sponsor

Python

Senior Large Language Model (LLM) Operations Engineer

October 21

N-Power Medicine, Inc.

11 - 50

🧬 Biotechnology

⚕️ Healthcare Insurance

💊 Pharmaceuticals

Senior LLM Operations Engineer at N-Power Medicine. Responsible for scaling AI innovation in clinical variable abstraction and note generation through infrastructure and system automation.

🇺🇸 United States – Remote

💵 $165k - $205k / year

⏰ Full Time

🟠 Senior

🗣️ LLM Engineer

AWS

Azure

Cloud

Docker

Google Cloud Platform

Jenkins

Kubernetes

Python

AI/LLM Engineer

October 9

Trellis

51 - 200

🛍️ eCommerce

🤝 B2B

☁️ SaaS

AI/ML Engineer working on backend services and data analytics for Trellis, a legal data company. Designing data architecture and features for high-speed, large data environments.

🇺🇸 United States – Remote

⏰ Full Time

🟠 Senior

🔴 Lead

🗣️ LLM Engineer

🦅 H1B Visa Sponsor

Python

Senior Software Engineer, Generative AI

October 8

BPK Technologies

51 - 200

🤝 B2B

🏢 Enterprise

🤖 Artificial Intelligence

Senior Software Engineer developing Generative AI solutions for Veltris. Leading software development life cycle and driving innovation across products.

🇺🇸 United States – Remote

⏰ Full Time

🟠 Senior

🗣️ LLM Engineer

AWS

Azure

Cloud

Distributed Systems

Google Cloud Platform

SDLC

Technical Product Manager, AI Infrastructure and Platform

September 20

MECA GROUP, INC.

11 - 50

Lead AI infrastructure, data pipelines, governance, and platform APIs at SingleFile compliance SaaS. Drive roadmap, architect decisions, and cross-functional delivery.

🇺🇸 United States – Remote

💵 $95k - $125k / year

⏰ Full Time

🟡 Mid-level

🟠 Senior

🗣️ LLM Engineer

Cloud