AI Infrastructure Deployment Lead

November 5

Lambda

Artificial Intelligence • SaaS • Hardware

Lambda provides cloud services and hardware for AI development. It offers on-demand GPU clusters for multi-node training and fine-tuning, along with inference endpoints and APIs. Its products include the Lambda GPU Cloud, which runs on NVIDIA's latest generation of infrastructure for enterprise AI, and customizable GPU workstations and desktops designed for AI and deep learning. Lambda also offers a one-line installation and managed upgrade path for machine learning tools such as PyTorch, TensorFlow, and NVIDIA CUDA. Its public and private cloud services give AI developers access to NVIDIA Tensor Core GPUs.

51 - 200 employees

💰 $39.7M Venture Round on 2022-11

📋 Description

• Lead end-to-end deployment of GPU clusters, storage systems, and networking fabric across Lambda’s data centers.
• Design and implement data center network topologies optimized for AI and HPC workloads, including high-speed Ethernet and InfiniBand environments.
• Oversee rack implementation, cabling, and power/cooling validation for optimal efficiency and scalability.
• Collaborate with supply chain, logistics, and operations teams to ensure smooth delivery and installation timelines.
• Implement Layer 2/Layer 3 networks, including VLANs, spine-leaf architecture, and InfiniBand interconnects.
• Partner with network architects to ensure redundancy, scalability, and low-latency interconnects for distributed AI workloads.
• Monitor network health, identify bottlenecks, and implement optimizations to maintain peak performance.
• Oversee server hardware troubleshooting, including GPUs, NICs, CPUs, and storage components.
• Lead root-cause analysis for system issues and drive corrective actions in collaboration with vendors and internal hardware teams.
• Develop standard operating procedures (SOPs) for hardware validation, deployment, and maintenance.
• Serve as technical project lead for infrastructure rollouts and cluster expansion projects.
• Coordinate cross-functional teams (networking, facilities, cloud operations, and hardware engineering) to execute deployments on schedule.
• Manage project scope, budgets, risk assessments, and post-deployment reviews.
• Communicate status, challenges, and milestones to leadership with clarity and precision.
• Maintain detailed network topology diagrams, deployment runbooks, and hardware inventories.
• Identify opportunities for process automation and infrastructure standardization across deployments.
• Contribute to Lambda’s internal knowledge base and mentor junior engineers on data center best practices.

🎯 Requirements

• Bachelor’s degree in Computer Engineering, Information Technology, or a related field.
• CCNA (Cisco Certified Network Associate) certification (CCNP or equivalent a plus).
• PMP (Project Management Professional) certification (PMP or equivalent a plus).
• 5+ years of experience in data center infrastructure deployment or network operations, preferably in AI, HPC, or cloud environments.
• Proven ability to lead complex technical projects and manage multidisciplinary teams.
• Strong understanding of data center network design (Layer 2/3, VLANs, rack elevations, port mapping, InfiniBand technologies).
• Hands-on expertise in server hardware troubleshooting and rack-level integration.

🏖️ Benefits

• Health, dental, and vision coverage for you and your dependents
• Wellness and commuter stipends for select roles
• 401(k) plan with 2% company match (USA employees)
• Flexible paid time off plan that we all actually use
