
51 - 200 employees
🤖 Artificial Intelligence
☁️ SaaS
🔧 Hardware
💰 $39.7M Venture Round on 2022-11
Lambda provides cloud-based solutions and hardware for AI development. It offers on-demand GPU clusters for multi-node training and fine-tuning, as well as inference endpoints and APIs. Its products include the Lambda GPU Cloud, which features NVIDIA's latest generation of infrastructure for enterprise AI, and customizable GPU workstations and desktops designed for AI and deep learning. Lambda also offers a one-line installation and managed upgrade path for machine learning tools such as PyTorch, TensorFlow, and NVIDIA CUDA. Its public and private cloud services both give AI developers access to NVIDIA Tensor Core GPUs.
🔥 0 minutes ago
🏄 California – Remote
💵 $185k - $290k / year
⏰ Full Time
🟠 Senior
🔴 Lead
👷🏻♀️ Engineer
🦅 H1B Visa Sponsor
• Architect and manage BMS integration across colocation and Lambda-owned facilities, covering chillers, CRAHs, CDUs (Coolant Distribution Units), cooling towers, UPS systems, PDUs, and automatic transfer switches.
• Define standards for BMS point lists, naming conventions, control sequences, and integration protocols (BACnet, Modbus, SNMP, OPC-UA, RESTful APIs).
• Oversee commissioning and acceptance testing of new BMS deployments and CDU/TCS loop integrations for next-generation liquid-cooled GPU rack systems.
• Collaborate with colocation partners (Equinix, Digital Realty, and others) to ensure telemetry data flows from provider BMS/EPMS into Lambda's monitoring stack.
• Own the DCIM platform strategy and roadmap: evaluating, selecting, and implementing tooling for asset management, capacity planning, environmental monitoring, and power chain visibility.
• Develop and maintain real-time dashboards for PUE, thermal performance, stranded capacity, and cooling system efficiency across all Lambda sites.
• Build and maintain telemetry pipelines ingesting data from BMS, PDUs, in-rack sensors, CDUs, and network devices into centralized monitoring and alerting platforms (e.g., Prometheus, Grafana, InfluxDB, or equivalent).
• Define alarm thresholds and escalation workflows for critical facility events including high coolant temperatures, CDU inlet/outlet anomalies, leak detection, and power exceedances.
• Develop control strategies and setpoint frameworks for TCS (Thermal Control System) loops supporting direct liquid cooling at densities of 220–380 kW per rack.
• Evaluate and qualify CDU vendors on controls integration capabilities, telemetry exposure, and remote management interfaces.
• Define and enforce operational procedures for CDU commissioning, setpoint changes, loop pressure management, and fluid quality monitoring.
• Support design and construction coordination for liquid cooling infrastructure in new data center buildouts, ensuring BMS and controls readiness at Day 1.
• Establish and maintain facility event management processes, including on-call response protocols for facility telemetry anomalies.
• Lead root cause analysis for facility system failures and implement corrective actions to prevent recurrence.
• Partner with the data center operations team to maintain and refine emergency response runbooks tied to BMS alerts and automated controls.
• Drive continuous improvement in MTTR for facility-related events through better telemetry coverage and automated remediation.
• Manage BMS integrators, DCIM vendors, and control subcontractors, from RFP through design, installation, commissioning, and ongoing support.
• Serve as the primary technical interface with colocation providers on all BMS/EPMS integration topics.
• Collaborate with Lambda's infrastructure engineering, construction, and procurement teams to align controls requirements with facility buildout timelines.
• Support due diligence and technical evaluation for new colocation sites and modular data center deployments from a telemetry and controls readiness perspective.

• 7+ years of experience in data center infrastructure engineering, with at least 4 years focused on BMS, DCIM, or controls systems in a hyperscale, colocation, or AI/HPC environment.
• Hands-on experience designing and integrating BMS for mission-critical facilities including UPS, PDU, CRAH/CRAC, chiller plant, cooling tower, and liquid cooling (CDU/in-row) systems.
• Strong working knowledge of industrial control protocols: BACnet IP/MS-TP, Modbus TCP/RTU, SNMP, DNP3, and modern API-based integrations.
• Demonstrated experience with DCIM platforms (Nlyte, Sunbird, Vertiv TRELLIS, or equivalent) including deployment, configuration, and ongoing administration.
• Experience with real-time telemetry stacks (Prometheus, InfluxDB, Grafana, or similar) applied to infrastructure monitoring use cases.
• Strong understanding of data center power and cooling systems, including PUE optimization, thermal management, and redundancy architectures (2N, N+1).

• Opportunity to shape the telemetry and controls architecture for one of the fastest-growing AI infrastructure platforms in the industry.
• Work with cutting-edge GPU infrastructure at rack densities at the frontier of what the industry has deployed.
• Collaborative environment with experienced infrastructure, construction, and vendor teams across a rapidly scaling global portfolio.
• Competitive compensation including salary, equity, and comprehensive benefits.
• Flexibility in work location with hybrid/remote options depending on facility portfolio needs.