
51 - 200 employees
🤖 Artificial Intelligence
☁️ SaaS
🔧 Hardware
💰 $39.7M Venture Round on 2022-11
Artificial Intelligence • SaaS • Hardware
Lambda is a company that provides cloud-based solutions and hardware for AI development. They offer on-demand GPU clusters for multi-node training and fine-tuning, as well as inference endpoints and APIs. Their products include the Lambda GPU Cloud, which features NVIDIA's latest generation of infrastructure for enterprise AI, and customizable GPU workstations and desktops designed for AI and deep learning. Lambda also offers a one-line installation and managed upgrade path for machine learning tools like PyTorch, TensorFlow, and NVIDIA CUDA. By focusing on enabling AI developers, Lambda provides both public and private cloud services with access to powerful NVIDIA Tensor Core GPUs.
• Lead the response to critical (SEV-1 / SEV-2) incidents impacting AI infrastructure, GPU clusters, networking, storage, and data center operations.
• Serve as the Incident Commander during major outages, coordinating engineering, networking, facilities, and vendor teams.
• Act as the liaison between leadership and external teams during and after incidents, providing updates and status summaries.
• Own the incident response lifecycle, including:
  - Assisting with technical triage
  - Escalation
  - Coordination
  - Resolution
• Ensure timely and accurate communication with internal stakeholders and leadership.
• Maintain incident response documentation and operational playbooks.
• Analyze incidents and identify patterns and trends to improve both incident response and system reliability.
• Participate in an on-call rotation to respond to, lead, and coordinate incidents.
• Drive alignment during outages involving multiple infrastructure layers.
• Lead post-incident reviews (PIRs) and root cause analysis. Identify systemic reliability gaps and implement corrective actions.
• 8+ years of experience in incident management, site reliability engineering, or infrastructure operations
• Experience managing incidents in large-scale distributed infrastructure environments
• Strong understanding of:
  - Data center operations
  - GPU compute clusters
  - Networking and storage infrastructure
  - Cloud or hybrid infrastructure platforms
• Proven ability to lead incident response in high-pressure situations
• Experience with incident management frameworks (ITIL, SRE, or equivalent)
• Excellent communication and stakeholder management skills
• Experience with incident tracking and monitoring tools such as:
  - PagerDuty
  - ServiceNow
  - Jira
  - Datadog
  - Prometheus / Grafana
• Health, dental, and vision coverage for you and your dependents
• Wellness and commuter stipends for select roles
• 401(k) plan with 2% company match (US employees)
• Flexible paid time off plan that we all actually use
Site Quality Manager overseeing quality control for construction projects across multiple US locations. Responsibilities include managing quality programs and conducting inspections/audits for compliance.
Platform Product Manager leading product vision and strategy for equity products at Airbnb. Collaborating with cross-functional teams to enhance safety and integrity for Airbnb users.
🇺🇸 United States – Remote
💵 $179k - $207k / year
💰 Post-IPO Equity on 2020-12
⏰ Full Time
🟠 Senior
👔 Manager
🦅 H1B Visa Sponsor
Escrow Manager leading escrow operations including staff management at LoanCare. Responsible for vendor oversight and regulatory compliance in mortgage servicing.
Manager, Corporate Travel responsible for leading a travel agent team and optimizing performance at ALTOUR. Focused on customer service, operational efficiencies, and client satisfaction.
Manager of Revenue Cycle Management overseeing billing and collections at Theoria Medical. Responsible for optimizing revenue cycle operations and communication with stakeholders.