
Artificial Intelligence • SaaS • Hardware
Lambda is a company that provides cloud-based solutions and hardware for AI development. They offer on-demand GPU clusters for multi-node training and fine-tuning, as well as inference endpoints and APIs. Their products include the Lambda GPU Cloud, which features NVIDIA's latest generation of infrastructure for enterprise AI, and customizable GPU workstations and desktops designed for AI and deep learning. Lambda also offers a one-line installation and managed upgrade path for machine learning tools like PyTorch, TensorFlow, and NVIDIA CUDA. By focusing on enabling AI developers, Lambda provides both public and private cloud services with access to powerful NVIDIA Tensor Core GPUs.
51 - 200 employees
🤖 Artificial Intelligence
☁️ SaaS
🔧 Hardware
💰 $39.7M Venture Round on 2022-11
October 9
🇩🇪 Germany – Remote
💵 €161k - €310k / year
⏰ Full Time
🟠 Senior
DevOps & Site Reliability Engineer (SRE)

• Operate and maintain bare-metal Kubernetes clusters, scaling up to thousands of nodes
• Handle cluster degradation, recovery, resizing, and incident response using fleet management tools
• Participate in a well-managed on-call rotation for critical incidents
• Assist customers with Kubernetes questions, workload integration, storage, and authentication
• Work closely with our HPC Ops and Datacenter Ops teams on low-level or cross-functional issues
• Use Python and Golang to build tooling and automate validation of platform quality
• Design, build, and maintain scalable control plane services, operators, and custom controllers for Kubernetes
• Develop automation for cluster lifecycle management: provisioning, upgrades, patching, and deletion
• Define and implement SLOs and SLIs for Kubernetes services, workloads, and platform reliability
• 6+ years of experience in an SRE, operations engineer, or similar role, with deep knowledge of running Linux clusters and systems
• Strong programming skills in Go and Python; experience with GitOps (e.g., Argo CD), Helm, and Kubernetes operators
• Proven experience operating Kubernetes clusters in production environments (on-prem, EKS, GKE, or similar)
• Able to work independently with limited direction or as part of a team
• Able to work with customers during incidents via tickets, live messaging, or a larger call
• Familiarity with observability tools such as Prometheus, Grafana, and Fluent Bit, and with CI/CD pipelines
• Proven experience provisioning Kubernetes with tools such as kubeadm, Cluster API, or similar
• Health, dental, and vision coverage for you and your dependents
• Wellness and commuter stipends for select roles
• 401(k) plan with 2% company match (USA employees)
• Flexible paid time off plan that we all actually use
October 2
201 - 500 employees
Cloud Engineer improving AWS infrastructure at a fintech startup. Mentors teams in a DevOps culture and develops internal tools for cloud services.
🇩🇪 Germany – Remote
⏰ Full Time
🟡 Mid-level
🟠 Senior
DevOps & Site Reliability Engineer (SRE)
October 2
201 - 500 employees
Consultant building and advising on Kubernetes developer platforms for clients at evoila, an agile cloud engineering company.
🇩🇪 Germany – Remote
⏰ Full Time
🟡 Mid-level
🟠 Senior
DevOps & Site Reliability Engineer (SRE)
🗣️🇩🇪 German Required
October 1
Kubernetes DevOps Engineer building and integrating AI infrastructure on Kubernetes for the Mirantis k0rdent-ai platform.
🇩🇪 Germany – Remote
⏰ Full Time
🟡 Mid-level
🟠 Senior
DevOps & Site Reliability Engineer (SRE)
September 30
Build and maintain secure AWS infrastructure and CI/CD pipelines for CENTOGENE's genomic diagnostics. Implement IaC, containers, serverless workflows, and collaborate internationally.
🇩🇪 Germany – Remote
⏰ Full Time
🟠 Senior
🔴 Lead
DevOps & Site Reliability Engineer (SRE)
August 28
201 - 500 employees
Senior Security Engineer for DevOps and Cloud Platforms at auxmoney. Embeds security in CI/CD, automates controls, and ensures cloud security compliance.
🗣️🇩🇪 German Required