Lambda

Artificial Intelligence • SaaS • Hardware

Lambda provides cloud services and hardware for AI development. It offers on-demand GPU clusters for multi-node training and fine-tuning, as well as inference endpoints and APIs. Its products include the Lambda GPU Cloud, which features NVIDIA's latest generation of infrastructure for enterprise AI, and customizable GPU workstations and desktops designed for AI and deep learning. Lambda also offers a one-line installation and managed upgrade path for machine learning tools such as PyTorch, TensorFlow, and NVIDIA CUDA. Its public and private cloud services give AI developers access to NVIDIA Tensor Core GPUs.

51 - 200 employees

🤖 Artificial Intelligence

☁️ SaaS

🔧 Hardware

💰 $39.7M Venture Round on 2022-11

Senior Site Reliability Engineer, Managed Kubernetes

October 9

🇩🇪 Germany – Remote

💵 €161k - €310k / year

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

Grafana

Kubernetes

Linux

Prometheus

Python

📋 Description

• Operate and maintain bare-metal Kubernetes clusters, scaling up to thousands of nodes
• Handle cluster degradation, recovery, resizing, and incident response using fleet management tools
• Participate in a well-managed on-call rotation for critical incidents
• Assist customers with Kubernetes questions, workload integration, storage, and authentication
• Work closely with our HPC Ops and Datacenter Ops teams for low-level or cross-functional issues
• Use Python and Golang to create tooling and automate the validation of platform quality
• Design, build, and maintain scalable control plane services, operators, and custom controllers for Kubernetes
• Develop automation for cluster lifecycle management: provisioning, upgrades, patching, and deletion
• Define and implement SLOs and SLIs for Kubernetes services, workloads, and platform reliability

🎯 Requirements

• 6+ years of experience in an SRE, operations engineer, or similar role, with deep knowledge of running Linux clusters and systems
• Strong programming skills in Go and Python; experience with GitOps (e.g., ArgoCD), Helm, and Kubernetes operators
• Proven experience operating Kubernetes clusters in production environments (on-prem, EKS, GKE, or similar)
• Can work either independently with limited direction or as part of a team
• Can work with customers during incidents via tickets, live messaging, or as part of a larger call
• Familiarity with observability tools like Prometheus, Grafana, and FluentBit, and with CI/CD pipelines
• Proven experience provisioning Kubernetes using tools such as kubeadm, Cluster API, or similar

🏖️ Benefits

• Health, dental, and vision coverage for you and your dependents
• Wellness and commuter stipends for select roles
• 401k plan with 2% company match (USA employees)
• Flexible paid time off plan that we all actually use

Similar Jobs

Cloud Site Reliability Engineer

October 2

Scalable

201 - 500 employees

Cloud Engineer improving AWS Infrastructure at fintech startup. Mentoring teams in a DevOps culture and developing internal tools for cloud services.

🇩🇪 Germany – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

AWS

Cloud

Python

Terraform

DevOps Engineer - Consultant

October 2

evoila

201 - 500 employees

Consultant building and advising on Kubernetes developer platforms for clients at evoila, an agile cloud engineering company.

🇩🇪 Germany – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🗣️🇩🇪 German Required

Cloud

Kubernetes

Kubernetes DevOps Engineer – Global

October 1

Mirantis

501 - 1000 employees

🏢 Enterprise

☁️ SaaS

Kubernetes DevOps Engineer building and integrating AI infrastructure on Kubernetes for the Mirantis k0rdent-ai platform.

🇩🇪 Germany – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

Cloud

Grafana

Kubernetes

Linux

OpenStack

DevOps Engineer

September 30

CENTOGENE

501 - 1000 employees

🧬 Biotechnology

💊 Pharmaceuticals

🔬 Science

Build and maintain secure AWS infrastructure and CI/CD pipelines for CENTOGENE's genomic diagnostics. Implement IaC, containers, serverless workflows, and collaborate internationally.

🇩🇪 Germany – Remote

⏰ Full Time

🟠 Senior

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)

AWS

Cloud

Docker

EC2

Kubernetes

Python

Terraform