Staff SRE, AI Infrastructure


11 - 50 employees

🤖 Artificial Intelligence

🤝 B2B

🔧 Hardware

🔥 Funded in the last year

💰 €15,142,238 Series A - Andromeda Robotics in 2025-09


Andromeda is a GPU compute service and marketplace offering instant access to large clusters of H100, H200, and B200 accelerators for experimentation, large-scale training, and inference. It supports orchestration with Slurm, Kubernetes, or direct SSH; offers flexible usage with no minimum term and competitive pricing; and includes DevOps expertise, local NAS or streaming storage with no ingress/egress fees, and 24/7 support with industry-standard SLAs. The company also operates a third-party GPU marketplace at gpulist.ai.


Job not on LinkedIn

🕒 28 days ago

🏄 California – Remote

⏰ Full-Time

🔴 Expert

⛑ DevOps & SRE Engineer

🦅 H1B Visa Sponsor

🗣️🇺🇸🇬🇧 English Required

Linux

Python

PyTorch

Rust



Andromeda


Description

• Own the reliability of Andromeda's infrastructure end to end
• Lead incident response for top customers' training runs and write the postmortems
• Ensure the health of thousands of GPUs across providers
• Build telemetry, GPU health checks, and automated remediation
• Define on-call processes such as rotations and escalation
• Be the reliability voice in customer incident reviews
• Collaborate closely with the product team on SLOs
• Partner with providers and data center teams on physical design
• Make other engineers better through mentorship

🎯 Requirements

• Multiple years building and operating large-scale GPU infrastructure as your primary job
• A clear history of owning the reliability of load-bearing infrastructure
• Deep, hands-on experience with NVIDIA H100/H200/B200/GB200 (or equivalent) at scale
• Real production experience with InfiniBand, RoCE, and NVLink fabrics
• Working knowledge of how large training jobs run (NCCL, CUDA, PyTorch distributed)
• Strong proficiency in Go, Python, or Rust
• Expert-level knowledge of Linux and systems internals
• Comfortable being the senior engineer on a P0 bridge with the customer
• Comfortable being the senior technical voice with AI infrastructure customers

🏖️ Benefits

• Significant autonomy
• Working on infrastructure that the most ambitious AI labs depend on


Similar Jobs

Payment Platform DevOps Engineer

🕒 1 month ago

SouthState Bank

1001 - 5000

🏦 Banking

💸 Finance

💳 Fintech

Payment Platform DevOps Engineer at SouthState enabling secure and scalable delivery of cloud-based payment solutions. Collaborates with internal teams to drive innovation in payment technology.

🇺🇸 United States – Remote

💵 $152,630 - $243,812 / year

⏰ Full-Time

🟠 Senior

🔴 Expert

⛑ DevOps & SRE Engineer

🗣️🇺🇸🇬🇧 English Required

ASP.NET

Azure

Cloud

Ruby on Rails

SDLC

SQL

Terraform

TypeScript

Vault

.NET

Director, AI-Enabled DevOps Transformation

🕒 1 month ago

Valiantys - Atlassian Platinum Solution Partner

51 - 200

🏢 Enterprise

☁️ SaaS

🤝 B2B

Director for AI-Enabled DevOps Transformation at Valiantys, focusing on enterprise account growth and strategy alignment. Engages with clients on SDLC modernization and AI-enabled delivery.

🇺🇸 United States – Remote

💵 $175,000 - $240,000 / year

⏰ Full-Time

🔴 Expert

⛑ DevOps & SRE Engineer

🗣️🇺🇸🇬🇧 English Required

AWS

Azure

Cloud

Google Cloud Platform

ITSM

Kubernetes

SDLC

Terraform

Principal DevOps Engineer

🕒 1 month ago

Zscaler

5001 - 10000

🔒 Cybersecurity

☁️ SaaS

🏢 Enterprise

Principal DevOps Engineer managing AWS infrastructure for Zscaler’s Zero Trust Networking Services. Architecting cloud infrastructure and ensuring operational health in a remote role.

🇺🇸 United States – Remote

💵 $182,000 - $260,000 / year

💰 Secondary Market in 2017-11

⏰ Full-Time

🔴 Expert

⛑ DevOps & SRE Engineer

🦅 H1B Visa Sponsor

🗣️🇺🇸🇬🇧 English Required

AWS

Cloud

Linux

NoSQL

Prometheus

Python

SQL

Terraform

DevOps Engineer, Observability

🕒 1 month ago

Quantiphi

1001 - 5000

🤖 Artificial Intelligence

🏢 Enterprise

📚 Education

Senior DevOps/Observability Engineer leading unified observability platform design for Fortune 500 clients. Focused on architecting observability pipelines using AWS and modern open-source tools.

🇺🇸 United States – Remote

💰 Series A in 2019-12

⏰ Full-Time

🟠 Senior

🔴 Expert

⛑ DevOps & SRE Engineer

🦅 H1B Visa Sponsor

🗣️🇺🇸🇬🇧 English Required

AWS

Grafana

Kubernetes

Prometheus

Splunk

Terraform

SRE Architect, AI-Powered Reliability

🕒 1 month ago

WEX

5001 - 10000

🚗 Transportation

💸 Finance

💳 Fintech

SRE Architect driving AI-Powered Reliability Engineering strategy and enforcing enterprise-wide SRE standards. Overseeing the architecture and implementation of mission-critical systems for WEX.

🇺🇸 United States – Remote

💵 $200,600 - $250,400 / year

💰 €310,000,000 Post-IPO Debt in 2020-06

⏰ Full-Time

🟠 Senior

🔴 Expert

⛑ DevOps & SRE Engineer

🦅 H1B Visa Sponsor

🗣️🇺🇸🇬🇧 English Required

Cloud

Distributed Systems