Site Reliability Engineer – AI Infrastructure

11 - 50 employees

🤖 Artificial Intelligence

🤝 B2B

🔧 Hardware

🔥 Funding in the last year

💰 €15,142,238 Series A - Andromeda Robotics in 2025-09

Artificial Intelligence • B2B • Hardware

Andromeda is a GPU compute service and marketplace offering instant access to large clusters of H100, H200, and B200 accelerators for experiments, large-scale training, and inference. It supports orchestration with Slurm, Kubernetes, or direct SSH; offers flexible usage with no minimum term and competitive pricing; and includes DevOps expertise, local or streaming NAS storage with no ingress/egress fees, and 24/7 support with industry-standard SLAs. The company also operates a third-party GPU marketplace at gpulist.ai.

Job not on LinkedIn

🕒 3 months ago

🏄 California – Remote

⏰ Full-time

🟡 Mid-level

🟠 Senior

⛑ DevOps & SRE Engineer

🦅 H1B Visa Sponsor

🗣️🇺🇸🇬🇧 English required

Ansible

Grafana

Kubernetes

Linux

Prometheus

Python

Terraform

Description

• Provision, configure, and operate Kubernetes-based clusters for customers across multiple providers
• Build automation and tooling to streamline cluster deployments and integrations
• Debug customer issues across networking, storage, scheduling, and system layers
• Improve reliability and scalability of both training and inference infrastructure
• Design and implement monitoring, alerting, and observability for critical systems
• Collaborate with engineering and product teams to plan and deliver infrastructure for new services
• Participate in on-call and incident response, leading postmortems and reliability improvements

🎯 Requirements

• 5+ years of experience in SRE, DevOps, or infrastructure engineering roles
• Strong Linux systems and networking fundamentals
• Deep experience with Kubernetes and container orchestration at scale
• Proficiency with Infrastructure-as-Code (Terraform, Helm, Ansible, etc.)
• Strong automation and scripting skills (Python, Go, or Bash)
• Experience with observability stacks (Prometheus, Grafana, Loki, Datadog, etc.)
• Track record of operating production systems and leading incident response

🏖️ Benefits

• Ownership and autonomy to shape systems
• Opportunities to work directly with customers and providers

Similar Jobs

Software Architect, Reliability Engineering

🕒 3 months ago

Twilio

5001 - 10000

Reliability Architect at Twilio, defining and leading solutions for reliable products and collaborating with teams to ensure operational excellence and scalability in high-scale system design.

🇺🇸 United States – Remote

💵 $227,840 - $335,000 / year

⏰ Full-time

🟠 Senior

🔴 Expert

⛑ DevOps & SRE Engineer

🦅 H1B Visa Sponsor

🗣️🇺🇸🇬🇧 English required

AWS

Cloud

Distributed Systems

Grafana

Java

Kubernetes

Microservices

Prometheus

Python

Terraform

DevOps Security Engineer

🕒 3 months ago

Knox Systems, Inc.

201 - 500

🏛️ Government

🔒 Cybersecurity

📋 Compliance

DevOps Security Engineer at Knox, securing cloud-native environments for U.S. government missions. Focus on preventative security, automation, and continuous compliance within FedRAMP frameworks.

🇺🇸 United States – Remote

💵 $110,000 - $140,000 / year

🔥 Funding in the last year

💰 €6,500,000 Seed in 2025-08

⏰ Full-time

🟡 Mid-level

🟠 Senior

⛑ DevOps & SRE Engineer

🗣️🇺🇸🇬🇧 English required

AWS

Azure

Cloud

Google Cloud Platform

Kubernetes

Terraform

Senior DevOps Engineer

🕒 3 months ago

JFrog

1001 - 5000

🏢 Enterprise

☁️ SaaS

🔐 Security

Senior Professional Services DevOps Engineer designing CI/CD pipelines at JFrog. Collaborating with clients and teams to enhance DevOps experience.

🇺🇸 United States – Remote

💵 $160,000 - $175,000 / year

⏰ Full-time

🟠 Senior

⛑ DevOps & SRE Engineer

🦅 H1B Visa Sponsor

🗣️🇺🇸🇬🇧 English required

Ansible

AWS

Azure

Chef

Cloud

Docker

Google Cloud Platform

Java

Jenkins

Kubernetes

Linux

Maven

Open Source

Puppet

Backend/DevOps Engineer

🕒 3 months ago

Nick AI

1 - 10

🤖 Artificial Intelligence

₿ Crypto

☁️ SaaS

Backend/DevOps Engineer managing deployments and infrastructure for an AI trading platform. Responsible for security, reliability, and scaling of systems across multiple venues.

🇺🇸 United States – Remote

⏰ Full-time

🟡 Mid-level

🟠 Senior

⛑ DevOps & SRE Engineer

🗣️🇺🇸🇬🇧 English required

AWS

Cloud

Docker

Google Cloud Platform

Grafana

Kubernetes

Prometheus

Python

Web3