Mirantis

501 - 1000 employees

🏢 Enterprise

☁️ SaaS

Cloud Computing • Enterprise • SaaS

Mirantis specializes in container management and cloud infrastructure solutions. Its portfolio includes Mirantis Kubernetes Engine (MKE), Mirantis OpenStack for Kubernetes (MOSK), and Mirantis Container Cloud (MCC), which are enterprise-grade platforms for Kubernetes and container management. Mirantis also develops tools for secure software supply chains, such as Mirantis Container Runtime (MCR) and Mirantis Secure Registry (MSR). As an advocate of open-source technologies, Mirantis supports various projects and provides resources such as Lens Desktop, a popular Kubernetes IDE, as well as technical support for companies adopting cloud-native technologies. Its solutions target areas such as the public sector, financial services, and SaaS and technology services.

Senior AI Infrastructure, Platform Operations Engineer

🔥 3 minutes ago

🇪🇺 Europe – Remote

⏰ Full-time

🟠 Senior

👷 IT Infrastructure Engineer

🗣️🇺🇸🇬🇧 English required

Cloud

Distributed Systems

Grafana

Kubernetes

Linux

Prometheus

Description

• Lead the investigation and resolution of complex infrastructure, networking, and platform-related incidents.
• Act as a senior escalation point for operational teams during critical service-impacting events.
• Support large-scale NVIDIA GPU infrastructure and high-performance networking environments.
• Troubleshoot complex Linux, Kubernetes, networking, storage, and hardware-related issues.
• Analyze platform performance, capacity, stability, and reliability trends to proactively identify risks.
• Lead root cause analysis activities and drive long-term corrective actions.
• Collaborate with engineering teams, hardware vendors, and datacenter personnel to resolve complex technical challenges.
• Participate in major incident management and service restoration activities.
• Provide technical leadership for Kubernetes platform operations and supporting infrastructure services.
• Drive improvements in platform reliability, observability, monitoring, and operational processes.
• Identify opportunities to automate repetitive operational activities and improve operational efficiency.
• Contribute to operational readiness reviews, infrastructure changes, upgrades, and service introductions.
• Support the adoption and operation of AI-powered infrastructure services and operational capabilities through k0rdent AI.
• Evaluate emerging technologies and operational practices to improve service delivery and platform resilience.
• Mentor and support AI Infrastructure & Platform Operations Engineers.
• Share technical knowledge through documentation, training sessions, and operational reviews.
• Develop and maintain operational standards, runbooks, troubleshooting guides, and best practices.
• Help define operational processes, escalation paths, and service reliability standards.
• Act as a trusted technical advisor during operational planning and service improvement initiatives.

🎯 Requirements

• 7+ years of experience in infrastructure operations, platform operations, site reliability engineering, network operations, cloud operations, datacenter operations, or related technical roles.
• Expert-level Linux administration and troubleshooting skills.
• Strong networking expertise, including experience diagnosing complex performance, connectivity, and reliability issues.
• Strong experience operating Kubernetes in production environments.
• Experience supporting large-scale production infrastructure and distributed systems.
• Proven experience leading technical investigations and managing complex incidents.
• Experience performing root cause analysis and driving long-term operational improvements.
• Strong understanding of observability, monitoring, and service reliability practices.
• Excellent troubleshooting and analytical skills across multiple infrastructure domains.
• Strong communication, collaboration, and stakeholder management skills.
• Experience in one or more of the following areas is highly desirable:
  • NVIDIA GPU infrastructure and accelerated computing platforms
  • InfiniBand networking and NVIDIA UFM
  • AI infrastructure environments
  • HPC environments
  • Platform Engineering or Site Reliability Engineering (SRE)
  • Large-scale Kubernetes operations
  • Infrastructure automation technologies and Infrastructure-as-Code practices
  • Observability platforms such as Grafana, Prometheus, ELK, or OpenTelemetry
  • Performance analysis and optimisation of distributed infrastructure platforms
  • Technical leadership, mentoring, or team lead responsibilities

🏖️ Benefits

• Operate some of the most advanced AI infrastructure environments in production today.
• Work with the latest NVIDIA GPU technologies, Kubernetes platforms, and high-performance networking environments.
• Help define operational standards and reliability practices for next-generation AI infrastructure services.
• Influence the adoption of AI-powered operational capabilities through k0rdent AI.
• Work alongside highly skilled engineers solving complex infrastructure and platform challenges at scale.
• Join a growing organisation investing heavily in AI infrastructure, platform services, and operational innovation.
