AI Infrastructure, Platform Operations Engineer

501 - 1000 employees

🏢 Enterprise

☁️ SaaS

Cloud Computing • Enterprise • SaaS

Mirantis specializes in container management and cloud infrastructure solutions. Its products include Mirantis Kubernetes Engine (MKE), Mirantis OpenStack for Kubernetes (MOSK), and Mirantis Container Cloud (MCC), which provide enterprise-level Kubernetes and container management platforms. Mirantis also develops tools for secure software supply chains, such as the Mirantis Container Runtime (MCR) and Mirantis Secure Registry (MSR). As an advocate for open source technologies, Mirantis supports various projects, provides resources such as Lens Desktop, a popular Kubernetes IDE, and offers technical support to enterprises adopting cloud-native technologies. Its solutions serve public services, financial services, and the broader SaaS and technology services industries.

🔥 5 minutes ago

🇪🇺 Europe – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

🏗️ Platform Engineer

Cloud

Distributed Systems

Grafana

Kubernetes

Linux

Prometheus

📋 Description

• Monitor, operate, and support production AI infrastructure platforms.
• Investigate and resolve infrastructure, networking, hardware, and platform-related incidents.
• Support NVIDIA GPU infrastructure and associated platform services.
• Monitor and troubleshoot Kubernetes-based environments.
• Investigate performance, availability, and reliability issues across infrastructure and platform components.
• Collaborate with engineering teams, hardware vendors, datacenter personnel, and service delivery teams to resolve technical issues.
• Participate in incident response, root cause analysis, and operational improvement activities.
• Contribute to improvements in monitoring, observability, automation, and operational processes.
• Maintain operational documentation, runbooks, and knowledge articles.

🎯 Requirements

• 3+ years of experience in infrastructure operations, platform operations, network operations, site reliability engineering, cloud operations, datacenter operations, or related technical roles.
• Strong Linux administration and troubleshooting skills.
• Good understanding of networking concepts and experience diagnosing infrastructure-related issues.
• Working knowledge of Kubernetes in production environments.
• Experience supporting production infrastructure and services.
• Strong analytical and problem-solving skills.
• Experience working within structured operational and incident management processes.
• Excellent communication and collaboration skills.
• Ability to work within a shift-based operational environment.
• Experience in one or more of the following areas is highly desirable:
  • NVIDIA GPU infrastructure and accelerated computing platforms.
  • InfiniBand networking and NVIDIA UFM.
  • Kubernetes platform operations.
  • AI infrastructure or HPC environments.
  • Site Reliability Engineering (SRE) or Platform Engineering.
  • Observability platforms such as Grafana, Prometheus, ELK, or OpenTelemetry.
  • Infrastructure automation technologies and Infrastructure-as-Code practices.
  • Large-scale distributed systems and production platforms.

🏖️ Benefits

• Work with some of the most advanced AI infrastructure environments in production today.
• Gain exposure to NVIDIA GPU technologies, Kubernetes platforms, and high-performance networking environments.
• Help define how next-generation AI infrastructure is operated and supported.
• Be part of a team shaping the future of AI-powered operations through k0rdent AI.
• Join a growing organisation investing heavily in AI infrastructure and platform services.

Similar Jobs

Senior Platform Engineer

🕒 June 12

Vira Games

51 - 200

🎮 Gaming

👥 B2C

Senior Platform Engineer designing and developing backend services for a gaming company. Focusing on GaaS platform architecture, quality assurance, and infrastructure solutions.

🇪🇺 Europe – Remote

⏰ Full Time

🟠 Senior

🏗️ Platform Engineer

🗣️🇺🇦 Ukrainian Required

AWS

NoSQL

Python

Platform Engineer

🕒 May 8

bloomon

51 - 200

🛒 Retail

🛍️ eCommerce

Platform Engineer working across technology domains at Bloom & Wild. Enhancing e-commerce, data, and infrastructure solutions with a focus on autonomy and innovation.

🇪🇺 Europe – Remote

💰 Series C on 2019-03

⏰ Full Time

🟡 Mid-level

🟠 Senior

🏗️ Platform Engineer

AWS

Python

Ruby

Terraform

Senior Platform Engineer

🕒 May 5

saas.group

51 - 200

☁️ SaaS

🏢 Enterprise

🤝 B2B

Senior Platform Engineer for ScraperAPI, managing and consolidating infrastructure for high-performance web scraping solutions. Collaborate with engineering teams to drive significant platform improvements.

🇪🇺 Europe – Remote

⏰ Full Time

🟠 Senior

🏗️ Platform Engineer

Kubernetes

Prometheus

Terraform

Senior Platform Engineer – Cloud, AI Adoption

🕒 March 20

TD SYNNEX

10,000+ employees

🏢 Enterprise

☁️ SaaS

📡 Telecommunications

Senior Platform Engineer architecting multi-cloud infrastructure for AI-driven applications at TD SYNNEX. Focusing on automation and collaboration between Developers, Business, and Operations.

🇪🇺 Europe – Remote

⏰ Full Time

🟠 Senior

🏗️ Platform Engineer

Ansible

AWS

Azure

Cloud

Google Cloud Platform

Linux

Python

Terraform

Senior Platform Engineer

🕒 March 12

Polar

1 - 10

💳 Fintech

☁️ SaaS

🔌 API

Senior Platform Engineer architecting and evolving the Polar platform for high-velocity startups. Designing systems emphasizing reliability and scalability in financial workflows across various engineering layers.

🇪🇺 Europe – Remote

⏰ Full Time

🟠 Senior

🏗️ Platform Engineer

Cloud

Distributed Systems

Open Source

SQL