AI Infrastructure & Platform Operations Engineer

🔥 6 minutes ago

Mirantis

501 - 1000 employees

🏱 Enterprise

☁ SaaS

Cloud Computing ‱ Enterprise ‱ SaaS

Mirantis specializes in container management and cloud infrastructure. Its products include Mirantis Kubernetes Engine (MKE), Mirantis OpenStack for Kubernetes (MOSK), and Mirantis Container Cloud (MCC), which provide enterprise-grade Kubernetes and container management platforms. Mirantis also develops tools for secure software supply chains, such as the Mirantis Container Runtime (MCR) and Mirantis Secure Registry (MSR). As an advocate for open source technologies, Mirantis supports various projects, provides resources such as Lens Desktop, a popular Kubernetes IDE, and offers technical support for enterprises adopting cloud-native technologies. Its solutions serve sectors such as public services, financial services, and the broader SaaS and technology services industries.

📋 Description

‱ Monitor, operate, and support production AI infrastructure platforms. ‱ Investigate and resolve infrastructure, networking, hardware, and platform-related incidents. ‱ Support NVIDIA GPU infrastructure and associated platform services. ‱ Monitor and troubleshoot Kubernetes-based environments. ‱ Investigate performance, availability, and reliability issues across infrastructure and platform components. ‱ Collaborate with engineering teams, hardware vendors, datacenter personnel, and service delivery teams to resolve technical issues. ‱ Participate in incident response, root cause analysis, and operational improvement activities. ‱ Contribute to improvements in monitoring, observability, automation, and operational processes. ‱ Maintain operational documentation, runbooks, and knowledge articles.

🎯 Requirements

‱ 3+ years of experience in infrastructure operations, platform operations, network operations, site reliability engineering, cloud operations, datacenter operations, or related technical roles. ‱ Strong Linux administration and troubleshooting skills. ‱ Good understanding of networking concepts and experience diagnosing infrastructure-related issues. ‱ Working knowledge of Kubernetes in production environments. ‱ Experience supporting production infrastructure and services. ‱ Strong analytical and problem-solving skills. ‱ Experience working within structured operational and incident management processes. ‱ Excellent communication and collaboration skills. ‱ Ability to work within a shift-based operational environment.

🏖️ Benefits

‱ Work with some of the most advanced AI infrastructure environments in production today. ‱ Gain exposure to NVIDIA GPU technologies, Kubernetes platforms, and high-performance networking environments. ‱ Help define how next-generation AI infrastructure is operated and supported. ‱ Be part of a team shaping the future of AI-powered operations through k0rdent AI. ‱ Join a growing organisation investing heavily in AI infrastructure and platform services.

Similar Jobs

🔥 1 hour ago

Software Mind

1001 - 5000

🤖 Artificial Intelligence

☁ SaaS

📡 Telecommunications

Platform Engineer joining Software Mind to work on high-performance infrastructure, automation, and self-service provisioning. Collaborating with API developers and infrastructure specialists.

Cloud

Kubernetes

Linux

Microservices

Python

🕒 May 30

Equinix

5001 - 10000

📡 Telecommunications

🏱 Enterprise

☁ SaaS

Senior Staff Platform Engineer at Equinix designing solutions for monitoring and obtaining telemetry. Join a skilled team to drive automation and infrastructure management.

Ansible

Distributed Systems

Docker

Grafana

Jenkins

Kubernetes

Linux

Prometheus

Puppet

Vault

VMware

🕒 May 29

VirtusLab

201 - 500

💳 Fintech

Data Platform Engineer designing and implementing solutions for indexing Atlan metadata. Collaborating with DevOps to ensure production readiness and compliance standards are met.

Kubernetes

Python

🕒 May 29

Hitachi

10,000+ employees

🤖 Artificial Intelligence

⚡ Energy

🚗 Transport

AI Platform Engineer responsible for designing and evolving the Global AI Platform. Collaborating with teams to ensure AI capabilities and performance meet business needs.

Azure

Cloud

Terraform

🕒 May 26

The Codest

51 - 200

💳 Fintech

🛍️ eCommerce

Senior Platform Engineer working with cloud and DevOps for an international tech software company. Focused on system operations, scaling, and automation in a collaborative environment.

AWS

Azure

Cloud

DNS

Docker

Google Cloud Platform

IPFS

Linux

Node.js

Python

Ruby

Terraform

Go