DevOps Engineer – AI Inference

🔥 0 minutes ago

Gcore

201 - 500 employees

🔐 Security

🤖 Artificial Intelligence

Cloud Computing • Security • Artificial Intelligence

Gcore is a global provider of cloud, edge, and AI solutions that accelerate AI training, deliver comprehensive cloud services, enhance content delivery, and protect servers and applications. With over 180 points of presence worldwide and a network capacity of 200+ Tbps, Gcore offers secure, flexible, and scalable infrastructure services. Its integrated offerings, including Edge Cloud, Edge Network, Edge Security, and AI Infrastructure, are designed to meet the needs of businesses looking to scale and control their global infrastructure efficiently. Gcore also provides robust DDoS protection and origin shielding to ensure uninterrupted online operations, making it a trusted partner for thousands of businesses worldwide.

📋 Description

• Design, develop, and maintain infrastructure for AI inference workloads, including GPU scheduling, model deployment pipelines, and data access patterns in on-prem environments
• Build and manage monitoring and observability tools for AI inference platforms, including dashboards, alerts, and runbooks for model health and system performance
• Collaborate with ML engineers and platform teams to design system architecture for AI workloads, integrate inference runtimes, and test performance at scale

🎯 Requirements

• Strong understanding of Kubernetes architecture, including CNI, CSI, operators, ingress/gateway, and control plane components.
• Hands-on experience operating and troubleshooting production Kubernetes clusters.
• Strong Linux and networking troubleshooting skills, including DNS, routing, firewalling, TLS, MTU, connectivity and performance issues.
• Ability to develop automation and operational tooling using Python, Go, or Bash.
• Experience with Terraform, Ansible, or similar IaC/configuration management tools.
• Experience with VictoriaMetrics/Grafana or similar monitoring, alerting, and troubleshooting tools.
• Strong experience with Git-based workflows and CI/CD pipelines.
• Familiarity with Cluster API or similar Kubernetes cluster lifecycle management technologies.
• Hands-on operation or administration of Slurm clusters.
• Knowledge of Argo CD, GitOps workflows, Helm, or Helmfile.
• Background working with managed platforms, PaaS, or cloud services.
• Exposure to bare metal, GPU, HPC, or other high-performance computing environments.
• Familiarity with the NVIDIA GPU stack, RDMA/InfiniBand, or high-performance networking.
• Knowledge of OpenStack or similar cloud infrastructure platforms.
• Hands-on experience developing Kubernetes operators or controllers.

🏖️ Benefits

• Competitive compensation
• Flexible working hours and hybrid or remote options, depending on your role
• Work from anywhere in the world for up to 45 days per year
• Private medical insurance for you and your family*
• Extra paid vacation and sick leave days*
• Support for life’s important moments and celebrations
• Language courses to help you connect and grow
• Modern, welcoming offices with snacks, drinks, and entertainment*
• Team sports and social activities*
