Senior Compute Platform Engineer

🔥 15 hours ago

Stack AV

51 - 200 employees

🚗 Transport

🤖 Artificial Intelligence

Stack AV develops AI-powered autonomous trucking systems intended to improve the safety, reliability, and efficiency of trucking operations. The company targets trucking industry challenges with solutions aimed at better supply chain intelligence, business outcomes, and delivery speed. Safety is a core principle, and the company builds on AI, machine learning, and cloud technologies.

📋 Description

• Design and operate distributed systems for scheduling and executing large-scale batch workloads across Kubernetes clusters.
• Build and maintain compute platform abstractions.
• Optimize utilization of compute resources.
• Develop and improve multi-tenant scheduling strategies.
• Improve reliability and fault tolerance of large-scale distributed jobs and platform components.
• Collaborate with teams across the company to understand workload requirements and improve platform capabilities.
• Contribute to platform tooling, automation, and CI/CD workflows.

🎯 Requirements

• 7+ years of experience building and operating distributed systems or infrastructure platforms.
• Strong experience with Kubernetes and container orchestration in production-grade environments.
• Proficiency developing in Golang and Python.
• Experience designing and operating large-scale batch compute systems.
• Strong debugging and problem-solving skills in complex distributed systems.
• Ability to collaborate across teams and communicate technical concepts clearly.
• Experience with at least one batch scheduling system such as Kueue, Armada, Volcano, or Slurm.

Similar Jobs

🔥 15 hours ago

Accenture Federal Services

10,000+ employees

🤖 Artificial Intelligence

🔒 Cybersecurity

🏛️ Government

Power Platform Developer delivering solutions on Microsoft Power Platform. Collaborating with clients to develop low code solutions using Power Apps and Power BI.

🔥 17 hours ago

ARETUM

501 - 1000

🏛️ Government

🔒 Cybersecurity

🏢 Enterprise

Power Platform Developer developing technical solutions for federal defense and security clients. Collaborating with teams to ensure compliance and performance in a mission-driven organization.

Azure

JavaScript

React

SDLC

TypeScript

.NET

🕒 3 days ago

Flexential

501 - 1000

🤝 B2B

📡 Telecommunications

🏢 Enterprise

Senior Platform Engineer developing and managing critical IT platforms at Flexential. Focused on automation, observability, and high availability using advanced technologies.

Ansible

Azure

Bootstrap

Cloud

Docker

Flux

Google Cloud Platform

Grafana

Kubernetes

Linux

Prometheus

Python

ServiceNow

TCP/IP

Terraform

Vault

VMware

🕒 3 days ago

Flexential

501 - 1000

🤝 B2B

📡 Telecommunications

🏢 Enterprise

Sr Manager, Platform Engineering leading a team in developing observability solutions at Flexential. Overseeing technical planning, implementation, and operational management with a focus on DevOps and security.

Ansible

AWS

Cloud

Grafana

ITSM

Kubernetes

Linux

Prometheus

Python

SDLC

TCP/IP

Terraform

Vault

VMware

🕒 3 days ago

Derex Technologies Inc

51 - 200

🏢 Enterprise

☁️ SaaS

Dynamics Consultant and CRM/Power Platform Developer Lead at Derex Technologies Inc specializing in IT consulting and staffing solutions. Working on projects to deliver Dynamics CRM/365 solutions and integration.

Azure

JavaScript