Platform Engineer, AI/ML Infrastructure

Job not on LinkedIn

August 18

🇺🇸 United States – Remote

💵 $160k - $220k / year

⏰ Full Time

🟡 Mid-level

🟠 Senior

🏗️ Platform Engineer

SECURENTITY

Cybersecurity • Enterprise • SaaS

SECURENTITY is a company specializing in Identity and Access Management (IAM) solutions. They provide a comprehensive suite of services designed to secure access to digital environments, empowering organizations to manage identities, enforce access control, and modernize their infrastructure effectively. With a commitment to simplifying IAM challenges, SECURENTITY offers managed services, cloud IAM solutions, and expert guidance to enhance security and streamline user management.

11 - 50 employees

🔒 Cybersecurity

🏢 Enterprise

☁️ SaaS

📋 Description

• Architect and maintain our core computing platform using Kubernetes on AWS and on-premise, providing a stable, scalable environment for all applications and services.
• Develop and manage our entire infrastructure using Infrastructure-as-Code (IaC) principles with Terraform, ensuring our environments are reproducible, versioned, and automated.
• Design, build, and optimize our AI/ML job scheduling and orchestration systems, integrating Slurm with our Kubernetes clusters to efficiently manage GPU resources.
• Provision, manage, and maintain our on-premise bare metal server infrastructure for high-performance GPU computing.
• Implement and manage the platform's networking (CNI, service mesh) and storage (CSI, S3) solutions to support high-throughput, low-latency workloads across hybrid environments.
• Develop a comprehensive observability stack (monitoring, logging, tracing) to ensure platform health, and create automation for operational tasks, incident response, and performance tuning.
• Collaborate with AI researchers and ML engineers to understand their infrastructure needs and build the tools and workflows that accelerate their development cycle.
• Automate the life cycle of single-tenant, managed deployments
• Are passionate about building platforms that empower developers and researchers.
• Enjoy creating elegant, automated solutions for complex infrastructure challenges in both cloud and data center environments.
• Thrive on optimizing hybrid infrastructure for performance, cost, and reliability.
• Are excited to work at the intersection of modern platform engineering and cutting-edge AI.
• Love to treat infrastructure as a product, continuously improving the developer experience.
• 5+ years of experience in Platform Engineering, DevOps, or Site Reliability Engineering (SRE).
• Proven, hands-on experience building and managing production infrastructure with Terraform.
• Expert-level knowledge of Kubernetes architecture and operations in a large-scale environment.
• Experience with high-performance compute (HPC) job schedulers, specifically Slurm, for managing GPU-intensive AI workloads.
• Experience managing bare metal infrastructure, including server provisioning (e.g., PXE boot, MAAS), configuration, and lifecycle management.
• Strong scripting and automation skills (e.g., Python, Go, Bash).
• Experience with CI/CD systems (e.g., GitLab CI, Jenkins, ArgoCD) and building developer tooling.
• Familiarity with FinOps principles and cloud cost optimization strategies.
• Knowledge of Kubernetes networking (e.g., Calico, Cilium) and storage (e.g., Ceph, Rook) solutions.
• Experience in a multi-region or hybrid cloud environment.

🎯 Requirements

• 5+ years of experience in Platform Engineering, DevOps, or Site Reliability Engineering (SRE).
• Proven, hands-on experience building and managing production infrastructure with Terraform.
• Expert-level knowledge of Kubernetes architecture and operations in a large-scale environment.
• Experience with high-performance compute (HPC) job schedulers, specifically Slurm, for managing GPU-intensive AI workloads.
• Experience managing bare metal infrastructure, including server provisioning (e.g., PXE boot, MAAS), configuration, and lifecycle management.
• Strong scripting and automation skills (e.g., Python, Go, Bash).

🏖️ Benefits

• Offers Equity
• Offers Bonus
• 10% Annual Bonus

Similar Jobs

August 15

Owner.com

11 - 50

Own CI/CD across backend, frontend, and mobile at Owner.com. Focus on iOS build pipelines, signing, and fast app delivery.

🇺🇸 United States – Remote

💵 $190k - $210k / year

⏰ Full Time

🟠 Senior

🏗️ Platform Engineer

August 11

Scene Health

51 - 200

⚕️ Healthcare Insurance

☁️ SaaS

🧘 Wellness

Implement automation, CI/CD, and secure infra for Scene Health's healthcare platform. Remote role with regulatory and security focus.

🇺🇸 United States – Remote

⏰ Full Time

🟢 Junior

🟡 Mid-level

🏗️ Platform Engineer

🚫👨‍🎓 No degree required

🗣️🇪🇸 Spanish Required

August 3

Beautiful.ai

11 - 50

☁️ SaaS

⚡ Productivity

🤖 Artificial Intelligence

Senior Platform Engineer at Beautiful.ai responsible for core infrastructure design and mentoring engineers.

🇺🇸 United States – Remote

💵 $160k - $200k / year

💰 $11M Series B on 2018-05

⏰ Full Time

🟠 Senior

🏗️ Platform Engineer

August 3

Railway

11 - 50

☁️ SaaS

As a Senior Platform Engineer at Railway, you'll design scalable infrastructure for storage systems.

🇺🇸 United States – Remote

💰 $20M Series A on 2022-05

⏰ Full Time

🟠 Senior

🏗️ Platform Engineer

🦅 H1B Visa Sponsor

July 29

Deepgram

51 - 200

🤖 Artificial Intelligence

☁️ SaaS

🔌 API

Deepgram, the leading voice AI platform, is looking for a Platform Engineer to build and operate a hybrid infrastructure.

🇺🇸 United States – Remote

💵 $160k - $220k / year

💰 $47M Series B on 2022-11

⏰ Full Time

🟡 Mid-level

🟠 Senior

🏗️ Platform Engineer

🦅 H1B Visa Sponsor
