Platform Engineer – AI/ML Infrastructure

July 29

Deepgram

Artificial Intelligence • SaaS • API

Deepgram is a leading voice AI company that provides powerful APIs for speech-to-text, text-to-speech, and language understanding applications. Their platform enables developers to build sophisticated voice AI solutions for use cases such as contact centers, medical transcription, conversational AI, and more. Known for unmatched accuracy, speed, and cost-effectiveness, Deepgram's technology is trusted by top enterprises and startups worldwide. By offering real-time and highly accurate transcription capabilities, Deepgram helps businesses gain insights from voice data, making it an essential tool for transforming voice interactions.

51 - 200 employees

Founded 2015

🤖 Artificial Intelligence

☁️ SaaS

🔌 API

💰 $47M Series B on 2022-11

📋 Description

• Architect and maintain our core computing platform using Kubernetes on AWS and on-premise, providing a stable, scalable environment for all applications and services.
• Develop and manage our entire infrastructure using Infrastructure-as-Code (IaC) principles with Terraform, ensuring our environments are reproducible, versioned, and automated.
• Design, build, and optimize our AI/ML job scheduling and orchestration systems, integrating Slurm with our Kubernetes clusters to efficiently manage GPU resources.
• Provision, manage, and maintain our on-premise bare metal server infrastructure for high-performance GPU computing.
• Implement and manage the platform's networking (CNI, service mesh) and storage (CSI, S3) solutions to support high-throughput, low-latency workloads across hybrid environments.
• Develop a comprehensive observability stack (monitoring, logging, tracing) to ensure platform health, and create automation for operational tasks, incident response, and performance tuning.
• Collaborate with AI researchers and ML engineers to understand their infrastructure needs and build the tools and workflows that accelerate their development cycle.
• Automate the lifecycle of single-tenant, managed deployments.

🎯 Requirements

• 5+ years of experience in Platform Engineering, DevOps, or Site Reliability Engineering (SRE).
• Proven, hands-on experience building and managing production infrastructure with Terraform.
• Expert-level knowledge of Kubernetes architecture and operations in a large-scale environment.
• Experience with high-performance computing (HPC) job schedulers, specifically Slurm, for managing GPU-intensive AI workloads.
• Experience managing bare metal infrastructure, including server provisioning (e.g., PXE boot, MAAS), configuration, and lifecycle management.
• Strong scripting and automation skills (e.g., Python, Go, Bash).

🏖️ Benefits

• Offers Equity
• Offers Bonus
• 10% Annual Bonus

Similar Jobs

July 29

Ishpi Information Technologies, Inc. (DBA ISHPI)

201 - 500

🔒 Cybersecurity

🤝 B2B

🏛️ Government

Support NIWC Atlantic by creating and enhancing IT solutions to improve command functionality.

🇺🇸 United States – Remote

💵 $70k - $90k / year

⏰ Full Time

🟢 Junior

🟡 Mid-level

🏗️ Platform Engineer

July 28

TechStarsGroup LLC

2 - 10

💳 Fintech

⚕️ Healthcare Insurance

☁️ SaaS

Build and maintain cutting-edge natural language understanding platforms for healthcare using GCP and Kubernetes.

🇺🇸 United States – Remote

⏰ Full Time

🟠 Senior

🏗️ Platform Engineer

July 28

Supabase

51 - 200

☁️ SaaS

🔌 API

🤖 Artificial Intelligence

Join Supabase to build and operate cloud infrastructure for database developer tools.

🇺🇸 United States – Remote

💰 $80M Series B on 2022-05

⏰ Full Time

🟡 Mid-level

🟠 Senior

🏗️ Platform Engineer

July 23

Owner.com

11 - 50

We are seeking a Platform Security Engineer to enhance security across Owner’s cloud infrastructure and CI/CD.

🇺🇸 United States – Remote

💵 $190k - $220k / year

⏰ Full Time

🟡 Mid-level

🟠 Senior

🏗️ Platform Engineer

July 23

Astra Finance

11 - 50

💳 Fintech

💸 Finance

☁️ SaaS

Senior infrastructure engineer needed for Astra's financial platform to manage GCP infrastructure and build CI/CD systems.

🇺🇸 United States – Remote

⏰ Full Time

🟠 Senior

🏗️ Platform Engineer

Developed by Lior Neu-ner. I'd love to hear your feedback — Get in touch via DM or support@remoterocketship.com