Systems Architect – AI/ML Infrastructure

🕒 April 6

Deepgram

51 - 200 employees

Founded 2015

🤖 Artificial Intelligence

☁️ SaaS

🔌 API

💰 $47M Series B, November 2022

Deepgram is a voice AI company that provides APIs for speech-to-text, text-to-speech, and language understanding applications. Its platform enables developers to build voice AI solutions for use cases such as contact centers, medical transcription, and conversational AI. Deepgram emphasizes accuracy, speed, and cost-effectiveness, and its technology is used by enterprises and startups worldwide. By offering real-time, highly accurate transcription, Deepgram helps businesses extract insights from voice data and improve their voice interactions.

📋 Description

• Define and drive the end-to-end infrastructure architecture for Deepgram's AI/ML workloads across production inference and research training
• Design multi-cloud and hybrid infrastructure strategies that balance performance, reliability, cost, and vendor flexibility
• Architect compute orchestration systems that efficiently schedule and manage GPU and CPU workloads across heterogeneous infrastructure
• Design storage architectures that handle the massive datasets required for speech and audio ML, from high-throughput training data pipelines to low-latency model serving
• Lead capacity planning across all infrastructure dimensions, modeling growth and ensuring Deepgram can scale ahead of demand
• Drive cost optimization and FinOps practices, identifying opportunities to reduce infrastructure spend without compromising performance or reliability
• Design burstable, elastic training infrastructure that can scale up for large training runs and scale down to minimize idle cost
• Architect research compute infrastructure that gives ML teams the resources they need while maintaining operational efficiency
• Establish architectural standards, design review processes, and technical documentation practices for infrastructure decisions
• Collaborate with engineering leadership to align infrastructure strategy with the product roadmap and business objectives
• Evaluate emerging hardware, cloud services, and infrastructure technologies for potential adoption

🎯 Requirements

• 7+ years of experience in infrastructure engineering, systems architecture, or a senior technical role focused on large-scale infrastructure
• Proven experience designing multi-cloud architectures spanning AWS and at least one other major cloud provider or an on-premises environment
• Deep expertise in storage system design (block, object, and file storage), including performance tuning for large-scale data workloads
• Strong experience with compute orchestration using Kubernetes and an understanding of how to schedule diverse workloads efficiently
• Hands-on experience with GPU infrastructure: procurement considerations, cluster design, and driver and runtime management
• Track record of capacity planning and infrastructure scaling for high-growth environments
• Ability to communicate complex architectural decisions clearly to both technical and non-technical stakeholders
• Strong understanding of networking fundamentals as they relate to infrastructure architecture

🏖️ Benefits

• Medical, dental, and vision benefits
• Annual wellness stipend
• Mental health support
• Life, short-term disability (STD), and long-term disability (LTD) income insurance plans
• Unlimited PTO
• Generous paid parental leave
• Flexible schedule
• 12 paid US company holidays
• Quarterly personal productivity stipend
• One-time stipend for home office upgrades
• 401(k) plan with company match
• Tax savings programs

Similar Jobs

🕒 April 4

Torus

1001 - 5000

🏠 Real Estate

🤝 Non-profit

Staff Embedded Systems Engineer developing production-grade firmware for energy storage systems. Collaborating with a team to support hardware and software integration in a rapidly growing company.

Cloud

Linux

Python

🕒 April 4

SheerID

201 - 500

🤝 B2B

🛍️ eCommerce

🔐 Security

Design, build, and automate the technology ecosystem that powers SheerID’s GTM teams. Partner with Revenue Operations and GTM leaders for system and process alignment.

🕒 April 4

Hewlett Packard Enterprise

10,000+ employees

🏢 Enterprise

🔧 Hardware

☁️ SaaS

Pre-Sales Channel Engineer supporting the HPE Networking practice through collaboration with partners. Driving adoption of HPE’s advanced networking portfolio with strategic technical sales leadership.

Cloud

Firewalls

Switching

🕒 April 3

Hewlett Packard Enterprise

10,000+ employees

🏢 Enterprise

🔧 Hardware

☁️ SaaS

Pre-Sales Channel Engineer supporting the growth of the HPE Networking practice and partner enablement through technical sales leadership. Collaborating with partners to deliver modern networking solutions.

Cloud

Firewalls

Switching

🕒 April 3

Palo Alto Networks

10,000+ employees

🔒 Cybersecurity

🏢 Enterprise

Solutions Architect defining and building innovative network security offerings at Palo Alto Networks. Collaborating across product, engineering, and GTM to drive customer outcomes and revenue growth.

AWS

Azure

Cloud

Firewalls

Google Cloud Platform

Switching