Senior Deployment Engineer, AI Inference

Artificial Intelligence • Hardware • Healthcare Insurance

Cerebras Systems develops advanced AI hardware, most notably the Cerebras Wafer Scale Engine, which is built for high-performance AI inference and which the company positions as outperforming traditional GPU setups. Its technology enables organizations such as Mayo Clinic and AlphaSense to run state-of-the-art AI models with high speed and efficiency. With deployment options that include cloud and on-premises solutions, Cerebras serves innovative teams across a range of industries.

201 - 500 employees

Founded 2016

🤖 Artificial Intelligence

🔧 Hardware

⚕️ Healthcare Insurance

October 14

🇨🇦 Canada – Remote

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

AWS

Docker

Grafana

Kubernetes

Linux

Prometheus

Python

Cerebras Systems

📋 Description

• Deploy AI inference replicas and cluster software across multiple datacenters.
• Operate across heterogeneous datacenter environments undergoing rapid (10x) growth.
• Optimize capacity allocation and replica placement using constraint-solver algorithms.
• Operate bare-metal inference infrastructure while supporting the transition to a Kubernetes (K8s)-based platform.
• Develop and extend telemetry, observability, and alerting solutions to ensure deployment reliability at scale.
• Develop and extend a fully automated deployment pipeline that supports fast software updates and capacity reallocation at scale.
• Translate technical and customer needs into actionable requirements for the Dev Infra, Cluster, Platform, and Core teams.
• Stay up to date with the latest advancements in AI compute infrastructure and related technologies.

🎯 Requirements

• 5-7 years of experience operating on-prem compute infrastructure (ideally in machine learning or high-performance computing), or developing and managing complex AWS plane infrastructure for hybrid deployments.
• Strong proficiency in Python for automation, orchestration, and deployment tooling.
• Solid understanding of Linux-based systems and command-line tools.
• Extensive knowledge of Docker containers and container orchestration platforms such as Kubernetes (K8s).
• Familiarity with spine-leaf (Clos) network architecture.
• Proficiency with telemetry and observability stacks such as Prometheus, InfluxDB, and Grafana.
• Strong ownership mindset and accountability for complex deployments.
• Ability to work effectively in a fast-paced environment.

🏖️ Benefits

• Build a breakthrough AI platform beyond the constraints of the GPU.
• Publish and open-source cutting-edge AI research.
• Work on one of the fastest AI supercomputers in the world.
• Enjoy job stability with startup vitality.
• Be part of a simple, non-corporate work culture that respects individual beliefs.

Similar Jobs

Senior Site Reliability Engineer

October 9

Masabi

201 - 500

🚗 Transport

☁️ SaaS

Senior Site Reliability Engineer managing infrastructure and improving reliability at Masabi. Leading systems design and development, focusing on automation and performance.

🇨🇦 Canada – Remote

💰 Venture Round on 2022-03

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

AWS

Cloud

Terraform

Senior Software Engineer – SRE

October 7

Veeva Systems

1001 - 5000

☁️ SaaS

⚕️ Healthcare Insurance

💊 Pharmaceuticals

Senior Site Reliability Engineer ensuring scalability and reliability of enterprise applications at Veeva. Tackling complex challenges globally with deep Java expertise and open-source technologies.

🇨🇦 Canada – Remote

💵 $110k - $270k / year

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

Ansible

AWS

Cloud

Docker

Hibernate

Java

Kubernetes

Linux

Maven

MySQL

Open Source

Python

Ruby

Spring

SQL

Vagrant

Senior Software Engineer – SRE

October 7

Veeva Systems

1001 - 5000

☁️ SaaS

⚕️ Healthcare Insurance

💊 Pharmaceuticals

Senior Site Reliability Engineer at Veeva, helping life sciences companies optimize their cloud infrastructure. Ensuring scalability and reliability for enterprise applications across multiple regions.

🇨🇦 Canada – Remote

💵 $110k - $270k / year

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

Ansible

AWS

Cloud

Docker

Hibernate

Java

Kubernetes

Linux

Maven

MySQL

Open Source

Python

Ruby

Spring

SQL

Vagrant

Senior Software Engineer – SRE

October 7

Veeva Systems

1001 - 5000

☁️ SaaS

⚕️ Healthcare Insurance

💊 Pharmaceuticals

Senior Site Reliability Engineer on the Vault Platform, ensuring scalability and reliability of enterprise applications at Veeva. Tackling complex challenges with Java and open-source technologies for global customers.

🇨🇦 Canada – Remote

💵 $110k - $270k / year

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

Ansible

AWS

Cloud

Docker

Hibernate

Java

Kubernetes

Linux

Maven

MySQL

Open Source

Python

Ruby

Spring

SQL

Vagrant

Senior Deployment Engineer – CAD

October 7

Atolio

11 - 50

🤖 Artificial Intelligence

🏢 Enterprise

☁️ SaaS

Deployment Engineer working with engineering and client success teams at Atolio. Ensuring efficient deployment of the enterprise search platform across various environments.

🇨🇦 Canada – Remote

💵 CA$150k - CA$200k / year

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

AWS

Azure

Cloud

Google Cloud Platform

Grafana

Kubernetes

Python

ServiceNow

Splunk

Terraform