Senior Deployment Engineer, AI Inference

October 14

Cerebras Systems

Artificial Intelligence • Hardware • Healthcare Insurance

Cerebras Systems is a pioneering company that focuses on developing advanced AI hardware, specifically the Cerebras Wafer Scale Engine, which delivers unparalleled performance in AI inference, outperforming traditional GPU setups. Their cutting-edge technology enables organizations like Mayo Clinic and AlphaSense to run state-of-the-art AI models with remarkable speed and efficiency. With flexible deployment options including cloud and on-premises solutions, Cerebras is transforming the landscape of AI capabilities for innovative teams across various industries.

201 - 500 employees

Founded 2016

🤖 Artificial Intelligence

🔧 Hardware

⚕️ Healthcare Insurance

📋 Description

‱ Deploy AI inference replicas and cluster software across multiple datacenters. ‱ Operate across heterogeneous datacenter environments undergoing rapid 10x growth. ‱ Maximize capacity allocation and optimize replica placement using constraint-solver algorithms. ‱ Operate bare-metal inference infrastructure while supporting transition to K8S-based platform. ‱ Develop and extend telemetry, observability and alerting solutions to ensure deployment reliability at scale. ‱ Develop and extend a fully automated deployment pipeline to support fast software updates and capacity reallocation at scale. ‱ Translate technical and customer needs into actionable requirements for the Dev Infra, Cluster, Platform and Core teams. ‱ Stay up to date with the latest advancements in AI compute infrastructure and related technologies.

🎯 Requirements

‱ 5-7 years of experience in operating on-prem compute infrastructure (ideally in Machine Learning or High-Performance Compute) or developing and managing complex AWS plane infrastructure for hybrid deployments. ‱ Strong proficiency in Python for automation, orchestration, and deployment tooling. ‱ Solid understanding of Linux-based systems and command-line tools. ‱ Extensive knowledge of Docker containers and container orchestration platforms like K8S. ‱ Familiarity with spine-leaf (Clos) networking architecture. ‱ Proficiency with telemetry and observability stacks such as Prometheus, InfluxDB and Grafana. ‱ Strong ownership mindset and accountability for complex deployments. ‱ Ability to work effectively in a fast-paced environment.

🏖️ Benefits

• Build a breakthrough AI platform beyond the constraints of the GPU.
• Publish and open source their cutting-edge AI research.
• Work on one of the fastest AI supercomputers in the world.
• Enjoy job stability with startup vitality.
• Enjoy a simple, non-corporate work culture that respects individual beliefs.

Similar Jobs

October 9

Masabi

201 - 500

🚗 Transport

☁ SaaS

Senior Site Reliability Engineer managing infrastructure and improving reliability at Masabi. Leading systems design and development, focusing on automation and performance.

October 7

Veeva Systems

1001 - 5000

☁ SaaS

⚕ Healthcare Insurance

💊 Pharmaceuticals

Senior Site Reliability Engineer ensuring scalability and reliability of enterprise applications at Veeva. Tackling complex challenges globally with deep Java expertise and open-source technologies.

October 7

Veeva Systems

1001 - 5000

☁ SaaS

⚕ Healthcare Insurance

💊 Pharmaceuticals

Senior Site Reliability Engineer at Veeva, helping life sciences companies optimize their cloud infrastructure. Ensuring scalability and reliability for enterprise applications across multiple regions.

October 7

Veeva Systems

1001 - 5000

☁ SaaS

⚕ Healthcare Insurance

💊 Pharmaceuticals

Senior Site Reliability Engineer on Vault Platform ensuring scalability and reliability of enterprise applications at Veeva. Tackling complex challenges leveraging Java and open-source technologies for global customers.

October 7

Atolio

11 - 50

🤖 Artificial Intelligence

🏢 Enterprise

☁️ SaaS

Deployment Engineer working with engineering and client success teams at Atolio. Ensure efficient deployment of enterprise search platform in various environments.
