Principal DevOps Engineer

Artificial Intelligence • Hardware • Enterprise

SambaNova Systems is a technology company focused on advancing artificial intelligence and deep learning. It offers an enterprise-grade AI platform purpose-built for generative AI, enabling organizations across sectors to rapidly deploy state-of-the-art AI capabilities. The platform spans the full stack, from hardware chips to AI models, and targets high-performance computing and AI workloads. Its technology is used in science, public sector applications, and the development of sovereign AI solutions. The company's innovations include alternatives to GPUs, high-speed AI inference, and scalable AI hardware systems.

201 - 500 employees

Founded 2017

🤖 Artificial Intelligence

🔧 Hardware

🏢 Enterprise

November 22

🏄 California – Remote

🤠 Texas – Remote

⏰ Full Time

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)

🦅 H1B Visa Sponsor

AWS

Docker

Jenkins

Kubernetes

Linux

Python

Unix

📋 Description

• Take ownership of our existing Bazel ecosystem, including RBE setup, maintenance, and troubleshooting.
• Ensure the stability, scalability, and performance of our CI/CD pipelines.
• Collaborate with development teams to optimize build and test processes.
• Maintain and improve our CircleCI setup, including workflow optimization and configuration management.
• Manage Python package dependencies and ensure seamless integration with our CI/CD pipelines.
• Work with the development team to implement best practices for package management and dependency management.
• Familiarize yourself with our GAR and JFrog Artifact Management setup and optimize its usage.
• Collaborate with the engineering team to implement infrastructure changes and improvements.

🎯 Requirements

• 2+ years of experience in DevOps or infrastructure.
• Experience managing dependencies in large-scale projects.
• Experience with Python package management and RPM packages.
• Experience with Google Artifact Registry (GAR) and/or JFrog Artifact Management.
• Experience with Linux/Unix systems and command-line interfaces.
• Strong scripting skills (e.g., Python, Bash).
• Excellent problem-solving skills and attention to detail.
• Ability to work collaboratively with cross-functional teams.
• Experience maintaining and troubleshooting Bazel ecosystems, especially in C++ and Python.
• Familiarity with containerization (e.g., Docker) and orchestration (e.g., Kubernetes).
• Familiarity with AWS/GCloud.
• Experience with other CI/CD tools (e.g., Jenkins, GitLab CI/CD), preferably CircleCI and Jenkins.
• Knowledge of software development best practices and coding standards.

🏖️ Benefits

• 95% premium coverage for employee medical insurance
• 77% premium coverage for dependents
• Health Savings Account (HSA) with employer contribution
• Dental insurance
• Vision insurance
• Short/Long term Disability insurance
• Basic Life insurance
• Voluntary Life insurance
• AD&D insurance plans
• Flexible Spending Account (FSA) options including Health Care, Limited Purpose, and Dependent Care
• Subscription to Headspace
• Gympass+ membership with access to physical gyms
• One Medical membership
• Counseling services with an Employee Assistance Program
• Well-being benefits available to you and your dependents
