
Artificial Intelligence • Hardware • Enterprise
SambaNova Systems is a technology company focused on advancing artificial intelligence and deep learning. It offers an enterprise-grade AI platform that is purpose-built for generative AI, enabling organizations across various sectors to rapidly deploy state-of-the-art AI capabilities. SambaNova's platform spans the full stack, from hardware chips to AI models, providing solutions for high-performance computing and AI workloads. Its technology is used in fields such as science, public sector applications, and the development of sovereign AI solutions. The company's innovations include alternatives to GPUs, high-speed AI inference, and scalable AI hardware systems.
November 22
California - Remote
Texas - Remote
Full Time
Lead
DevOps & Site Reliability Engineer (SRE)
H1B Visa Sponsor

• Take ownership of our existing Bazel ecosystem, including RBE setup, maintenance, and troubleshooting.
• Ensure the stability, scalability, and performance of our CI/CD pipelines.
• Collaborate with development teams to optimize build and test processes.
• Maintain and improve our CircleCI setup, including workflow optimization and configuration management.
• Manage Python package dependencies and ensure seamless integration with our CI/CD pipelines.
• Work with the development team to implement best practices for package management and dependency management.
• Familiarize yourself with our GAR and JFrog Artifact Management setup and optimize its usage.
• Collaborate with the engineering team to implement infrastructure changes and improvements.
• 2+ years of experience in DevOps or Infra.
• Experience in managing dependencies in large scale projects.
• Experience with Python Package Management and RPM packages.
• Experience with Google Artifact Registry (GAR) and/or JFrog Artifact Management.
• Experience with Linux/Unix systems and command-line interfaces.
• Strong scripting skills (e.g., Python, Bash, etc.).
• Excellent problem-solving skills and attention to detail.
• Ability to work collaboratively with cross-functional teams.
• Experience maintaining and troubleshooting Bazel ecosystems, especially in C++ and Python.
• Familiarity with containerization (e.g., Docker) and orchestration (e.g., Kubernetes).
• Familiarity with AWS/GCloud.
• Experience with other CI/CD tools (e.g., Jenkins, GitLab CI/CD, etc.) preferably CircleCI and Jenkins.
• Knowledge of software development best practices and coding standards.
• 95% premium coverage for employee medical insurance
• 77% premium coverage for dependents
• Health Savings Account (HSA) with employer contribution
• Dental insurance
• Vision insurance
• Short/Long term Disability insurance
• Basic Life insurance
• Voluntary Life insurance
• AD&D insurance plans
• Flexible Spending Account (FSA) options including Health Care, Limited Purpose, and Dependent Care
• Subscription to Headspace
• Gympass+ membership with access to physical gyms
• One Medical membership
• Counseling services with an Employee Assistance Program
• Well-being benefits available to you and your dependents
November 21
Site Reliability Engineer ensuring the reliability and performance of systems at Alpaca. Collaborate with teams to implement solutions and improve the infrastructure.
Distributed Systems
Kafka
Kubernetes
Linux
Prometheus
RabbitMQ
Go
November 20
Global Head of Site Reliability Engineering at Socure, leading end-to-end reliability for identity verification platform. Focused on high-impact systems and advanced engineering practices.
United States - Remote
$260k - $285k / year
$450M Series E on 2021-11
Full Time
Lead
DevOps & Site Reliability Engineer (SRE)
H1B Visa Sponsor
AWS
Cloud
November 19
Staff Site Reliability Engineer at Stord responsible for infrastructure management and production system reliability. Focusing on GCP, automation, and mentoring within a dynamic team.
Ansible
Chef
Cloud
Distributed Systems
Docker
Google Cloud Platform
Grafana
Java
Jenkins
Kubernetes
Prometheus
Puppet
Python
Terraform
Go
November 18
Staff Cloud DevOps Engineer for Cleerly, leading cloud infrastructure and enhancing systems for AI-powered diagnostics. Focused on continuous integration, software delivery, and mentoring junior engineers.
United States - Remote
$207k - $235k / year
Series C on 2022-07
Full Time
Lead
DevOps & Site Reliability Engineer (SRE)
H1B Visa Sponsor
AWS
Cloud
DynamoDB
EC2
JavaScript
Kubernetes
Linux
Node.js
Python
Terraform
November 14
Staff Software Engineer overseeing operational support of SAP BTP CPI applications at NBCUniversal. Leading offshore teams and collaborating on production deployments.