DevOps / Infrastructure Engineer

August 22

10a Labs

Artificial Intelligence • Cybersecurity • SaaS

10a Labs is an applied research and technology company specializing in AI security. It delivers intelligence collection, investigative research, and analysis for AI unicorns, Fortune 10 companies, and U.S. tech leaders. The company provides services that empower security teams and technology leaders to stay ahead of evolving threats, drive innovation, and protect their brands. Its work includes stress-testing AI models against threats, developing real-time threat detection systems, and monitoring cyber threats to enhance AI safety and security throughout the product development lifecycle.

📋 Description

• Design, build, and document a maintainable GCP cloud infrastructure CI/CD pipeline for real-time model serving and data workflows
• Deploy and optimize APIs for low-latency ML systems
• Automate model deployment, retraining, and evaluation (CI/CD for ML)
• Build observability tooling to monitor rollouts, errors, integration testing, and drift in ML pipelines
• Ensure infrastructure meets security, compliance, and uptime requirements
• You’ve deployed and monitored a real-time ML inference system with well-defined observability.
• You’ve implemented an API with latency under 1000 ms for classifier-based inference.
• You’ve partnered with ML engineers to streamline deployment and retraining workflows.
• You’ve built logging and monitoring that gives insight into system performance and classifier behavior.

🎯 Requirements

• 3–8 years of DevOps/Platform engineering experience deploying machine learning systems or high-availability backend systems.
• Ability to build CI/CD pipelines from scratch; familiarity with GitHub Actions or similar.
• Expert-level proficiency with Git and GitHub workflows and strong scripting abilities in Python, Bash, and/or Go.
• Experience with Google Cloud Platform (including Cloud Run), Docker, Kubernetes, and Terraform.
• Familiarity with SOC 2 compliance requirements and security best practices (IAM, secrets, etc.).
• Experience implementing monitoring, logging, and alerting systems (e.g., Prometheus, Grafana, ELK/EFK, OpenTelemetry).
• Ability to work cross-functionally with ML, security, and engineering teams to deploy safely and iterate fast.
• A builder's mindset and a bias for ownership in ambiguous environments.
