DevOps / Infrastructure Engineer

August 22

10a Labs

Artificial Intelligence • Cybersecurity • SaaS

10a Labs is an applied research and technology company specializing in AI security. It delivers intelligence collection, investigative research, and analysis for AI unicorns, Fortune 10 companies, and U.S. tech leaders. The company provides services that empower security teams and technology leaders to stay ahead of evolving threats, drive innovation, and protect their brands. Its work includes stress-testing AI models against threats, developing real-time threat detection systems, and monitoring cyber threats to enhance AI safety and security throughout the product development lifecycle.

11 - 50 employees

Founded 2021

🤖 Artificial Intelligence

🔒 Cybersecurity

☁️ SaaS

📋 Description

• Design, build, and document a maintainable GCP cloud infrastructure CI/CD pipeline for real-time model serving and data workflows
• Deploy and optimize APIs for low-latency ML systems
• Automate model deployment, retraining, and evaluation (CI/CD for ML)
• Build observability tooling to monitor rollouts, errors, integration testing, and drift in ML pipelines
• Ensure infrastructure meets security, compliance, and uptime requirements

• You’ve deployed and monitored a real-time ML inference system with well-defined observability.
• You’ve implemented an API with latency under 1000 ms for classifier-based inference.
• You’ve partnered with ML engineers to streamline deployment and retraining workflows.
• You’ve built logging and monitoring that gives insight into system performance and classifier behavior.

🎯 Requirements

• Has 3–8 years of DevOps/Platform engineering experience deploying machine learning systems or high-availability backend systems.
• Ability to build CI/CD pipelines from scratch; familiarity with GitHub Actions or similar.
• Expert-level proficiency with Git and GitHub workflows and strong scripting abilities in Python, Bash, and/or Go.
• Experience with Google Cloud Run and Docker, and with Google Cloud Platform, Kubernetes, and Terraform.
• Familiarity with SOC 2 compliance requirements and security best practices (IAM, secrets, etc.).
• Experience implementing monitoring, logging, and alerting systems (e.g., Prometheus, Grafana, ELK/EFK, OpenTelemetry).
• Can work cross-functionally with ML, security, and engineering teams to deploy safely and iterate fast.
• Brings a builder's mindset and bias for ownership in ambiguous environments.

Similar Jobs

August 17

Castillians

51 - 200

GCP DevOps Engineer designing and optimizing cloud infrastructure for clients. Collaborating in a global engineering network on project-based engagements.

🇺🇸 United States – Remote

⏳ Contract/Temporary

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

August 17

Castillians

51 - 200

AWS DevOps Engineer designing and optimizing cloud infrastructure on AWS. Building continuous integration and deployment pipelines in a remote role.

🇺🇸 United States – Remote

⏳ Contract/Temporary

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

August 17

Castillians

51 - 200

Azure DevOps Engineer designing, deploying, and managing cloud infrastructure on Microsoft Azure. Responsible for automation, optimization, and security best practices in a remote project-based assignment.

🇺🇸 United States – Remote

⏳ Contract/Temporary

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

July 29

nanoSoft Consulting

11 - 50

🤝 B2B

☁️ SaaS

As a member of the Software Defined Networking team, manage network infrastructure through code.

🇺🇸 United States – Remote

⏳ Contract/Temporary

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

June 27

NorthBay Solutions

201 - 500

🤖 Artificial Intelligence

☁️ SaaS

Join NorthBay as a Lead Azure DevOps/MLOps Engineer to implement cloud solutions and MLOps practices.
