Principal MLOps Engineer

🔥 1 minute ago

Raft

51 - 200 employees

🤖 Artificial Intelligence

🏛️ Government

☁️ SaaS

Raft is a company that partners with public agencies to solve complex problems affecting the lives of millions of Americans. Specializing in cutting-edge digital solutions, Raft focuses on data and AI, digital platforms at scale, and complex software applications. The company emphasizes software architecture, UX/UI design, and automated testing to modernize legacy applications and data systems for speed, security, and scalability. Raft also implements sustainable data governance strategies and human-centered AI systems to enhance decision-making. As a government and commercial partner in advancing technology solutions, Raft is dedicated to empowering organizations with products that prioritize user outcomes over features.

📋 Description

• Design, build, and maintain secure, scalable MLOps infrastructure and deployment pipelines for production ML systems
• Help mature Raft’s internal ML platform and model lifecycle capabilities, including model packaging, registry/catalog workflows, deployment, monitoring, and operational support
• Deploy and manage machine learning workloads on Kubernetes, including GPU-enabled clusters
• Support model serving and inference infrastructure for a range of ML use cases, including traditional ML, computer vision, speech/audio, and LLM-based systems
• Build and maintain CI/CD workflows for ML services, model artifacts, and platform components
• Partner closely with ML engineers, software engineers, and product teams to move models from experimentation to reliable operational deployment
• Improve observability, reliability, security, and maintainability across ML infrastructure and services
• Help evaluate and standardize runtime patterns, serving frameworks, and deployment architectures for production ML workloads
• Contribute to infrastructure decisions across edge, on-prem, and cloud-hosted deployment environments
• Support compliance-driven deployment practices and secure software supply chain requirements in defense environments
• Get hands-on with customers at the most forward-leaning places in the Department of War

🎯 Requirements

• 7+ years of relevant hands-on experience in software engineering, platform engineering, DevOps, MLOps, or related technical roles
• 5+ years of experience with Docker and Kubernetes in production environments
• 5+ years of experience supporting enterprise cloud infrastructure or applications in AWS, Azure, or similar environments
• Strong experience provisioning, operating, and troubleshooting Kubernetes clusters in production
• Experience building and maintaining machine learning platforms, infrastructure, or pipelines used by engineering or data science teams
• Practical experience deploying machine learning workloads on Kubernetes
• Experience managing clusters or workloads that use GPUs
• Strong understanding of Helm and Kubernetes deployment patterns
• Strong scripting or programming skills, preferably in Python
• Experience with modern software engineering practices including Git, CI/CD, DevOps, and Agile/Scrum workflows
• Strong troubleshooting, systems thinking, and communication skills
• Ability to work independently and collaboratively in a fast-moving environment
• Ability to obtain and maintain a Top Secret clearance
• Ability to obtain Security+ certification within the first 90 days of employment

🏖️ Benefits

• Highly competitive salary
• Fully covered health, dental, and vision insurance
• 401(k) with company match
• Take-as-you-need PTO + 11 paid holidays
• Education & training benefits
• Annual budget for your tech/gadget needs
• Monthly box of yummy snacks to eat while doing meaningful work
• Remote, hybrid, and flexible work options
• Team off-sites in fun places!
• Generous referral bonuses
• And more!
