Senior MLOps / AI Infrastructure Engineer

August 8


SmartRecruitment.com - Smart Recruitment

Recruitment • Gaming • Crypto

SmartRecruitment.com is a recruitment platform specializing in the iGaming and cryptocurrency sectors. It provides job listings and recruitment services for professionals looking to enter or advance in these industries. The platform connects candidates with leading companies in the iGaming and crypto markets, offering both remote and on-site positions across various global locations.

1 - 10 employees

Founded 2018

🎯 Recruiter

🎮 Gaming

₿ Crypto

📋 Description

• We’re a fast-growing Bay Area AI company building cutting-edge products powered by large-scale machine learning models.
• Looking for a Senior MLOps / AI Infrastructure Engineer to lead the development of robust, scalable infrastructure.
• Critical role in designing systems that power our AI research and production environments.
• Collaborate with ML researchers, software engineers, and product teams to ensure models move quickly from prototype to production, reliably and securely.
• Build and scale core ML infrastructure for distributed training, hyperparameter tuning, and experiment tracking.
• Design and maintain containerized model-serving infrastructure for LLMs and multimodal models with low-latency requirements.
• Develop CI/CD pipelines tailored to machine learning workflows using tools like MLflow, Airflow, or Kubeflow.
• Optimize compute usage and resource allocation on cloud platforms (GCP or AWS) and Kubernetes clusters.
• Implement observability and alerting systems for model performance, drift, and uptime in production.
• Collaborate with cross-functional teams to productionize novel research models.

🎯 Requirements

• 4+ years of software engineering experience, with at least 2 years in ML infrastructure or DevOps for AI/ML.
• Proficient in Python and one systems language (Go, Rust, or similar).
• Strong expertise in Kubernetes, Docker, and cloud infrastructure (GCP preferred).
• Familiar with ML tooling such as PyTorch/TensorFlow, MLflow, Ray, or similar.
• Experience deploying ML models in production at scale, preferably in containerized environments.
• Strong understanding of distributed systems, resource orchestration, and observability.
• Proficiency in English; Mandarin is a plus.

Nice to Have:

• Experience serving large foundation models (LLMs, vision-language, etc.).
• Exposure to RAG architectures, vector search, or fine-tuning of open-source models.
• Familiarity with infrastructure-as-code tools (Terraform, Helm).
• Contributions to MLOps open-source tools or whitepapers.
