Senior Machine Learning Engineer – Inference Platform

51 - 200 employees

🤖 Artificial Intelligence

🛍️ eCommerce

🛒 Retail

💰 $50M Series A, October 2021

Wizard AI provides a tailored shopping experience powered by artificial intelligence. Its text-based service curates product recommendations from across the internet, using AI models that learn user preferences to predict needs and make personalized suggestions. Wizard AI simplifies shopping by finding, ordering, and tracking products and by managing returns, all over SMS. The service is designed to save time and make online shopping more convenient by drawing on product data, customer reviews, and other digital content to make informed recommendations.

🕒 June 3

🇺🇸 United States – Remote

⏰ Full Time

🟠 Senior

🤖 Machine Learning Engineer

🦅 H1B Visa Sponsor

AWS

Azure

Cloud

Google Cloud Platform

Python

📋 Description

• Own and evolve our multi-engine inference platform, supporting a variety of model types and serving requirements.
• Build and improve production ML pipelines, taking models from experimentation to reliable, high-throughput serving.
• Define and implement model versioning, rollout, rollback, and lifecycle management strategies that ensure reproducibility and operational reliability.
• Define and enforce serving-layer SLAs, including latency, availability, GPU utilization, Time-to-First-Token (TTFT), and Inter-Token Latency (ITL).
• Build observability, monitoring, alerting, and operational tooling for production inference systems.
• Apply software engineering best practices, including testing, CI/CD integration, and reproducibility across ML workflows.
• Optimize inference performance through efficient resource utilization, hardware-aware serving strategies, and cost-conscious infrastructure design.
• Ensure ML serving systems are secure, scalable, and operationally resilient.
• Partner with ML, Data, Product, and DevOps teams to turn ideas into production systems, driving the technical decisions on serving and scale.

🎯 Requirements

• Bachelor's or Master's degree in Computer Science, Data Science, Engineering, or a related field, or equivalent practical experience.
• 5–8+ years of experience in Software Engineering, ML Engineering, Platform Engineering, or Infrastructure Engineering, with direct ownership of production ML serving systems.
• Hands-on experience running an LLM serving engine (vLLM, TGI, TensorRT-LLM, or SGLang) in production under real load, not just managed or hosted endpoints.
• Strong Python skills and software engineering fundamentals, combined with deep systems and infrastructure knowledge.
• Experience with cloud platforms such as AWS, GCP, or Azure, and familiarity with ML lifecycle tooling, experimentation platforms, and model registries.
• Strong grasp of inference performance (continuous batching, KV-cache and GPU-memory behavior, quantization, and CPU-versus-GPU bottlenecks), with the instinct to profile before tuning.
• Experience serving heterogeneous workloads, including LLMs, embedding models, and extraction models, each with distinct latency, throughput, and scaling requirements.
• Demonstrated ability to balance latency, throughput, reliability, and infrastructure cost while operating production-scale ML systems.
• Experience in high-growth startup environments and comfort operating in fast-moving, evolving technical landscapes.

🏖️ Benefits

• Health insurance
• Flexible work arrangements
• Professional development opportunities
