Staff Machine Learning Engineer, ML Platform

November 12


Reddit, Inc.

B2C • Media • Social Impact

Reddit, Inc. is a social media platform that acts as a hub for thousands of communities, where users can engage in diverse conversations ranging from breaking news to niche interests. It enables users to post, comment, and vote on content, fostering a vibrant online community. Millions of people globally connect and share their passions on Reddit, creating a dynamic environment for authentic human interaction.

501 - 1000 employees

Founded 2005


📋 Description

• Design end-to-end model lifecycle patterns (MLOps) to boost velocity of development for ML engineers, including data preparation, model management, experiment tracking, and more
• Zero-to-one development and support of a graph ML codebase and platform that abstracts away common patterns and enables greater model scalability and iteration
• Collaborate with ML engineers on performance tuning, including improving model training time, efficiency, and GPU training costs in a large, distributed ML training environment
• Optimize batch data processing within a data warehouse and with tools such as Apache Beam, Apache Spark, Ray Data, and more
• Architect pipelines to build and maintain massive graph data structures on the order of billions of nodes and tens of billions of edges

🎯 Requirements

• 7+ years of experience in ML infrastructure, including model training and model deployments
• Hands-on experience with ML optimization, including memory and GPU profiling
• Deep experience with cloud-based technologies for supporting an ML platform, including tools like GCP BigQuery, Google Cloud Storage, infrastructure-as-code (Terraform), and more
• Hands-on experience administering and integrating MLOps tools for experiment tracking, model serving, and model registries (e.g., MLflow or Weights & Biases)
• Proficiency with the common programming languages and frameworks of ML, such as Python, PyTorch, TensorFlow, etc.
• Deep experience working with distributed training frameworks, including Ray and Kubernetes
• Strong focus on scalability, reliability, performance, and ease of use. You are an undying advocate for platform users and have a deep intuition for the machine learning development lifecycle.
• Strong organizational & communication skills
• Experience working with graph databases (Neo4j, JanusGraph, TigerGraph) is a big plus
• Experience working with graph neural networks (GNNs) and associated graph ML frameworks (PyTorch Geometric, Deep Graph Library) is a big plus

🏖️ Benefits

• Comprehensive Healthcare Benefits and Income Replacement Programs
• 401k Match
• Family Planning Support
• Gender-Affirming Care
• Mental Health & Coaching Benefits
• Flexible Vacation & Reddit Global Days off
• Generous paid Parental Leave
• Paid Volunteer time off


