Senior Production Engineer, Storage

November 12

CoreWeave

Artificial Intelligence • Cloud Computing • SaaS

CoreWeave is a cloud service provider that specializes in purpose-built infrastructure designed for AI workloads. Known as the AI Hyperscaler™, CoreWeave offers a range of products including GPU and CPU compute services, storage solutions, and networking services optimized for deep learning, AI model training, and rendering applications. With a robust cloud platform, CoreWeave simplifies complex infrastructure management, ensuring reliability, scalability, and high-performance computing suitable for leading AI labs and enterprises.

11 - 50 employees

Founded 2017

🤖 Artificial Intelligence

☁️ SaaS

💰 $100M Debt Financing on 2022-12

📋 Description

• Design and implement integrations between storage vendor solutions and CoreWeave IaaS offerings to support CoreWeave’s growing AI and cloud infrastructure needs.
• Work with leading-edge, AI-focused technologies such as RDMA, GPU Direct Storage, SPDK, and distributed filesystems to optimize storage performance and efficiency.
• Lead efforts to improve the reliability, durability, and observability of our customers’ storage solutions.
• Collaborate with operations teams to monitor, troubleshoot, and improve storage systems in production environments.
• Help develop metrics and dashboards to provide visibility into storage performance and health.
• Analyze telemetry and system data to drive improvements in throughput, latency, and resilience.
• Work cross-functionally with platform, product, and infrastructure teams to deliver seamless storage capabilities across the stack.
• Share your knowledge and mentor other engineers on best practices in operating and integrating distributed, high-performance systems.

🎯 Requirements

• Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
• 6–10 years of experience working in storage systems engineering or infrastructure.
• Positive, solutions-oriented collaborator who can strengthen customer and vendor relationships.
• Strong hands-on experience working with storage vendor solutions in production.
• Experience orchestrating one or more storage protocols (e.g., S3, NFS) in production environments.
• Proficiency in the Go programming language.
• Familiarity with observability tools and telemetry pipelines (e.g., Prometheus, Grafana, Loki).
• Solid understanding of cloud-native infrastructure, Kubernetes, and scalable system architecture.
• Strong debugging and problem-solving skills in distributed, high-performance environments.
• Clear communicator, able to work collaboratively across teams and share technical insights effectively.

🏖️ Benefits

• Medical, dental, and vision insurance - 100% paid for by CoreWeave
• Company-paid Life Insurance
• Voluntary supplemental life insurance
• Short and long-term disability insurance
• Flexible Spending Account
• Health Savings Account
• Tuition Reimbursement
• Ability to Participate in Employee Stock Purchase Program (ESPP)
• Mental Wellness Benefits through Spring Health
• Family-Forming support provided by Carrot
• Paid Parental Leave
• Flexible, full-service childcare support with Kinside
• 401(k) with a generous employer match
• Flexible PTO
• Catered lunch each day in our office and data center locations
• A casual work environment
• A work culture focused on innovative disruption
