CPU Storage Tech Lead

🕒 April 22

OpenAI

201 - 500 employees

Founded 2015

🤖 Artificial Intelligence

☁️ SaaS

🏢 Enterprise

OpenAI is an AI research organization and company that builds advanced artificial intelligence technology with a strong emphasis on safety and ethics. Its mission is to ensure that artificial general intelligence (AGI) benefits all of humanity. The company develops AI products such as ChatGPT, which assists users with tasks ranging from everyday requests to complex enterprise solutions, and it provides an API platform for integrating its AI models into other applications. OpenAI focuses on innovation in AI and on improving data analysis capabilities while emphasizing the safety and ethical governance of its systems.

📋 Description

• Own CPU and storage technical strategy for Stargate compute infrastructure across current and future generations.
• Evaluate CPU platforms across performance, efficiency, memory bandwidth, PCIe topology, cost, and roadmap alignment.
• Define storage architectures for AI environments, including boot media, local NVMe, shared storage, caching tiers, metadata services, and high-performance data pipelines.
• Drive server platform decisions involving CPU, memory, NIC, GPU, and storage subsystem integration.
• Partner with performance modeling teams to quantify tradeoffs across compute, memory, I/O, and storage bottlenecks.
• Work with silicon and hardware vendors on roadmap influence, feature requests, qualification plans, and technical escalations.
• Lead bring-up and validation efforts for new CPU and storage platforms in lab and production environments.
• Partner with networking and cluster architecture teams to optimize end-to-end node design and data movement.
• Support supply chain and sourcing teams with technical vendor assessments and second-source strategies.
• Drive reliability, serviceability, and fleet lifecycle planning for compute and storage platforms.
• Translate future AI workload requirements into infrastructure platform specifications.
• Provide technical leadership across cross-functional stakeholders and executive reviews.

🎯 Requirements

• Bachelor’s degree in Computer Engineering, Electrical Engineering, Computer Science, or related technical field; advanced degree preferred.
• 10+ years of experience in server hardware, systems architecture, data center infrastructure, or hyperscale compute platforms.
• Deep expertise in modern CPU architectures (x86, ARM, accelerator host systems) and server platform design.
• Strong understanding of memory systems, PCIe/CXL fabrics, NUMA behavior, and platform-level performance constraints.
• Experience with storage systems including NVMe, SSD qualification, RAID, distributed storage, object/file systems, or high-performance data pipelines.
• Experience evaluating hardware tradeoffs across performance, cost, power, thermals, and supply availability.
• Familiarity with GPU clusters and AI training/inference infrastructure strongly preferred.
• Experience working directly with OEMs, ODMs, silicon vendors, or storage vendors.
• Strong systems thinking with ability to connect component decisions to fleet-level outcomes.
• Excellent communication skills with the ability to influence engineering and executive stakeholders.
• Proven ability to operate in fast-moving, ambiguous environments with high ownership.

🏖️ Benefits

• Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
• Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
• 401(k) retirement plan with employer match
• Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
• Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
• 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
• Mental health and wellness support
• Employer-paid basic life and disability coverage
• Annual learning and development stipend to fuel your professional growth
• Daily meals in our offices, and meal delivery credits as eligible
• Relocation support for eligible employees
• Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.
