Director, Data Reliability Engineering

🔥 0 minutes ago

🚗 Michigan – Remote

💵 $128.5k - $276k / year

⏰ Full Time

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)

🦅 H1B Visa Sponsor

Rocket Mortgage

10,000+ employees

Founded 1985

💸 Finance

💳 Fintech

🏠 Real Estate

Rocket Mortgage is a leading online mortgage lender that simplifies the home buying and refinancing process for consumers. It offers a variety of mortgage options, including fixed-rate, adjustable-rate, FHA, and VA loans, with tools and calculators to help clients understand their financial needs. With a focus on providing a smooth user experience and various promotional offers, Rocket Mortgage is dedicated to making homeownership more accessible and affordable.

📋 Description

• Lead Engineering teams responsible for improving the reliability, observability, recoverability, and operational maturity of enterprise data platforms
• Define reliability standards for databases, data warehouses, pipelines, jobs, storage, access patterns, and supporting infrastructure
• Establish operating expectations for monitoring, alerting, logging, incident response, change management, backup/recovery, disaster recovery, patching, access controls, service ownership, and operational readiness
• Create metrics that measure platform health, data freshness, data quality, recovery readiness, incident trends, operational risk, compliance alignment, and business impact
• Lead current-state assessments of systems, data flows, operational processes, observability, access patterns, and reliability gaps
• Convert assessment findings into executable roadmaps that improve platform stability, data trust, security alignment, and operational predictability
• Support migration and modernization programs involving on-premise platforms, AWS, Snowflake, and related enterprise data systems
• Build durable operating mechanisms, including reliability reviews, service health reviews, incident reviews, operational readiness reviews, risk reviews, roadmap reviews, and executive reporting
• Develop senior technical talent and create the leadership structure required to scale Data Reliability Engineering over time

🎯 Requirements

• 10+ years of experience in data infrastructure, database engineering, data platform engineering, cloud infrastructure, site reliability engineering, or related technical disciplines
• 5+ years of experience leading engineering teams responsible for production systems, databases, data platforms, infrastructure platforms, or reliability engineering
• Strong understanding of enterprise data infrastructure, including databases, data warehouses, pipelines, storage, compute, backup/recovery, resiliency, and production operations
• Experience improving reliability practices across complex production environments, including observability, monitoring, incident response, change management, disaster recovery, and lifecycle management
• Experience establishing service health metrics, data reliability metrics, operational maturity indicators, and executive-level reporting
• Strong understanding of enterprise security, compliance, access management, auditability, operational controls, and infrastructure standards
• Proven ability to create structure in ambiguous environments, set clear priorities, influence across teams, and translate technical reliability work into business outcomes

🏖️ Benefits

• Perks and health benefits for you and your family
• Support for individual needs
• Peace of mind with our offerings

Similar Jobs

🔥 23 hours ago

Coinbase

1001 - 5000

₿ Crypto

💸 Finance

💳 Fintech

Staff Site Reliability Engineer driving AI transformation by ensuring reliability and automation at Coinbase. Collaborating with infrastructure teams and leading critical incident responses to maintain service excellence.

🕒 Yesterday

Aya Healthcare

5001 - 10000

⚕️ Healthcare Insurance

🎯 Recruiter

Lead the SRE team at Aya Healthcare to enhance product reliability and operational efficiency. Manage incident responses and AI-native operations for a top healthcare workforce solutions provider.

🕒 Yesterday

MKS2 Technologies

201 - 500

🤝 B2B

🔒 Cybersecurity

Site Reliability Systems Engineer working with monitoring tools to enhance VA's infrastructure reliability. Collaborating across teams to resolve outages and improve service quality for veterans.

🕒 3 days ago

NVIDIA

10,000+ employees

🤖 Artificial Intelligence

🎮 Gaming

Site Reliability and Software Engineering leader managing NVIDIA's DGX Cloud computing services. Overseeing team operations and driving technical project success in an innovative environment.

🕒 3 days ago

Leidos

10,000+ employees

🔒 Cybersecurity

🔬 Science

DevSecOps Engineer automating delivery infrastructure for mission-critical software at Leidos. Building CI/CD pipelines and maintaining security compliance in cloud environments.