Principal Engineer, Operational Excellence – Resilience

October 16

CrowdStrike

Cybersecurity • SaaS • Artificial Intelligence

CrowdStrike is a cybersecurity company that provides cloud-based security services to stop breaches. It is recognized as a leader in endpoint protection, identity and cloud security, and managed detection and response. CrowdStrike's platform, Falcon, integrates artificial intelligence to offer real-time visibility, detection, and protection against sophisticated cyber threats. The company is lauded for its effectiveness in securing networks and data, making it a trusted partner for businesses worldwide.

📋 Description

• Facilitate coordination between stakeholders across IT, Product, Engineering, and business units, serving as the central point for technology resilience initiatives and ensuring alignment with business objectives
• Own and maintain enterprise-wide technology resilience standards, ensuring consistent implementation and reducing organizational drift from established frameworks across infrastructure, application, and product domains
• Drive comprehensive technical resilience architecture including infrastructure redundancy and fault tolerance, application resilience and graceful degradation strategies, and chaos engineering frameworks for continuous resilience validation
• Lead enterprise technical recovery strategy development and implementation, including backup and redundancy systems, recovery time/point objectives (RTO/RPO) for technical systems, and data recovery/restoration procedures
• Partner to define and implement resilience standards, including feature flagging, release, testing, multi-tenancy frameworks, and scalability frameworks to manage growth
• Provide technical oversight and aggregation of technology resilience risks across the enterprise, establishing and monitoring key performance indicators including system uptime
• Drive chaos engineering and resilience testing programs, establishing enterprise-wide practices for proactive resilience validation and continuous improvement
• Own shared resilience tooling strategy, evaluation, and implementation to support enterprise-wide capabilities including monitoring, testing, and recovery automation
• Build and maintain formal networks with key constituents across business units, engineering teams, and external partners
• Serve as senior technical advisor during major incident response, providing expertise on technical recovery strategies and coordinating cross-functional recovery efforts
• Drive innovation in resilience practices, identifying emerging technologies and methodologies to advance CrowdStrike's competitive resilience advantage
• Provide strategic guidance and expertise to junior team members and cross-functional partners on resilience engineering best practices

🎯 Requirements

• 10+ years of direct experience in technology resilience, disaster recovery, site reliability engineering, or related technical disciplines, with demonstrated expertise in enterprise-scale cloud-native environments
• Deep understanding of infrastructure redundancy patterns, application resilience design, chaos engineering principles, and enterprise disaster recovery strategies across hybrid cloud architectures
• Proven experience with feature management systems, progressive deployment strategies, multi-tenant architecture resilience, and scalability engineering practices
• Proven ability to drive strategic initiatives across large technology organizations, with experience influencing senior stakeholders and leading complex, cross-functional resilience programs
• Experience establishing and monitoring resilience KPIs, including system uptime, MTTR, RTO/RPO objectives, and deployment success metrics
• Advanced certifications in disaster recovery, cloud architecture, or site reliability disciplines (e.g., DRCS, CISSP, AWS/Azure/GCP architecture certifications)
• Exceptional written and oral communication skills, including experience developing and delivering strategic briefings to executive leadership and technical teams
• Advanced analytical and conceptual thinking abilities, with proven track record of solving complex, ambiguous resilience challenges with enterprise-wide impact
• Demonstrated ability to build formal networks and influence stakeholders across engineering, product, and business organizations
• Bachelor's degree in Computer Science, Information Systems, Engineering, Risk/Resilience, or equivalent practical experience

🏖️ Benefits

• Remote-friendly and flexible work culture
• Market leader in compensation and equity awards
• Comprehensive physical and mental wellness programs
• Competitive vacation and holidays for recharge
• Paid parental and adoption leaves
• Professional development opportunities for all employees regardless of level or role
• Employee Networks, geographic neighborhood groups, and volunteer opportunities to build connections
• Vibrant office culture with world-class amenities
• Great Place to Work Certified™ across the globe
