Global Head of SRE

November 20

Socure

Artificial Intelligence • Security • Finance

Socure is a leading platform for digital identity verification and trust. Using advanced predictive analytics, artificial intelligence, and machine learning, Socure draws on extensive online and offline data, including email, phone, address, IP, and device information, to verify identities in real time. Its solutions address challenges in onboarding, login authentication, account takeover prevention, and contact center operations. Socure's AI-powered platform excels at combating identity fraud, ensuring compliance, and enhancing user experiences across industries such as financial services, eCommerce, online gaming, and crypto.

501 - 1000 employees

Founded 2012

🤖 Artificial Intelligence

🔐 Security

💸 Finance

💰 $450M Series E on 2021-11

📋 Description

• Define the global reliability strategy and roadmap across availability, latency, durability, data integrity, cost efficiency, and safety, mapped to clear business outcomes and service level objectives.
• Architect multi‑region, multi‑zone resilience patterns with automated failover, graceful degradation, and progressive delivery; validate readiness through continuous game days and fault‑injection experiments.
• Build and lead a world‑class red‑team QA and chaos engineering program across infrastructure, data pipelines, and applications; codify attack playbooks and steady‑state guardrails to improve detection and recovery.
• Establish a unified observability practice (end‑to‑end tracing, high‑signal alerting, health and saturation indicators, user‑journey telemetry, and incident command protocols), standardized into a single, actionable operations view.
• Drive rigorous incident management: real‑time incident command, rapid mitigation, blameless post‑incident reviews, durable corrective actions, and automated safeguards.
• Ensure public sector readiness and continuous authorization: sustain the FedRAMP Moderate posture, prove environmental parity between the commercial and GovCloud environments, and strengthen controls for data residency, deletion, and audit evidence.
• Partner with product engineering to make reliability a product feature: embed reliability patterns into RiskOS workflows and make Identity Graph‑based decisions observable, explainable, and resilient by default.
• Lead developer tooling and release engineering: own CI/CD pipelines, test sandboxes, ephemeral environments, and the golden paths that make shipping changes safe, repeatable, and fast.
• Advance an AI‑first SRE strategy: deploy ML for anomaly detection, incident prediction, adaptive alerting, automated runbooks, incident summarization, and capacity forecasting; measure impact through concrete reliability and efficiency wins.
• Lead capacity planning and performance engineering across compute, storage, and networking, delivering consistently low‑latency decisions at peak volumes.
• Attract, grow, and retain exceptional reliability engineers and leaders across regions; run a humane, effective, continuously improving on‑call program.

🎯 Requirements

• Deep experience leading reliability for large‑scale, always‑on platforms that handle highly sensitive data, including ownership of availability, latency, durability, and security across multiple product lines and regions.
• Mastery of modern cloud architecture (AWS), product‑aligned multi‑account patterns, real‑time observability, progressive delivery, and automated disaster recovery, with a track record of measurable reliability gains.
• Experience building red‑team and chaos engineering programs that surface systemic weaknesses, improve mean time to mitigate, and harden systems over time.
• Proven leadership of developer tooling at scale: CI/CD, release engineering, and ephemeral environment strategies that increase velocity while reducing risk.
• Strong partnership with product, data, and security teams; fluency in data lifecycle management, retention and deletion, privacy, and governance for regulated industries and the public sector.
• A people‑first leadership style: you raise the bar on hiring and mentoring, set crisp principles, and build an ownership culture grounded in curiosity, accountability, and continuous learning.

🏖️ Benefits

• Offers Equity
• Offers Bonus

Similar Jobs

November 19

Staff Site Reliability Engineer at Stord responsible for infrastructure management and production system reliability. Focuses on GCP, automation, and mentoring within a dynamic team.

Ansible

Chef

Cloud

Distributed Systems

Docker

Google Cloud Platform

Grafana

Java

Jenkins

Kubernetes

Prometheus

Puppet

Python

Terraform

Go

November 18

Staff Cloud DevOps Engineer for Cleerly, leading cloud infrastructure and enhancing systems for AI-powered diagnostics. Focused on continuous integration, software delivery, and mentoring junior engineers.

AWS

Cloud

DynamoDB

EC2

JavaScript

Kubernetes

Linux

Node.js

Python

Terraform

November 14

NBCUniversal

10,000+ employees

📱 Media

Staff Software Engineer overseeing operational support of SAP BTP CPI applications at NBCUniversal. Leading offshore teams and collaborating on production deployments.

November 13

Staff Site Reliability Engineer at Paxos enhancing cloud infrastructure reliability and scalability. Leading initiatives in Kubernetes, IaC, and cloud services architecture.

AWS

Cloud

EC2

Kubernetes

Postgres

Python

Terraform

Go

November 13

Release Engineer for Brillio driving efficient software build and deployment processes. Collaborating with teams to ensure high-quality releases and streamline operations.

Azure

Docker

Grafana

Jenkins

Kubernetes

Python

Subversion