Engineering Manager, Site Reliability (SRE)

November 24

SentinelOne

Cybersecurity • Artificial Intelligence • SaaS

SentinelOne is a leader in autonomous cybersecurity, known for its innovative use of AI across endpoint, cloud, and identity protection solutions. Gartner has recognized it as a leader in the Magic Quadrant for Endpoint Protection Platforms for four consecutive years. SentinelOne's Singularity platform integrates enterprise security, offering features like AI-powered threat detection, endpoint and cloud security, vulnerability management, and threat intelligence. The company supports various industries by delivering real-time protection and operational efficiency while leveraging AI for advanced threat hunting and log analytics. With a strong focus on reducing risk and enhancing security performance, SentinelOne serves enterprises worldwide with secure, scalable solutions.

1001 - 5000 employees

Founded 2013

📋 Description

• Grow and lead a team of SRE professionals, including setting performance goals and measuring deliverables against key metrics, while evolving those metrics as S1 grows and needs develop
• Invest in data-driven deep triage on recurring issues, collaborating with other engineering teams to identify and address issues related to reliability, performance, and capacity
• Develop, improve, and implement processes for the full incident lifecycle, including incident management, post-incident analysis, and learning from incidents. Lead incident response efforts, including coordinating with other teams to investigate and resolve customer-impacting incidents
• Design support model for SRE regarding service maturity and service ownership, including monitoring and alerting improvements, and SLI / SLO design and implementation
• Analyze production metrics and signals to identify areas for improvement and take proactive steps to mitigate issues
• Develop and implement best practices and standards for Site Reliability Engineering, from day-to-day operations to hiring and planning
• Communicate effectively with cross-functional teams to ensure alignment on objectives and priorities. Deliver outcomes, not just stories and tasks.

🎯 Requirements

• 8+ years of related engineering experience, with at least 2 years in a management role
• Demonstrated experience leading technical and operational teams at various stages of maturity
• Excellent analytical and problem-solving skills
• Familiarity with modern software development methodologies, tools, and techniques, including CI/CD
• Experience working with cloud-native applications and large-scale distributed systems, including a working knowledge of technologies such as Kubernetes and Terraform/IaC, and cloud providers such as AWS or GCP
• Experience with various monitoring and alerting techniques and tools, including frameworks and concepts such as SLOs, OTel, and Golden Signals, as well as tooling such as Prometheus and Grafana
• Extensive experience with incident response and management at various layers of the stack across different business needs and applications, including both hands-on experience leading incidents/post-incident analysis and experience driving broader incident management initiatives
• Ability to thrive in a fast-paced, dynamic environment

🏖️ Benefits

• Medical, Vision, Dental, 401(k), Commuter, Health and Dependent FSA
• Unlimited PTO
• Industry-leading gender-neutral parental leave
• Paid Company Holidays
• Paid Sick Time
• Employee stock purchase program
• Disability and life insurance
• Employee assistance program
• Gym membership reimbursement
• Cell phone reimbursement
• Numerous company-sponsored events, including regular happy hours and team-building events
