Engineering Manager, Site Reliability (SRE)

November 24

SentinelOne

Cybersecurity • Artificial Intelligence • SaaS

SentinelOne is a leader in autonomous cybersecurity, known for its use of AI across endpoint, cloud, and identity protection. Gartner has recognized it as a leader in the Magic Quadrant for Endpoint Protection Platforms for four consecutive years. Its Singularity platform integrates enterprise security, with features including AI-powered threat detection, endpoint and cloud security, vulnerability management, and threat intelligence. The company serves a range of industries, delivering real-time protection and operational efficiency and applying AI to advanced threat hunting and log analytics. With a focus on reducing risk and improving security performance, SentinelOne provides secure, scalable solutions to enterprises worldwide.

📋 Description

• Grow and lead a team of SRE professionals, including setting performance goals and measuring deliverables against key metrics, while evolving those metrics as S1 grows and needs develop
• Invest in data-driven deep triage on recurring issues, collaborating with other engineering teams to identify and address issues related to reliability, performance, and capacity
• Develop, improve, and implement processes for the full incident lifecycle, including incident management, post-incident analysis, and learning from incidents. Lead incident response efforts, including coordinating with other teams to investigate and resolve customer-impacting incidents
• Design a support model for SRE covering service maturity and service ownership, including monitoring and alerting improvements, and SLI/SLO design and implementation
• Analyze production metrics and signals to identify areas for improvement and take proactive steps to mitigate issues
• Develop and implement best practices and standards for Site Reliability Engineering, from day-to-day operations to hiring and planning
• Communicate effectively with cross-functional teams to ensure alignment on objectives and priorities. Deliver outcomes, not just stories and tasks.

🎯 Requirements

• 8+ years of related engineering experience, with at least 2 years in a management role
• Demonstrated experience leading technical and operational teams at various stages of maturity
• Excellent analytical and problem-solving skills
• Familiarity with modern software development methodologies, tools, and techniques, including CI/CD
• Experience working with cloud-native applications and large-scale distributed systems, including a working knowledge of technologies such as Kubernetes and Terraform/IaC, and cloud providers such as AWS or GCP
• Experience with various monitoring and alerting techniques and tools, including frameworks and concepts such as SLOs, OTel and Golden Signals as well as tooling such as Prometheus and Grafana
• Extensive experience with incident response and management at various layers of the stack across different business needs and applications, including both hands-on experience leading incidents/post-incident analysis and experience driving broader incident management initiatives
• Ability to thrive in a fast-paced, dynamic environment

🏖️ Benefits

• Medical, Vision, Dental, 401(k), Commuter, Health and Dependent FSA
• Unlimited PTO
• Industry-leading gender-neutral parental leave
• Paid Company Holidays
• Paid Sick Time
• Employee stock purchase program
• Disability and life insurance
• Employee assistance program
• Gym membership reimbursement
• Cell phone reimbursement
• Numerous company-sponsored events, including regular happy hours and team-building events
