Principal Site Reliability Engineer

October 3


Expel

Cybersecurity • SaaS • Technology

Expel is a leading cybersecurity company specializing in Managed Detection and Response (MDR) services. They offer a range of solutions, including phishing investigation, threat hunting, and vulnerability prioritization, tailored for organizations of all sizes with 24x7 protection. Expel's Security Operations Platform, Expel Workbench™, integrates with existing tech to enhance security operations. Their expert team and advanced technology help reduce alert noise, respond swiftly to incidents, and improve overall security posture, enabling organizations to focus on core business activities without worrying about cybersecurity threats.

201 - 500 employees

Founded 2016


📋 Description

• Lead project work to build and maintain platform features that cut across the Expel product’s reliability, networking, and cloud infrastructure
• Contribute by pushing IaC commits daily, with occasional opportunities to write and test application code in Python, Golang, and JavaScript
• Mentor and motivate service owners on how to use the platform to deploy, measure, monitor, and operate their own services at scale
• Participate in a weekly support rotation that includes taking the on-call pager and providing nearly on-demand working-hours support to platform users
• Lead incident response, triage, and root cause analysis support

🎯 Requirements

• Significant experience operating Kubernetes within highly distributed environments
• Experience running systems in GCP or AWS
• Exposure to monitoring and observability infrastructure and standard methodologies
• An understanding of infrastructure-as-code practices, tools, and patterns
• Some experience developing software in Linux environments, preferably with Python and/or Golang
• A customer-minded approach that enables the success of platform users as well as building trust across the organization
• A collaborative disposition that allows you to work optimally on and across teams
• Six years of systems experience either in operations or development

🏖️ Benefits

• Unlimited PTO (which we model and encourage)
• Work location flexibility
• Up to 24 weeks of parental leave
• Excellent health benefits


