Site Reliability Engineer

🕒 May 8

Hewlett Packard Enterprise

10,000+ employees

Founded 2015

🏢 Enterprise

🔧 Hardware

☁️ SaaS

Hewlett Packard Enterprise is a global technology leader providing innovative IT solutions to empower businesses. HPE offers a comprehensive portfolio of products and services, including the HPE GreenLake edge-to-cloud platform, which delivers a hybrid cloud experience enabling businesses to manage workloads across private and public clouds seamlessly. Additionally, the company specializes in supercomputing, networking, and storage solutions, along with AI and data analytics capabilities to drive productivity and operational efficiency. HPE is committed to helping organizations enhance their digital transformations while securing data and optimizing IT infrastructure.

📋 Description

• Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
• Support services before they go live, starting in the planning phase, through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews.
• Provide technical leadership and guidance to other team members on managing the availability and performance of mission-critical services, building automation to prevent problem recurrence, and building automated responses for non-exceptional service conditions.
• Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
• Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
• Plan capacity for the growth of cloud infrastructure.
• Improve operational processes such as deployments and upgrades.
• Manage execution of project priorities, deadlines, and deliverables.
• Take part in an on-call rotation to respond to incidents that impact platform availability.
• Use your on-call shift to prevent incidents from happening.
• Apply incident response experience, including conducting post-mortems and implementing lessons learned, to enhance system reliability.

🎯 Requirements

• 10+ years of engineering or systems experience
• Experience leveraging cloud architecture, applying site reliability principles, and/or demonstrating sensitivity to operational concerns
• Strong understanding of network design and architecture
• Experience scaling and managing distributed systems
• Significant experience with monitoring and observability platforms
• Demonstrated ability to debug, fix, and optimize code
• Troubleshooting skills across network, application, and distributed services layers
• Ability to learn quickly and adapt to new technologies
• Excellent communication skills, both verbal and written

🏖️ Benefits

• Health & Wellbeing
• Personal & Professional Development
• Unconditional Inclusion

Similar Jobs

🕒 May 8

TechInsights

201 - 500

Senior Site Reliability Engineer at TechInsights responsible for reliability initiatives in the AI-first platform. Collaborating on design, architecture, and mentoring while managing site reliability engineering tasks.

AWS

Cloud

Docker

Java

Kubernetes

Python

Spring

Spring Boot

Terraform

🕒 April 30

Akamai Technologies

5001 - 10000

🔒 Cybersecurity

SRE Engineer designing, developing, and operating Akamai Cloud application and infrastructure. Collaborating with teams to solve complex challenges and enhance observability infrastructure.

Ansible

Chef

Distributed Systems

Puppet

SaltStack

Terraform

🕒 April 28

SOFTSWISS

1001 - 5000

🎮 Gaming

Engineering Manager leading a service-oriented infrastructure team at SOFTSWISS responsible for reliability, scalability, and efficiency. Driving team development and improving engineering processes in a high-load environment.

Cloud

Kubernetes

Terraform

🕒 April 23

RedSky

11 - 50

🔒 Cybersecurity

🏛️ Government

Venture Builder creating startups from the ground up at Red Sky. Join and build teams pushing boundaries across various industries.

🕒 April 22

CloudLinux

51 - 200

☁️ SaaS

🔐 Security

🌐 Web 3

Lead the evolution of CloudLinux's data platform into a DBaaS model. Design resilient databases and implement automated infrastructure management for high-performance systems.

Airflow

Ansible

Apache

Cloud

ETL

Kubernetes

MongoDB

Postgres

Python

Redis

SQL

Terraform

Zookeeper

Go