Site Reliability Engineer II

November 24

Backblaze

Cloud Storage • eCommerce • Enterprise

Backblaze is a cloud storage company that provides scalable and secure data backup solutions for businesses and individuals. Its B2 Cloud Storage service offers S3-compatible object storage with transparent pricing, allowing users to protect and manage their data. Backblaze also specializes in automatic, unlimited backup for computers, providing data protection and recovery options, and supports integration with applications for enhanced functionality.

201 - 500 employees

Founded 2007

💰 $5M Series A in July 2012

📋 Description

• Support the availability and durability of critical services across production environments.
• Monitor service health using SLIs, SLOs, and error budgets, and escalate issues when thresholds are at risk.
• Participate in on-call rotations, incident response, and post-incident reviews to drive service improvements.
• Follow established ITIL/OSS processes (incident, change, problem, and capacity management).
• Develop automation for common operational tasks, reducing manual intervention and toil.
• Contribute to monitoring, logging, and alerting frameworks (e.g., Prometheus, Grafana, Catchpoint, ELK).
• Work with CI/CD pipelines, configuration management, and infrastructure as code tools (Terraform, Ansible, Jenkins).
• Write scripts (Bash, Python, Go, etc.) to improve system reliability and efficiency.
• Partner with engineering, product, and operations teams to support resilient system design and operations.
• Assist in capacity planning and disaster recovery exercises.
• Work with vendors and service providers to troubleshoot service issues and track SLA performance.
• Document systems, share learnings, and help grow a reliability-minded engineering culture.
• Contribute to playbooks, runbooks, and operational documentation.
• Identify recurring issues and propose long-term improvements.
• Promote reliability-focused practices within development and operations teams.

🎯 Requirements

• Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience).
• 2–4 years of experience in site reliability, systems engineering, or operations.
• Exposure to large-scale, production-grade systems.
• Solid Linux systems administration and troubleshooting skills.
• Familiarity with service reliability concepts: monitoring, alerting, incident response, and root cause analysis.
• Proficiency in at least one scripting language (Python, Bash, or Go).
• Understanding of containers (Kubernetes, Docker) and microservices concepts.
• Knowledge of incident response and operational best practices.
• Experience in a SaaS, service provider, or distributed systems environment.

🏖️ Benefits

• Diversity, equity, and inclusion initiatives
• Professional development opportunities
