Site Reliability Engineer

51 - 200 employees

Founded 2006

☁️ SaaS

🌐 Web 3

🛍️ eCommerce

HostPapa is a web hosting company that provides shared web hosting, WordPress hosting, VPS hosting, and reseller hosting. It also offers domain registration, website building tools, business email services, and security features such as SSL certificates. HostPapa prides itself on award-winning 24/7 customer support, available globally so that customers receive assistance in their preferred language and time zone. With a focus on small businesses, HostPapa aims to help customers achieve their online goals with reliable, high-performance hosting services.

🕒 5 days ago

🇨🇦 Canada – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

AWS

Azure

Cloud

Distributed Systems

Docker

ElasticSearch

Google Cloud Platform

Grafana

Kubernetes

Linux

Python

📋 Description

• Define and implement SLIs, SLOs, and error budgets for critical CloudBlue services to ensure reliability and performance
• Influence system architecture with a strong focus on reliability, scalability, and operability, designing systems for fault tolerance, graceful degradation, and self-healing
• Reduce operational toil by identifying opportunities for automation and process improvement
• Design and operate CloudBlue’s observability stack across metrics, logs, and traces using tools such as Datadog, Grafana, and Elastic Stack
• Develop actionable alerting strategies and dashboards that provide clear insight into platform and business health
• Design and maintain high-availability architectures, implementing redundancy, failover, and disaster recovery strategies across regions and availability zones
• Conduct capacity planning, load testing, and performance optimization to ensure platform stability and scalability
• Act as a senior responder during production incidents, leading incident coordination, communication, and service restoration
• Own blameless postmortems and drive improvements that reduce incident frequency, MTTR, and customer impact
• Improve reliability of Kubernetes-based platforms through health checks, autoscaling strategies, rollout safety, and resilience testing
• Partner with engineering and DevOps teams to improve deployment safety, rollback strategies, and platform reliability
• Maintain runbooks and operational documentation, and promote SRE best practices across engineering teams
• Support other tasks or projects as assigned to meet team and business needs

🎯 Requirements

• 3+ years of experience as an SRE, DevOps Engineer, or Production Engineer, with strong ownership of production systems
• Proven experience operating highly available, enterprise-grade, multi-tenant SaaS platforms
• Hands-on experience with observability and monitoring tools such as Datadog, Grafana, and Elasticsearch/Kibana
• Solid understanding of Linux, networking, and distributed systems fundamentals
• Experience working with containerized environments such as Docker and Kubernetes
• Strong scripting and automation skills using Python and/or Bash
• Experience participating in on-call rotations and incident response in production environments
• Strong written and spoken English
• Experience defining SLIs/SLOs and managing error budgets at scale will be considered a plus
• Cloud experience, preferably with Azure; experience with AWS and/or GCP will also be valued
• Experience working with hybrid or on-premises integrations is beneficial
• Familiarity with chaos engineering and resilience testing will be considered an asset

🏖️ Benefits

• A competitive salary that values you and your unique skill sets
• Career advancement & professional development opportunities to help you reach your full potential
• Flexible work arrangements to support work/life balance
