Senior Site Reliability Engineer

🕒 May 23

ScalePad

201 - 500 employees

☁️ SaaS

📋 Compliance

🔐 Security

💰 Private Equity Round in July 2021

ScalePad provides a platform that helps Managed Service Providers (MSPs) improve client engagement and operational efficiency. Its products, including Lifecycle Manager, Backup Radar, and Cognition360, streamline compliance, backup monitoring, and client communications. The platform integrates with third-party tools so that MSPs can work within a single ecosystem, and it uses automation and data-driven insights to help them scale their offerings.

📋 Description

• Operate production infrastructure across AWS and Azure, including networking, IAM, and cost.
• Build and operate Terraform modules and state at scale, keeping our infrastructure as code clean and reviewable.
• Run Kubernetes in production: upgrades, scaling, troubleshooting, and platform improvements.
• Operate and improve CI/CD pipelines that the entire engineering org depends on.
• Operationalize SLO/SLI frameworks and observability practices alongside the SRE team.
• Drive incident response practice, on-call tooling, and incident review follow-through.
• Reduce operational toil through automation across secret rotation, access management, and environment provisioning.
• Contribute to capacity planning, disaster recovery, and resilience work across critical systems.
• Build and maintain internal developer tooling that removes friction across engineering.
• Lead rollouts of AI-native tooling for code review, testing, and engineering productivity.
• Own migrations and consolidation of internal platforms such as Jira, Confluence, ticketing, and documentation systems.
• Mentor engineers and technical leads, fostering growth and knowledge-sharing within the organization.
• Evaluate and introduce new technologies, tools, and approaches to improve scalability and efficiency.

🎯 Requirements

• 5+ years of experience in software engineering, infrastructure, or related technical disciplines, with a focus on Site Reliability Engineering (SRE), DevOps, Platform Engineering, or similar roles.
• Strong expertise in cloud infrastructure, distributed systems, networking, and observability practices.
• Experience designing and operating highly available, scalable production systems.
• Deep understanding of scripting, automation, infrastructure as code, CI/CD, and operational best practices.
• Experience implementing SLO/SLI frameworks and reliability engineering methodologies.
• Incident management, troubleshooting, and on-call experience in complex production environments.
• Passion for mentoring engineers and improving engineering culture.

🏖️ Benefits

• Share in our success through our Employee Stock Ownership Plan (ESOP) and RRSP matching.
• Parental leave programs are in place to support you and your family when it matters most.
• Join opt-in mentorship programs and learn directly from founders and senior leaders.
• Access an annual professional development budget to level up your skills, your career, and your impact.
• Work with brand new, top-of-the-line hardware and equipment.
• Receive a monthly stipend to help you create an effective hybrid or remote work environment.
• Take care of yourself with 100% employer-paid benefits.

Similar Jobs

🕒 May 13

Oscilar

51 - 200

💳 Fintech

🏦 Banking

📋 Compliance

Senior SRE managing resilient cloud infrastructure for Oscilar's AI Risk Decisioning™ Platform. Leading best practices and mentoring engineers in a remote-first culture.

AWS

Cloud

Distributed Systems

Kafka

Kubernetes

Microservices

Python

Terraform

Go

🕒 May 8

HostPapa

51 - 200

☁️ SaaS

🌐 Web 3

🛍️ eCommerce

DevOps Engineer at HostPapa designing and operating cloud infrastructure for multi-tenant SaaS platforms. Focused on CI/CD, infrastructure automation, and scalability.

Ansible

AWS

Azure

Cloud

Distributed Systems

Docker

Google Cloud Platform

Grafana

Groovy

Jenkins

Kubernetes

Linux

Microservices

Python

Terraform

🕒 May 7

Fullsteam

1001 - 5000

💳 Fintech

☁️ SaaS

🤝 B2B

Lead DevOps Manager at Fullsteam overseeing infrastructure and operational practices while managing a team. Responsible for scaling reliability and supporting product and engineering collaboration.

AWS

Cloud

Docker

EC2

Kubernetes

Microservices

Python

Terraform

🕒 May 6

Intrahealth, a HEALWELL AI Company

51 - 200

⚕️ Healthcare Insurance

☁️ SaaS

🤖 Artificial Intelligence

DevOps Engineer fluent in AI-augmented development to build Kubernetes infrastructure for Intrahealth. Responsible for CI/CD pipelines and ensuring reliable cloud environments.

AWS

Azure

Cloud

DNS

Google Cloud Platform

Kubernetes

Python

Terraform

Go

🕒 May 1

Ticketmaster

10,000+ employees

🛍️ eCommerce

⚽ Sports

Lead Site Reliability Developer delivering consulting across teams for Ticketmaster's SRE practices. Focused on enhancing reliability, resilience, and engineering practices globally from Toronto or Quebec.

🗣️🇫🇷 French Required

AWS

Kubernetes