Senior Site Reliability Engineer

SigNoz

11 - 50 employees

☁️ SaaS

🏢 Enterprise

Software • SaaS • Enterprise

SigNoz is an open-source observability tool designed as an alternative to Datadog and New Relic. It provides a comprehensive platform that integrates application performance monitoring (APM), logs, metrics, traces, exceptions, and alerts in a single tool. SigNoz supports various deployment options including self-hosting and cloud services, and it's built to handle extensive data ingestion, making it suitable for teams of all sizes to monitor, troubleshoot, and enhance application performance efficiently.

📋 Description

• Own the reliability, scalability, and operability of the SigNoz cloud platform
• Keep a petabyte-scale observability system fast and dependable
• Scale the ingest path, making it robust to bursts while maintaining data freshness
• Operate and tune ClickHouse and the data layer for performance and cost
• Manage Kubernetes infrastructure: cluster operations, upgrades, multi-tenancy
• Help make the observability of SigNoz itself world-class
• Work with a high-caliber team across various responsibilities including SLOs/SLIs, incident response, and tooling

🎯 Requirements

• 5–8 years in SRE, infrastructure, or platform/backend roles operating production systems at scale
• Deep, practical Kubernetes experience
• Strong grasp of distributed systems failure modes, performance debugging, and capacity planning
• Comfortable in code (Go preferred)
• Loves open source, ideally with prior contributions to OSS projects
• Comfortable in a high-ownership, fast-moving, remote-first environment
• Strong communication: can write clear runbooks and tech docs and explain trade-offs

🏖️ Benefits

• Remote-first, async-friendly culture
