Staff Software Engineer – Databases SRE

🔥 0 minutes ago

🇬🇧 United Kingdom – Remote

💵 £104k - £124.8k / year

⏰ Full Time

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)

🇬🇧 UK Skilled Worker Visa Sponsor

Grafana Labs

501 - 1000 employees

Founded 2014

🏢 Enterprise

☁️ SaaS

🤖 Artificial Intelligence

Grafana Labs specializes in open-source observability technologies and solutions. It offers a suite of tools for logging, metrics, tracing, and profiling, with products such as Grafana, Loki, Tempo, and Mimir. These tools help businesses visualize, monitor, and alert on data from various sources, with capabilities such as anomaly detection, root cause analysis, and service level objective management using AI/ML insights. Grafana Labs provides both cloud-based and self-managed solutions for infrastructure, application, and frontend observability. Its platform also integrates with data sources such as Prometheus and OpenTelemetry, making it a key player in the observability and infrastructure monitoring space.

📋 Description

• Support the highest-value Grafana Cloud customers by ensuring database reliability
• Partner closely with product engineering squads
• Own production reliability for high-SLA customer environments
• Design and implement automation for reliability practices
• Ensure customers meet SLO targets
• Lead incident response and reviews
• Contribute to design docs and code reviews
• Build automation to eliminate toil
• Improve alert quality and reduce escalations

🎯 Requirements

• 8+ years engineering experience, 4+ in SRE/CRE/production engineering
• Strong Kubernetes experience in AWS, GCP, or Azure
• Familiarity with infrastructure-as-code tooling (Helm, Terraform, Jsonnet, etc.)
• Strong technical leadership experience
• Experience operating multi-tenant systems in production
• Strong experience designing and implementing SLOs
• Experience with one or more programming languages (e.g. Go, Python, Java)
• Experience with Linux internals
• Excellent problem-solving skills
• Experience in incident response & post-incident reviews
• Ability to reason about performance, scaling, and failure modes
• Comfort with autonomous work within an engineering team

🏖️ Benefits

• Equity
• Bonus (if applicable)
• 30 days of annual leave
• Grafana Shutdown Days
• In-person onboarding

Similar Jobs

🕒 June 19

Reddit, Inc.

501 - 1000

👥 B2C

📱 Media

🌍 Social Impact

Staff Site Reliability Engineer leading reliability initiatives across Ads domains at Reddit. Working to improve reliability, scalability, and operational efficiency in Reddit's advertising ecosystem.

🕒 June 11

Advanced Solutions International, Inc.

201 - 500

🤝 B2B

🤝 Non-profit

DevOps Reliability Engineer ensuring performance, scalability, and reliability of Azure-based SaaS platform at ASI. Collaborating with engineering teams to improve system efficiency and resilience.

🕒 May 26

Intermedia Cloud Communications

1001 - 5000

🤝 B2B

🏢 Enterprise

☁️ SaaS

Principal DevOps Engineer serving as technical lead and architect for infrastructure, automation, and deployments in cloud communications provider. Focused on reliability, standards, and cross-platform initiatives.

🕒 May 25

Reddit, Inc.

501 - 1000

👥 B2C

📱 Media

🌍 Social Impact

Staff Site Reliability Engineer leading reliability initiatives for critical user facing systems at Reddit. Driving operational excellence and performance for large-scale distributed systems.

🕒 May 12

Menlo Security Inc.

201 - 500

🔒 Cybersecurity

🏢 Enterprise

Principal Platform Infrastructure Engineer designing and operating Menlo Security's infrastructure platform across multiple environments. Collaborating with global teams and leveraging cloud-native technologies like Google Kubernetes Engine and Terraform.