
501 - 1000 employees
Founded 2014
🏢 Enterprise
☁️ SaaS
🤖 Artificial Intelligence
Grafana Labs is a company that specializes in open-source observability technologies and solutions. It offers a comprehensive suite of tools for logging, metrics, tracing, and profiling, with products such as Grafana, Loki, Tempo, and Mimir. These offerings help businesses visualize, monitor, and alert on data from many sources, with capabilities such as anomaly detection, root cause analysis, and service level objective (SLO) management backed by AI/ML insights. Grafana Labs provides both cloud-based and self-managed solutions for infrastructure, application, and frontend observability. Its platform also integrates with data sources such as Prometheus and OpenTelemetry, making it a key player in the observability and infrastructure monitoring space.
🇬🇧 United Kingdom – Remote
💵 £104k - £124.8k / year
⏰ Full Time
🔴 Lead
⛑ DevOps & Site Reliability Engineer (SRE)
🇬🇧 UK Skilled Worker Visa Sponsor
• Support the highest-value Grafana Cloud customers by ensuring database reliability
• Partner closely with product engineering squads
• Own production reliability for high-SLA customer environments
• Design and implement automation for reliability practices
• Ensure customers meet their SLO targets
• Lead incident response and post-incident reviews
• Contribute to design docs and code reviews
• Build automation to eliminate toil
• Improve alert quality and reduce escalations
• 8+ years of engineering experience, including 4+ years in SRE, CRE, or production engineering
• Strong Kubernetes experience in AWS, GCP, or Azure
• Familiarity with infrastructure-as-code tooling (Helm, Terraform, Jsonnet, etc.)
• Strong technical leadership experience
• Experience operating multi-tenant systems in production
• Strong experience designing and implementing SLOs
• Experience with one or more programming languages (e.g., Go, Python, Java)
• Experience with Linux internals
• Excellent problem-solving skills
• Experience with incident response and post-incident reviews
• Ability to reason about performance, scaling, and failure modes
• Comfort working autonomously within an engineering team
• Equity
• Bonus (if applicable)
• 30 days of annual leave
• Grafana Shutdown Days
• In-person onboarding
🕒 June 19
Staff Site Reliability Engineer leading reliability initiatives across Ads domains at Reddit. Working to improve reliability, scalability, and operational efficiency in Reddit's advertising ecosystem.
🕒 June 11
DevOps Reliability Engineer ensuring performance, scalability, and reliability of Azure-based SaaS platform at ASI. Collaborating with engineering teams to improve system efficiency and resilience.
🇬🇧 United Kingdom – Remote
💰 Venture Round on 2022-01
⏰ Full Time
🟠 Senior
🔴 Lead
⛑ DevOps & Site Reliability Engineer (SRE)
🕒 May 26
Principal DevOps Engineer serving as technical lead and architect for infrastructure, automation, and deployments at a cloud communications provider. Focused on reliability, standards, and cross-platform initiatives.
🇬🇧 United Kingdom – Remote
💰 Venture Round on 2017-02
⏰ Full Time
🔴 Lead
⛑ DevOps & Site Reliability Engineer (SRE)
🕒 May 25
Staff Site Reliability Engineer leading reliability initiatives for critical user-facing systems at Reddit. Driving operational excellence and performance for large-scale distributed systems.
🕒 May 12
Principal Platform Infrastructure Engineer designing and operating Menlo Security's infrastructure platform across multiple environments. Collaborating with global teams and leveraging cloud-native technologies like Google Kubernetes Engine and Terraform.
🇬🇧 United Kingdom – Remote
💰 $100M Series E on 2020-11
⏰ Full Time
🔴 Lead
⛑ DevOps & Site Reliability Engineer (SRE)