Grafana Labs

501 - 1000 employees

Founded 2014

🏢 Enterprise

☁️ SaaS

🤖 Artificial Intelligence

Grafana Labs specializes in open-source observability technologies and solutions. It offers a suite of tools for logging, metrics, tracing, and profiling, with products including Grafana, Loki, Tempo, and Mimir. These tools help businesses visualize, monitor, and alert on data from many sources, with capabilities such as anomaly detection, root cause analysis, and service level objective (SLO) management backed by AI/ML insights. Grafana Labs provides both cloud-based and self-managed solutions for infrastructure, application, and frontend observability. The platform also integrates with data sources such as Prometheus and OpenTelemetry, which has made the company a key player in the observability and infrastructure monitoring space.

Staff Software Engineer – Databases SRE

🔥 0 minutes ago

🇮🇪 Ireland – Remote

💵 €117.6k - €141.1k / year

⏰ Full Time

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)

AWS

Azure

Google Cloud Platform

Java

Kubernetes

Linux

Python

Terraform

📋 Description

• Partner closely with product engineering squads
• Own production reliability for high-SLA and complex customer environments
• Design and implement automation to scale reliability practices
• Ensure customers meet SLO targets
• Define and evolve per-tenant SLOs and reliability models
• Proactively reduce SLO burn to prevent repeat incidents
• Serve as a primary escalation point and on-call for relevant incidents
• Lead customer-impacting incident response and post-incident reviews
• Contribute to design docs and code reviews
• Influence feature design for production scalability and operability
• Build automation to eliminate toil where needed
• Improve alert quality and reduce noisy escalations

🎯 Requirements

• 8+ years of engineering experience, including 4+ years in SRE/CRE/production engineering
• Strong preference for candidates with formal customer reliability engineering experience
• Strong Kubernetes experience in AWS, GCP, or Azure
• Familiarity with infrastructure-as-code tooling (Helm, Terraform, Jsonnet, etc.)
• Strong experience with technical leadership
• Experience operating multi-tenant systems in production
• Strong experience designing and implementing SLOs
• Experience with one or more programming languages (e.g. Go, Python, Java)
• Experience with Linux operating system internals
• Excellent problem-solving and troubleshooting skills
• Experience participating calmly in blame-free incident response
• Ability to reason about performance, scaling, and failure modes
• Comfortable working within an engineering team where individuals are encouraged to have a strong sense of autonomy and self-direction
• Ability to partner deeply with product engineering teams
• Intellectually curious, transparent by default, kind, and strongly biased towards action

🏖️ Benefits

• Equity
• Bonus (if applicable)
• Competitive annual leave policy of 30 days
• Company-funded usage budget for AI coding assistants
• In-person onboarding

Similar Jobs

Staff SRE, Ads

🕒 June 19

Reddit, Inc.

501 - 1000

👥 B2C

📱 Media

🌍 Social Impact

Staff SRE leading reliability initiatives across Ads domains at Reddit. Mentoring engineers and improving infrastructure reliability for critical revenue-generating systems.

🇮🇪 Ireland – Remote

⏰ Full Time

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)

Cloud

Distributed Systems

Linux

Python

Principal DevOps Engineer

🕒 June 12

Zartis

201 - 500

☁️ SaaS

🇮🇪 Ireland – Remote

💰 Pre Seed Round on 2011-12

⏰ Full Time

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)

AWS

Cloud

Staff Cloud Operations Engineer

🕒 January 22

Extreme Networks

1001 - 5000

📡 Telecommunications

🏢 Enterprise

🔐 Security

Cloud Operations Engineer at Extreme Networks building scalable cloud solutions. Collaborating on multi-cloud environments and driving operational excellence in cloud services.

🇮🇪 Ireland – Remote

⏰ Full Time

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)

AWS

Cloud

Distributed Systems

ElasticSearch

Flux

Grafana

Kafka

Kubernetes