Site Reliability Engineer

November 19

Eneba

eCommerce • Gaming • Marketplace

Eneba is an online platform that offers a wide range of video games, gaming-related eCards, and gift cards. Users can easily purchase cheap games for various platforms, including PC and gaming consoles, and benefit from personalized game deals through a loyalty program. Eneba also provides an opportunity for businesses to sell their products on the platform, creating a marketplace for gamers and developers.

201 - 500 employees

Founded 2018

🛍️ eCommerce

🎮 Gaming

🏪 Marketplace

📋 Description

• Own and evolve our observability stack across metrics, logs, and tracing using Prometheus CRDs, Thanos, Alertmanager, Loki, Sentry, Grafana, and supporting AWS services.
• Improve system reliability by designing, implementing, and maintaining SLIs, SLOs, and error budgets, ensuring our services meet reliability objectives.
• Enhance system visibility, enabling teams to proactively detect issues, reduce MTTR, and improve incident response workflows.
• Build internal self-service capabilities for metrics, alerts, dashboards, and instrumentation to empower development teams.
• Tune and optimize the Thanos stack, improving query performance, cache effectiveness, retention policies, and cost efficiency.
• Extend and maintain monitoring Helm charts, Prometheus rules, exporters, and dashboards-as-code.
• Collaborate with Backend, DevOps, and Platform teams to ensure reliability and observability are built into services from the design phase.
• Support incident investigations, help pinpoint root causes, correlate metrics/logs/traces, and contribute to blameless postmortems.
• Maintain observability cost efficiency, reducing waste through retention strategy, metric cardinality tuning, and performance improvements.
• Keep the monitoring stack healthy and up to date, ensuring reliability, security, and alignment with best practices.

🎯 Requirements

• Hands-on experience with production observability systems, especially Prometheus, Alertmanager, Grafana, and log/trace platforms such as Elasticsearch, Loki, Sentry, or their equivalents.
• Experience with Thanos or large-scale metrics systems, including tuning, caching strategies, and long-term storage.
• Strong understanding of SLIs, SLOs, error budgets, MTTR, reliability patterns, and incident response workflows.
• Solid experience with Kubernetes in production and a deep understanding of how to monitor it (exporters, node metrics, service mesh signals).
• Proficiency with Infrastructure as Code (Terraform preferred) and automation best practices.
• Experience with AWS monitoring, scaling, and observability of distributed cloud resources.
• Proficiency in scripting or programming (Go, Python, or Bash) to build automation and tooling.
• Ability to reason about distributed-system failures, correlate signals, and guide teams through root-cause analysis.
• Strong ownership mindset, excellent communication, and eagerness to collaborate with development teams.

🏖️ Benefits

• Opportunity to join our Employee Stock Options program.
• Opportunity to help scale a unique product.
• Various bonus systems: performance-based, referral, additional paid leave, personal learning budget.
• Paid volunteering opportunities.
• Work location of your choice: office, remote, or the opportunity to work and travel.
• Personal and professional growth at an exponential rate, supported by well-defined feedback and promotion processes.
