Senior SRE Engineer – Observability Focus

501 - 1000 employees

We are on a mission to make the world of finance more accessible, engaging and useful.

🔥 0 minutes ago

🇵🇱 Poland – Remote

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

Ansible

AWS

ElasticSearch

Grafana

Java

JavaScript

Kafka

Kubernetes

Prometheus

Python

Terraform

TypeScript

Capital.com

📋 Description

• Own the full observability stack: metrics (VictoriaMetrics), logs (OpenSearch), and traces (OpenTelemetry), from pipeline design to day-2 operations.
• Architect and run the VictoriaMetrics cluster topology (vmstorage/vminsert/vmselect), including vmagent scraping, remote write configuration, vmalert rules, and cardinality control.
• Operate OpenSearch clusters: index lifecycle management via Index State Management (ISM), hot-warm-cold architecture, shard tuning, and ingest pipelines via Data Prepper.
• Build and maintain OTEL Collector pipelines (receivers, processors, exporters) and instrument services across Java, Python, and JS/TS stacks, using both automatic and manual instrumentation.
• Run Kafka as the telemetry transport layer (OTEL Collector → Kafka → backends), including topic design, partition strategy, consumer group lag monitoring, and throughput tuning for high-volume telemetry.
• Manage log shipping infrastructure using Fluent Bit, Vector, or Fluentd; define structured logging standards and field normalization across services.
• Build Grafana dashboards and alerting that engineers actually use: clear, actionable, with well-structured variables and thresholds.
• Work with platform and application teams to improve sampling strategies (head/tail), batching, and context propagation across distributed services.
• Contribute to incident response, post-mortems, and reliability improvements driven by observability signals.
• Mentor engineers on observability practices, tooling, and structured logging standards.

🎯 Requirements

• 6+ years in a DevOps, SRE, or platform engineering role, with at least 2 years focused on observability tooling at production scale.
• Deep hands-on experience with VictoriaMetrics (or Prometheus): MetricsQL/PromQL, exporters, service discovery, remote write, downsampling, and retention management.
• Solid OpenSearch or Elasticsearch skills: cluster operations, Query DSL, ISM policies, and ingest pipeline design.
• Production experience with OpenTelemetry: Collector configuration, OTLP, context propagation, and instrumentation across multiple languages.
• Strong Kafka skills: producer/consumer patterns, consumer group management, Kafka Connect, Schema Registry, and JMX-based monitoring. Strimzi experience is a plus if you've run Kafka on Kubernetes.
• Proficiency with log shippers (Fluent Bit, Vector, Fluentd) and structured log parsing/normalization.
• Working knowledge of Kubernetes (operators, Helm), Argo CD/GitOps, and Terraform/Ansible.
• Comfortable in a hybrid AWS + on-prem environment; solid understanding of networking as it applies to scraping and shipping pipelines.
• Scripting ability in Bash or Python for automation and tooling.
• Strong communication skills: you can explain observability tradeoffs clearly to engineers and non-engineers alike.
• English proficiency.

🏖️ Benefits

• Competitive Salary: We believe great work deserves great pay! Your skills and talents will be rewarded with a salary that makes you feel valued and motivated.
• Work-Life Harmony: Join a company that genuinely cares about you, because your life outside of work matters just as much as your time on the clock.
• Generous Time Off: Need a breather? Our annual leave policy lets you recharge and enjoy life outside of work without a worry.
• Employee Referral Program: Love working here? Share the love! Bring your talented friends on board and get rewarded for growing our awesome team.
• Comprehensive Health & Pension Benefits: From medical insurance to pension plans, we’ve got your back. Plus, location-specific benefits and perks!
• Workation Wonderland: Live your digital nomad dreams with 30 extra days to work remotely from anywhere in the world (some restrictions apply). Adventure awaits!
• Volunteer Days: Make a difference! Take two additional paid days each year to support causes you care about and give back to the community.
