Senior Site Reliability Engineer

201 - 500 employees

Founded 1989

Semiconductor • Market Analysis • Technology

TechInsights is the authoritative information platform for the semiconductor industry, offering the world’s largest collection of reverse engineering, teardown, and market analysis. The company provides in-depth, actionable insights that help businesses make informed decisions about design, product development, and market strategy. Its customers include technology companies that rely on TechInsights' analyses to make faster and more confident business decisions. Its services include detailed reports and insights on semiconductor technologies, sustainability, and trends such as 5G, AI, and automotive innovation.

🕒 May 8

🇵🇱 Poland – Remote

💵 zł18.8k - zł20k / year

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

AWS

Cloud

Docker

Java

Kubernetes

Python

Spring

Spring Boot

Terraform

📋 Description

• Own SLOs, SLIs, and error budgets for all production services; drive error budget discipline across engineering
• Design reliability patterns for AI agent pipelines: LLM observability, tool-use tracking, failure detection, and graceful degradation
• Architect for blast radius containment: agent failures must have bounded customer impact through isolation, circuit breaking, and rapid recovery
• Mature our Canada Central/West active-active architecture toward 24-hour RTO with full regional failover
• Lead incident response and post-incident reviews that produce durable fixes; maintain DR procedures through regular testing
• Serve as the primary reliability liaison to Software and AI Engineering, translating requirements into actionable standards
• Partner with AI Engineering on compute provisioning, model serving, inference latency, and workload isolation
• Own CI/CD pipeline strategy (Bitbucket Pipelines, GitHub Actions): set standards, optimize deployment frequency, and ensure teams can ship confidently
• Drive IDP adoption and enable teams on SRE practices: on-call readiness, SLO definition, runbook development, and self-service tooling
• Represent reliability in architectural discussions; surface risk before it's committed to design
• Operate Datadog as the single pane of glass for service health, infrastructure, and agentic pipeline telemetry
• Extend observability to AI workloads: LLM latency, token consumption, agent completion rates, and pipeline throughput
• Build golden path templates in Backstage and/or Atlassian Compass so teams ship reliably without routine SRE involvement
• Own infrastructure as code via Terraform and GitOps; enforce IaC policy in partnership with Trust Assurance
• Own FinOps visibility into AWS cost segments; model cloud cost impact as AI/ML workloads scale
• Formally mentor junior and intermediate SRE engineers, with accountability for their technical growth and career progression
• Build AI-assisted automation to progressively reduce toil and scale the team's operational capacity

🎯 Requirements

• Bachelor's degree in Computer Science, Engineering, or equivalent combination of education and experience
• 6–8 years of progressive experience in site reliability engineering, platform engineering, or DevOps, with demonstrated technical leadership at the senior individual contributor level
• Deep expertise in AWS (EKS, Lambda, CloudWatch, AWS Config) and multi-region architecture patterns
• Proficiency with Terraform and GitOps; experience with policy-as-code (Sentinel, OPA/Rego, or equivalent)
• Hands-on Datadog experience at operational depth: dashboards, SLO tracking, alerting, log management, distributed tracing
• Strong containerization expertise: Docker, Kubernetes (EKS preferred)
• Proficiency in Python and/or Bash; experience building operational tooling; solid understanding of Java and Spring Boot microservice architecture sufficient to make reliability and deployment decisions for EKS-hosted services
• Deep expertise in CI/CD pipeline design and optimization using Bitbucket Pipelines and GitHub Actions
• Familiarity with IDP tooling (Backstage, Atlassian Compass, or equivalent) strongly preferred
• Experience with AI/ML workload infrastructure, LLM API integration, or agentic system operations considered a strong asset

🏖️ Benefits

• Company-sponsored training and development opportunities
• Comprehensive benefits package (health, wellness, life insurance, fitness, English classes)
• Flexible vacation policy
• Community involvement opportunities through charitable alliances
• Wellness resources and support
• Inclusive environment that prioritizes diversity, equity, and accessibility
• High-growth company driven by high performance
