Senior Site Reliability Engineer

🔥 11 minutes ago

OpenCV

1 - 10 employees

Founded 2000

🤖 Artificial Intelligence

📚 Education

☁️ SaaS

💰 $325.3k Equity Crowdfunding on 2019-05

OpenCV is the world's largest open-source computer vision library, operated by the non-profit Open Source Vision Foundation. Its resources include OpenCV University, which offers courses in computer vision, deep learning, and AI. The library is highly optimized for real-time applications, supports C++, Python, and Java, and runs on Linux, macOS, Windows, iOS, and Android. Consulting services are offered through OpenCV.AI, which provides computer vision solutions and top-rated face recognition technology. OpenCV is widely used for image and video processing and provides more than 2,500 algorithms that are free for commercial use under the Apache 2.0 license.

📋 Description

• Building Operational Automation: Design, build, and evolve the automation framework and tooling that power the MC platform, primarily in Go (Golang), with a strong focus on maintainability, scalability, and reliability.
• Developing Self-Service APIs: Build and maintain the service APIs and self-service operational tooling of the MC platform, enabling customers and teams to operate services in production safely and efficiently without manual intervention.
• Applying Site Reliability Engineering (SRE) Principles: Define, implement, and continuously improve Service Level Indicators (SLIs), Service Level Objectives (SLOs), error budgets, and SLA measurements so that reliability is measurable and actionable across the MC platform and the services it operates.
• Owning Reliability and Operations Initiatives: Take ownership of reliability, automation, and Mission Control projects, driving them independently from problem identification through implementation and long-term operation.
• Collaborating with AI Engineering: Work closely with our AI team and its tooling, integrating AI-assisted capabilities into our automation and operational workflows.
• Incident Response and Learning: Participate in incident response and the on-call rotation, lead root cause analysis, and drive sustainable corrective and preventive actions.
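To illustrate the error-budget work mentioned in the SRE responsibilities, here is a minimal Go sketch. It assumes a request-based availability SLI (good requests divided by total requests) measured over a single fixed window. The SLOWindow type and its methods are hypothetical and are not part of the MC platform or any OpenCV tooling.

```go
package main

import "fmt"

// SLOWindow holds request counts observed over one SLO window.
// Hypothetical type for illustration only; not an existing MC platform API.
type SLOWindow struct {
	Target float64 // SLO target, e.g. 0.999 for 99.9% availability
	Total  int64   // all requests seen in the window
	Failed int64   // requests the SLI counts as bad
}

// SLI returns the fraction of good requests (1.0 when there was no traffic).
func (w SLOWindow) SLI() float64 {
	if w.Total == 0 {
		return 1.0
	}
	return float64(w.Total-w.Failed) / float64(w.Total)
}

// BudgetRemaining returns the fraction of the error budget still unspent:
// 1.0 means untouched, 0 means exhausted, and a negative value means the
// window spent more than its budget (the SLO was missed).
func (w SLOWindow) BudgetRemaining() float64 {
	if w.Total == 0 {
		return 1.0 // no traffic, so nothing has been spent
	}
	// Number of bad requests the SLO tolerates in this window.
	allowed := (1 - w.Target) * float64(w.Total)
	if allowed <= 0 {
		// A target of 100% (or more) leaves no budget; this sketch reports 0.
		return 0
	}
	return 1 - float64(w.Failed)/allowed
}

func main() {
	// 250 failures against a 99.9% target over one million requests.
	w := SLOWindow{Target: 0.999, Total: 1_000_000, Failed: 250}
	fmt.Printf("SLI=%.5f budget remaining=%.2f\n", w.SLI(), w.BudgetRemaining())
}
```

A 99.9% target over one million requests tolerates about 1,000 failures, so 250 failures leave roughly three quarters of the budget. A production setup would typically derive these counts from monitoring data (for example Prometheus counters) and evaluate several windows rather than one.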

🎯 Requirements

• Strong software engineering background, ideally with production-grade Go (Golang) experience
• Solid understanding of distributed systems and scalable architecture
• Proven experience designing, building, and operating services and their APIs (e.g. REST, gRPC) in production
• Experience operating production systems, including incident response, on-call duty, and root cause analysis
• Experience in SRE, DevOps, platform engineering, or other reliability-focused roles
• Hands-on experience with infrastructure and operations tooling, such as Kubernetes, Terraform / Infrastructure as Code, GitOps principles and CI/CD tooling, and modern observability stacks (e.g. Prometheus, Loki, Tempo)
• Experience with major cloud platforms (AWS, Azure, GCP)
• Knowledge of Linux system administration, networking concepts, and major Internet protocols (TCP/IP, IPsec, SSL/TLS, SSH, SMTP, HTTPS, DNS)
• Ability to think in terms of systems, failure modes, and trade-offs
• Strong communication skills and the ability to build trust across engineering and operations teams
• A proactive mindset: you see problems, propose solutions, and take responsibility for delivering them
• A university degree in Computer Science or an equivalent level of education

🏖️ Benefits

• Health insurance
• Professional development opportunities

Similar Jobs

🔥 5 hours ago

birkle IT

51 - 200

Senior Azure Cloud Architect managing critical cloud architecture, with responsibilities in Kubernetes and security. Focused on Azure platforms, Terraform, and DevOps practices for enterprise clients.

🗣️🇩🇪 German Required

Ansible

Azure

Cloud

Docker

Grafana

Java

Kafka

Kubernetes

Microservices

Prometheus

Spring

Spring Boot

SQL

Terraform

Vault

🕒 Yesterday

ARES Consulting GmbH

11 - 50

🤝 B2B

Team Lead managing a skilled team of Cloud & DevOps Engineers, working 100% remotely within Germany. Focus on client projects, team development, and organizational growth.

🗣️🇩🇪 German Required

AWS

Azure

Cloud

Google Cloud Platform

Kubernetes

🕒 Yesterday

CopeCart

51 - 200

SRE / DevOps Engineer improving deployment processes and operational efficiency for CopeCart. Focused on AWS environments and agentic engineering practices.

AWS

Kubernetes

Linux

Ruby

Terraform

TypeScript

🕒 2 days ago

Digistore24 USA

51 - 200

☁️ SaaS

🛍️ eCommerce

🏪 Marketplace

DevOps Generalist responsible for automation and performance improvements at Digistore24. Collaborating with teams to enhance system reliability and incident response.

🗣️🇩🇪 German Required

Cloud

Kubernetes

PHP

🕒 2 days ago

Dataciders QuinScape GmbH

201 - 500

🤖 Artificial Intelligence

🏢 Enterprise

☁️ SaaS

Senior DevOps Engineer responsible for cloud environments and application support at Dataciders, focusing on development, automation, and optimization.

🗣️🇩🇪 German Required

AWS

Azure

Cloud

Docker

Google Cloud Platform

Kubernetes

Linux