Senior Site Reliability Engineer

November 7

MariaDB

Enterprise • Open Source • Database

MariaDB develops and provides an open-source, cloud-native relational database. Known for its MariaDB Server and MariaDB Enterprise offerings, it delivers high availability and auto-failover and supports both transactional and analytical workloads. MariaDB is favored for its flexibility, its cost-effectiveness compared to proprietary databases, and its support for multiple data models, including relational and JSON. It is widely used in Linux distributions as a replacement for MySQL and is popular among developers for its open-source innovation and ease of use.

201 - 500 employees

Founded 2009

🏢 Enterprise

📋 Description

• Design, implement, and evolve large-scale, cloud-native infrastructure supporting our global SaaS platform.
• Lead reliability and scalability initiatives that span multiple teams and services, driving automation and resilience through infrastructure-as-code and GitOps practices.
• Proactively identify and remediate systemic reliability issues, ensuring high service availability and performance across multi-cloud environments.
• Collaborate with software and platform teams to integrate reliability principles, SLOs, and observability standards into every stage of the development lifecycle.
• Act as a key technical leader during major incidents: coordinating response efforts, conducting root cause analysis, and implementing long-term corrective actions.
• Contribute to continuous improvement by defining infrastructure patterns, refining CI/CD workflows, and mentoring other engineers in automation and reliability best practices.

🎯 Requirements

• At least 7 years of hands-on experience as an SRE, DevOps, or Infrastructure Engineer in production cloud environments.
• Strong expertise with Kubernetes operations and ecosystem tooling in production-scale clusters.
• Proven experience designing and maintaining multi-cloud infrastructure across Azure, AWS, or GCP.
• Advanced proficiency with Terraform and Terragrunt, capable of designing modular, reusable, and secure IaC components.
• Solid understanding of GitOps principles and deployment automation using ArgoCD or similar tools.
• Deep experience with Linux systems administration, performance tuning, and troubleshooting.
• Proficiency in one or more programming/scripting languages (Python, Bash, Go preferred).
• Strong understanding of observability concepts and experience working with monitoring and alerting tools such as Prometheus, Grafana, and Thanos.
• Experience participating in or leading on-call rotations, handling incident response, and conducting post-incident reviews.

🏖️ Benefits

• 25 days paid annual leave (plus holidays)
• Very competitive compensation package
• Health insurance
• Life insurance
• Disability insurance
• Funds toward professional development resources
• Paid holidays
• Parental leave
