Senior Site Reliability Engineer - Manager


February 12


RemoteStar

B2B • Recruitment • SaaS

RemoteStar is a global recruitment service that specializes in hiring top-quality tech talent. By assembling diverse teams with vetted developers from various regions, RemoteStar ensures high-quality staffing while maximizing cost efficiency for companies. The service includes a rigorous vetting process, technical matching, and full onboarding support, allowing businesses to focus on their core operations while RemoteStar handles the administrative aspects of recruitment and team management.

11 - 50 employees

Founded 2020


📋 Description

• RemoteStar is looking to hire a Senior Site Reliability Engineering Manager on behalf of our client based in the UK with a fully remote work policy.
• As the SRE Manager, you will play a critical role in ensuring the reliability, scalability, and performance of our infrastructure and services through both direct technical contribution and team building and management.
• Take full ownership of the production estate from both a technical and process perspective.
• Ensure consistently smooth operation of live systems and drive the resolution of all on-call support issues.
• Design and operate a new incident tracking process to ensure root causes are found and remediated in a timely fashion by the development team.
• Create and maintain high-end monitoring and automation tooling.
• Drive automation initiatives to streamline operational workflows and improve efficiency.
• Develop and maintain tools, scripts, and dashboards to monitor system health, performance, and reliability.
• Build a first-class SRE team through a combination of leading by example, coaching, and mentoring.

🎯 Requirements

• Proven experience in a senior or lead SRE role, with a strong track record of building and maintaining highly reliable infrastructure and services.
• Expertise in incident management, including incident response, resolution, and post-mortem analysis.
• Proficiency in monitoring, alerting, and observability tools such as Prometheus, Grafana, ELK stack or Datadog.
• Experience with cloud platforms such as AWS, Azure, or GCP, including infrastructure as code tools like Terraform or CloudFormation.
• Strong scripting and automation skills, with proficiency in languages such as Python, Bash, or Go.
• Excellent communication and collaboration skills, with the ability to work effectively with cross-functional teams in a remote environment.
• Demonstrated leadership capabilities, with a passion for mentoring and developing team members.

🏖️ Benefits

• Dynamic working environment in an extremely fast-growing company
• Work in an international environment
• Work in a pleasant environment with very little hierarchy
• Intellectually challenging work with a major role in the client’s success and scalability
• Flexible working hours


