Senior Site Reliability Engineer - Manager


February 12


RemoteStar

B2B • Recruitment • SaaS

RemoteStar is a global recruitment service that specializes in hiring top-quality tech talent. By assembling diverse teams with vetted developers from various regions, RemoteStar ensures high-quality staffing while maximizing cost efficiency for companies. The service includes a rigorous vetting process, technical matching, and full onboarding support, allowing businesses to focus on their core operations while RemoteStar handles the administrative aspects of recruitment and team management.

11 - 50 employees

Founded 2020

🤝 B2B

🎯 Recruiter

☁️ SaaS

📋 Description

• RemoteStar is looking to hire a Senior Site Reliability Engineering Manager on behalf of our client based in the UK with a fully remote work policy.
• As the SRE Manager, you will play a critical role in ensuring the reliability, scalability, and performance of our infrastructure and services through both direct technical contribution and team building and management.
• Take full ownership of the production estate from both a technical and process perspective.
• Provide consistent, smooth operation of live systems and lead the resolution of all on-call support issues.
• Design and operate a new incident tracking process to ensure root causes are found and remediated in a timely fashion by the development team.
• Create and maintain high-end monitoring and automation tooling.
• Drive automation initiatives to streamline operational workflows and improve efficiency.
• Develop and maintain tools, scripts, and dashboards to monitor system health, performance, and reliability.
• Build a first-class SRE team through a combination of leading by example, coaching, and mentoring.

🎯 Requirements

• Proven experience in a senior or lead SRE role, with a strong track record of building and maintaining highly reliable infrastructure and services.
• Expertise in incident management, including incident response, resolution, and post-mortem analysis.
• Proficiency in monitoring, alerting, and observability tools such as Prometheus, Grafana, ELK stack or Datadog.
• Experience with cloud platforms such as AWS, Azure, or GCP, including infrastructure as code tools like Terraform or CloudFormation.
• Strong scripting and automation skills, with proficiency in languages such as Python, Bash, or Go.
• Excellent communication and collaboration skills, with the ability to work effectively with cross-functional teams in a remote environment.
• Demonstrated leadership capabilities, with a passion for mentoring and developing team members.

🏖️ Benefits

• Dynamic working environment in an extremely fast-growing company
• Work in an international environment
• Work in a pleasant environment with very little hierarchy
• Intellectually challenging work with a major role in the client’s success and scalability
• Flexible working hours


