Site Reliability Engineer

🔥 8 minutes ago

HBK - Hottinger Brüel & Kjær

1001 - 5000 employees

Founded 2019

Aerospace • Automotive • Energy

HBK - Hottinger Brüel & Kjær is a prominent company formed in 2019 through the merger of HBM and Brüel & Kjær, two organizations with over 80 years of experience in precision test and measurement technology. HBK delivers innovative solutions across multiple domains including mechanical, sound and vibration, and electrical testing. The company provides a wide range of products and services such as data acquisition systems, electroacoustic setups, vibration testing equipment, and custom sensor assemblies. HBK caters to diverse industries such as aerospace, automotive, and energy, focusing on quality, reliability, and sustainability in all offerings. The company's mission is to empower innovators by providing exceptional sensing and insights, thus contributing to a cleaner, healthier, and more productive world.

📋 Description

• Design, build, and operate the internal developer platform (IDP), including portal and CLI interfaces backed by Python-based APIs, enabling consistent, self-service infrastructure provisioning via Crossplane abstractions
• Develop and maintain cloud governance policies, procedures, and standards
• Offer guidance and recommendations on cloud governance best practices, with a focus on enhancing security and compliance measures
• Monitor cloud spend: collate, analyze, and prioritize cloud optimization recommendations, and calculate potential savings
• Collaborate with development teams to design, build, and maintain reliable and scalable systems
• Participate in incident response processes and support production systems by triaging alerts and resolving operational issues
• Define and improve service reliability metrics (SLIs/SLOs), including availability calculations using observability data (e.g., logs/metrics)
• Implement and improve monitoring, alerting, and incident response processes to ensure system health and availability
• Continuously identify and eliminate toil through automation (including use of GenAI where applicable)
• Contribute to the development and improvement of CI/CD pipelines, deployment processes, and release strategies
• Continuously improve the reliability of our systems through post-incident reviews and root cause analysis
• Implement, operate, and maintain an Information Security Management System (ISMS) compliant with the ISO 27001 standard
• Stay current with industry trends, best practices, and emerging technologies related to Site Reliability Engineering
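
As an illustration of the availability calculations mentioned above, here is a minimal sketch of a request-based SLI check. It is not HBK's method: the function name `availability_report`, the 99.9% default target, and the request counts are all hypothetical, and in practice the counts would come from observability data such as logs or metrics.

```python
from dataclasses import dataclass


@dataclass
class AvailabilityReport:
    availability: float            # fraction of good requests, 0.0 to 1.0
    slo_met: bool                  # True when availability >= the SLO target
    error_budget_remaining: float  # fraction of budget left; negative if overspent


def availability_report(good: int, total: int, slo_target: float = 0.999) -> AvailabilityReport:
    """Compute a request-based availability SLI and compare it to an SLO target.

    All inputs are hypothetical; `good` and `total` stand in for counts
    aggregated from logs or metrics over the SLO window.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= good <= total:
        raise ValueError("good must be between 0 and total")
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")

    availability = good / total
    allowed_bad = (1 - slo_target) * total  # failures the SLO tolerates
    actual_bad = total - good
    remaining = 1 - actual_bad / allowed_bad
    return AvailabilityReport(availability, availability >= slo_target, remaining)


if __name__ == "__main__":
    report = availability_report(good=999_500, total=1_000_000)
    print(report)
```

With these made-up numbers, half of the tolerated failures have been used, so roughly half of the error budget remains for the window.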

🎯 Requirements

• Proficiency in scripting and automation using tools such as Bash, Python, and Go
• Expertise with CI/CD tools (e.g., GitHub Actions, TeamCity, Jenkins)
• Knowledge of infrastructure-as-code and control plane technologies such as Terraform, Pulumi, and Crossplane (including composition-based abstractions)
• Expertise with containerization technologies (Docker, Kubernetes)
• Experience with cloud platforms (AWS, Azure, GCP)
• Experience leveraging GenAI tools (e.g., GitHub Copilot, ChatGPT) to accelerate development and automation workflows
• Strong knowledge of SRE and DevOps principles, practices, and methodologies
• Experience with monitoring and observability tools (e.g., ELK, Grafana – Prometheus, Tempo, Loki)
• Experience building platform services using Python (APIs, CLI tools, or developer portals)
• Experience with Internal Developer Platforms (IDP) or self-service infrastructure platforms
• Understanding of platform engineering and developer experience (DevEx) principles
• Nice to have: experience with DevSecOps and threat modelling
• Familiarity with incident response and post-incident analysis processes
• Strong troubleshooting and problem-solving skills
• Ability to work independently while also being a team player
• Experience working in an agile environment
• Willingness to actively promote the SRE mindset by fostering a culture of reliability, automation, collaboration, and continuous improvement

🏖️ Benefits

• Work From Anywhere (WFA): We value work-life balance and offer flexible working hours and a full-time WFA option
• We provide attractive health insurance and vacation benefits for our employees
