Site Reliability Engineer

October 31

Capgemini

Enterprise • Artificial Intelligence • Cybersecurity

Capgemini is a global leader in partnering with businesses to transform and manage their operations by harnessing the power of technology. With expertise across a wide array of industries such as aerospace, automotive, banking, and healthcare, Capgemini provides a constantly evolving portfolio of services to meet the changing needs of its clients. Its offerings include cloud, cybersecurity, data and artificial intelligence, and enterprise management, among others. Capgemini also emphasizes innovation and sustainability, helping companies achieve digital transformation while promoting environmental and social responsibility. Additionally, Capgemini provides career opportunities across various levels and professions, encouraging innovation and diversity in its workforce.

📋 Description

• Act as an expert IT troubleshooter with broad technical experience across multiple disciplines, supporting our Incident and Problem Management teams.
• Understand the root cause of each incident and the tasks needed to ensure it does not recur.
• Validate the root cause of incidents in nonproduction regions, then work with teams to determine the best approach to resolve them.
• Participate in chaos testing: we use a third-party tool to disable functions on a server, verify that we can alert teams to the failure, and then assemble a technical troubleshooting call to identify the problem and restore the service.
• Leverage the observability toolset to define key transactions and observe their performance within systems.
• Create golden signal reporting and error budgets for development teams. Must know the framework.
• Perform failure analysis, using chaos testing practices to break nonproduction systems, find weak points, and work with infrastructure and development teams to improve the application's resilience.
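
To illustrate the error-budget reporting mentioned above, here is a minimal Python sketch. It assumes that "the framework" refers to the four golden signals (latency, traffic, errors, saturation) and that the budget is request-based over a fixed reporting window. The class name `ErrorBudget`, its fields, and the sample numbers are hypothetical and are not taken from Capgemini's tooling.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Request-based error budget for one service over one reporting window."""

    slo_target: float     # e.g. 0.999 means 99.9% of requests should succeed
    total_requests: int   # "traffic" golden signal for the window
    failed_requests: int  # "errors" golden signal for the window

    @property
    def error_rate(self) -> float:
        """Fraction of requests that failed in the window."""
        if self.total_requests == 0:
            return 0.0
        return self.failed_requests / self.total_requests

    @property
    def allowed_failures(self) -> float:
        """Number of failures the SLO tolerates in the window."""
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining_fraction(self) -> float:
        """Share of the budget still unspent; negative means it is exhausted."""
        allowed = self.allowed_failures
        if allowed == 0:
            return 1.0 if self.failed_requests == 0 else 0.0
        return 1.0 - self.failed_requests / allowed


# Hypothetical window: 1,000,000 requests, 250 failures, 99.9% SLO.
budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
print(f"error rate: {budget.error_rate:.4%}")
print(f"budget remaining: {budget.remaining_fraction:.1%}")
```

A report like this gives a development team a simple signal: while the remaining fraction is comfortably positive there is room for riskier releases, and when it approaches zero the focus shifts to reliability work.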

🎯 Requirements

• At least 6 years of experience in a similar role, such as Reliability Engineer or Resilience Engineer
• Full English fluency
• BS in Computer Science or similar
• Very strong coding experience (writing and testing code, leveraging observability processes), ideally in Java or C++
• Hands-on approach to troubleshooting and a very technical background
• **Technical & Business Skills**
  - Site Reliability Engineer - Advanced
  - Trend & Pattern Analysis, Optimization - Advanced
  - Resilience Engineering - Advanced
  - Golden Signal Cyber Reliability **(MUST)**
  - Dynatrace - Intermediate (4-6 Years) **Desirable, not a must**, or any other observability tool
  - Gremlin - Entry Level (1-3 Years): chaos testing, failure modeling experience, or similar **(Very Desirable)**
  - Cloud Infrastructure experience: AWS / Azure / GCP - Intermediate (4-6 Years)
  - Strong coding experience

🏖️ Benefits

• Competitive salary and performance-based bonuses
• Comprehensive benefits package
• Career development and training opportunities
• Flexible work arrangements (remote and/or office-based)
• Dynamic and inclusive work culture within a globally renowned group
• Private Health Insurance
• Pension Plan
• Paid Time Off
• Training & Development
