Site Reliability Engineer

November 10

Axelerant

Digital Marketing • SaaS • AI

Axelerant is a digital experience agency that blends AI with human creativity to deliver transformational outcomes for its clients. It offers services in digital experience platforms, digital engineering, experience design, digital marketing, intelligent automation, and quality engineering. Axelerant focuses on driving value at the intersection of people and digital experiences through people transformation, learning and development, and people care. Its projects include digital transformations and the development of digital experience platforms for clients like Doctors Without Borders and OHCHR.org. Its approach emphasizes seamless, scalable, value-driven solutions aligned with each client's vision.

51 - 200 employees

☁️ SaaS

📋 Description

• Design and implement reliable and scalable infrastructure to support business-critical applications and services.
• Collaborate with cross-functional teams to define and implement service level objectives (SLOs) and monitor key performance indicators (KPIs).
• Develop and manage Infrastructure as Code (IaC) solutions using tools like Terraform and Ansible.
• Automate repetitive operational tasks to enhance efficiency and reduce manual intervention.
• Troubleshoot and resolve system performance issues to minimize downtime and ensure high availability.
• Drive the adoption of cloud-native technologies and best practices.
• Participate in an on-call rotation to ensure prompt resolution of critical incidents and maintain system availability.
• Manage and keep documentation and runbooks up to date to ensure effective incident response and operational continuity.
• Implement robust monitoring, logging, and alerting systems to proactively identify and resolve issues, and set up and leverage observability tools to ensure the platform operates as expected.
• Deploy and manage workloads on container orchestration systems like Kubernetes.
• Ensure security and compliance standards are integrated into the infrastructure.

🎯 Requirements

• Proven experience as a Site Reliability Engineer, with 3-4 years of experience and a strong track record of designing and implementing large-scale data solutions.
• Proficiency in Infrastructure as Code (IaC) tools like Terraform and Ansible.
• Experience with container orchestration platforms such as Kubernetes, including deployment and management.
• Strong knowledge of Linux operating systems, including administration and optimization.
• Experience setting up and implementing workload management and deployment using GitOps tools like ArgoCD.
• Familiarity with monitoring and observability tools like Prometheus, Grafana, or Datadog.
• Solid understanding of networking concepts, load balancers, and distributed systems.
• Experience with scripting and automation using languages like Python, Bash, or Go.
• Knowledge of CI/CD pipelines and tools like Jenkins, GitLab CI, or CircleCI.
• Strong problem-solving and troubleshooting skills with a proactive mindset.
• Excellent communication skills to collaborate with technical and non-technical stakeholders.
• Certification in AWS or a similar cloud provider, with hands-on experience managing cloud infrastructure.

**Good To Have**

• Experience with multi-cloud architectures.
• Understanding of serverless architectures and tools.
• Experience with disaster recovery planning and implementation.
• Knowledge of machine learning workflows and data pipelines.

🏖️ Benefits

• Be part of an **AI-first, remote-first** digital agency that’s shaping the future of customer experiences.
• Collaborate with global teams and leading platform partners to solve meaningful challenges.
• Enjoy a culture that supports autonomy, continuous learning, and work-life harmony.
