Senior Site Reliability Engineer (SRE)

September 14

OutSystems

Enterprise • Productivity • SaaS

OutSystems is a software company that provides a low-code application development platform. It allows organizations to develop, deploy, and manage enterprise-grade applications with minimal coding effort. By simplifying the process of application development, OutSystems helps businesses accelerate their digital transformation and improve productivity.

1,001-5,000 employees

Founded 2001

📋 Description

• Lead and onboard services and teams to the reliability tenets.
• Establish and maintain Service Level Objectives (SLOs) and Service Level Agreements (SLAs).
• Design and implement scalable, reliable, and secure infrastructure while ensuring cloud-native best practices.
• Collaborate with software development teams to ensure systems are resilient (observable, fault-tolerant, recoverable, scalable) and performant.
• Implement monitoring, alerting, logging, and tracing solutions to detect and respond to incidents.
• Lead incident response efforts, ensuring quick resolution and minimal downtime, and conduct RCAs/post-mortems.
• Automate every operational task, with a special focus on fast incident detection and recovery.
• Foster a culture of continuous improvement and knowledge sharing.
• Communicate effectively with stakeholders, providing updates on system reliability and performance.
• Participate in an on-call rotation to provide 24/7 support for production systems.

🎯 Requirements

• STEM degree (BSc or MSc in Software Engineering, Computer Science, or a related field).
• 5+ years of experience in software development and/or operations.
• Proficiency in at least one high-level programming language (C++, Python, Java, C#, etc.).
• Strong troubleshooting and debugging skills.
• Fluency in English and excellent communication skills.
• The following are valued but not required:
  ◦ Experience with containerization technologies and orchestration platforms, mainly Kubernetes (CKA, CKAD, and CKS certifications are valued);
  ◦ Experience with automation and Infrastructure as Code (IaC) tools such as AWS CloudFormation, Terraform, Puppet, Chef, or Spacelift;
  ◦ Experience with Python, Go, Bash/Shell scripting, or other automation tools and languages;
  ◦ Familiarity with AWS services such as EC2, RDS, ELB, CloudFront, and Lambda;
  ◦ Proficiency in monitoring and troubleshooting complex distributed systems;
  ◦ Experience with Grafana, the ELK stack, Prometheus, or similar tools;
  ◦ Strong understanding of how to design resilient, fault-tolerant systems;
  ◦ Expertise in debugging complex distributed systems.

🏖️ Benefits

• A company that is always growing, changing, and innovating.
• Real career opportunities.
• Work colleagues who are as smart, hard-working, and driven as you.
• Disrupting the status quo is in our DNA.

Similar Jobs

September 5

DevOps Engineer operating and hardening AWS infrastructure for Smart Working. Leading deployments, automation, observability, incident response, and vulnerability management.

AWS • Cloud • Docker • EC2 • Kubernetes • SDLC • Terraform

September 5

Site Reliability Engineer scaling and automating Motive's AWS infrastructure for fleet operations. Ensuring high availability, monitoring, and deployment pipelines for customer-facing systems.

Amazon Redshift • Ansible • AWS • Chef • Distributed Systems • DynamoDB • Python • Ruby • Terraform • Go

September 2

Senior SRE at NVIDIA DGX Cloud operating GPU-accelerated Kubernetes clusters across major clouds. Ensuring reliability, observability, and incident response for production AI infrastructure.

Ansible • AWS • Azure • Chef • Cloud • Google Cloud Platform • Grafana • Kubernetes • Linux • Microservices • Prometheus • Puppet • Python • Splunk • TCP/IP • Terraform • Go

August 28

DevOps Engineer at Saaf Finance building AI-driven mortgage infrastructure. Designing and maintaining AWS-based platforms and CI/CD pipelines.

Airflow • AWS • Cloud • ETL • JavaScript • Kubernetes • Node.js • Prometheus • Python • Terraform

August 27

DevOps Engineer supporting a company building scalable 3D AEC applications. Managing Azure infrastructure, CI/CD, containers, monitoring, and deployment automation.

Azure • Cloud • Docker • Grafana • Kubernetes • Linux • MongoDB • NGINX • Prometheus • Python • RabbitMQ