Senior Site Reliability Engineer

Moniepoint Inc. (Formerly TeamApt Inc.)

1001 - 5000 employees

💳 Fintech

🏦 Banking

Fintech • Banking • Payments

Moniepoint Inc. is Africa's all-in-one financial ecosystem, providing payments, banking, credit, and business management solutions to over 10 million businesses and individuals. As Nigeria's largest merchant acquirer, Moniepoint powers the majority of Point of Sale (POS) transactions in the country. The company processes $17 billion monthly while remaining profitable. In operation since 2019, Moniepoint continues to support businesses through its financial services platform and to advance financial inclusion across emerging markets.

📋 Description

• Participate in on-call rotations as the primary technical lead. Act as the Incident Commander during major severity incidents: initiating war rooms, coordinating cross-functional teams, and providing clear status updates.
• Instrument code to expose high-cardinality metrics and distributed traces. Collaboratively define, measure, and defend Service Level Objectives (SLOs) and Error Budgets with product owners.
• Write high-quality, production-ready code (in Java, Go, or Python) to build internal tooling, automation platforms, and self-healing mechanisms that eliminate manual operator intervention.
• Partner with Product Engineering teams during the design phase to ensure new services are built with reliability, scalability, and observability patterns (circuit breakers, rate limiting, backpressure, fallback strategies) from day one.
• Analyze system performance and traffic patterns to model future capacity needs. Conduct load testing and chaos engineering experiments to verify system resilience under failure conditions.

🎯 Requirements

• Minimum of 5 years of experience in SRE or Backend Engineering with a strong ability to write clean, performant, and tested code in Java, Go, Rust, or Python.
• Deep understanding of distributed systems architecture and design patterns. You possess a strong command of microservices fundamentals, event-driven architectures, and the underlying principles required to build systems that scale.
• Extensive experience with Google Cloud Platform (GCP) or similar cloud providers (AWS/Azure). You are proficient in running production workloads on Kubernetes (GKE/EKS) and troubleshooting cluster/infrastructure issues.
• Experience designing observability strategies using OpenTelemetry, Prometheus, New Relic, Datadog, or SigNoz to improve system visibility.
• Familiarity with operating and tuning production data stores (e.g., PostgreSQL, MySQL) and streaming platforms (e.g., Kafka, RabbitMQ) in a high-throughput environment.

🏖️ Benefits

• Culture - We put our people first and prioritize the well-being of every team member. We’ve built a company where all opinions carry weight and where all voices are heard. We value and respect each other and always look out for one another. Above all, we are human.
• Learning - We have a learning and development-focused environment with an emphasis on knowledge sharing, training, and regular internal technical talks.
• Compensation - You’ll receive an attractive salary, pension, health insurance, annual bonus, plus other benefits.
