Site Reliability Engineer – Insurance Platform

BJAK

51 - 200 employees

eCommerce • Insurance • Marketplace

BJAK is a leading online platform in Southeast Asia that offers comprehensive automobile insurance comparison services. The company enables Malaysian users to compare and purchase auto insurance from multiple insurers efficiently, providing considerable savings and convenience. BJAK is renowned for its user-friendly digital platform that allows quick insurance and road tax renewals, offering discounts up to 11%. With a strong emphasis on customer service, BJAK also provides 24/7 roadside assistance, accident support, and replacement vehicles. It is a pioneer in the insurance comparison sector in the region and has facilitated significant savings for millions of car owners.

📋 Description

• Own reliability and operational stability of BJAK’s production systems.
• Design and improve monitoring, alerting, logging and observability across services.
• Lead incident response, troubleshooting and structured root cause analysis.
• Improve system resilience through redundancy, failover and recovery strategies.
• Work with engineers to design systems that are reliable, scalable and operable in production.
• Improve deployment safety through CI/CD pipelines, release strategies and automation.
• Reduce recurring incidents by identifying root causes and driving long-term fixes.
• Manage and optimize cloud infrastructure supporting business-critical workflows.
• Strengthen operational practices including on-call processes, incident playbooks and SLAs.
• Continuously improve system uptime, performance and operational maturity.

🎯 Requirements

• Experience in Site Reliability Engineering, DevOps, platform engineering or infrastructure roles.
• Strong understanding of distributed systems, cloud infrastructure and production operations.
• Experience with monitoring, alerting and observability tools.
• Strong troubleshooting skills for production incidents and system failures.
• Ability to design for reliability, scalability and fault tolerance.
• Experience working with CI/CD pipelines and deployment automation.
• Strong understanding of system performance, capacity planning and risk management.
• Hands-on ownership mindset during incidents and operational issues.
• Calm, structured and disciplined approach to production environments.
• Strong collaboration with engineering teams in fast-paced environments.

Bonus Points

• Experience with AWS, GCP, Azure or similar cloud platforms.
• Experience with Kubernetes, Docker or container orchestration systems.
• Experience with infrastructure-as-code tools (Terraform, Ansible, etc).
• Experience with observability stacks (Prometheus, Grafana, ELK, Datadog, etc).
• Experience with incident management tools and on-call systems.
• Experience with zero-downtime deployments and progressive delivery strategies.
• Experience working in fintech, insurance or regulated industries.
• Experience building reliability frameworks or SRE best practices in scaling systems.
• Contributions to platform reliability or infrastructure resilience initiatives.

🏖️ Benefits

• Build Reliable Insurance Systems – Support mission-critical automation at scale.
• High-Impact Engineering – Solve real-world reliability and distributed systems challenges.
• Global Engineering Team – Work with experienced engineers across multiple countries.
• Fully Remote – Work remotely from China while collaborating with our Malaysia-based teams.
• International Exposure – Build systems used across Southeast Asia markets.
• Learning & Development Budget – Support continuous technical growth and certifications.
• High Ownership Environment – Strong autonomy over reliability and operational design.
• Modern Engineering Culture – Focus on stability, observability and engineering excellence.
• Competitive Compensation – Attractive salary package based on experience and impact.
