DevOps Engineer – Platform Reliability

BJAK

51 - 200 employees

eCommerce • Insurance • Marketplace

BJAK is a leading online platform in Southeast Asia that offers comprehensive automobile insurance comparison services. The company enables Malaysian users to compare and purchase auto insurance from multiple insurers efficiently, providing considerable savings and convenience. BJAK is known for its user-friendly digital platform, which allows quick insurance and road tax renewals with discounts of up to 11%. With a strong emphasis on customer service, BJAK also provides 24/7 roadside assistance, accident support, and replacement vehicles. It is a pioneer in the insurance comparison sector in the region and has facilitated significant savings for millions of car owners.

📋 Description

• Own and improve platform reliability across production systems and environments.
• Manage cloud infrastructure, deployment pipelines and runtime environments.
• Design and improve CI/CD workflows to enable safe, fast and repeatable releases.
• Build and enhance monitoring, alerting, logging and system observability.
• Lead incident response efforts and perform structured root cause analysis.
• Improve system resilience through redundancy, failover and recovery mechanisms.
• Work with engineering teams to reduce production risk through better deployment and system design practices.
• Strengthen infrastructure security, access control and secrets management.
• Support reliability for business-critical workflows across multiple countries and services.
• Continuously improve operational discipline, uptime and system stability.

🎯 Requirements

• Experience in DevOps, SRE, platform engineering or infrastructure-focused roles.
• Strong understanding of cloud infrastructure, CI/CD pipelines and deployment systems.
• Experience with production monitoring, alerting and incident management practices.
• Ability to troubleshoot infrastructure and production issues in a structured and calm manner.
• Strong understanding of reliability engineering principles (availability, fault tolerance, recovery).
• Experience supporting business-critical or high-availability systems.
• Strong ownership mindset during incidents and operational failures.
• Practical judgment on reliability, performance, security and cost trade-offs.
• Comfortable working closely with engineering teams in fast-paced environments.
• Low ego, disciplined and focused on long-term system stability.

Bonus Points:

• Experience with AWS, GCP, Azure or similar cloud platforms.
• Experience with Kubernetes, Docker or container orchestration.
• Experience with infrastructure-as-code tools (Terraform, Ansible, Pulumi, etc.).
• Experience with observability stacks (Prometheus, Grafana, ELK, Datadog, etc.).
• Experience with zero-downtime deployments, blue-green or canary release strategies.
• Experience supporting distributed or high-traffic production systems.
• Strong knowledge of security best practices in cloud infrastructure.
• Experience in fintech, insurance or regulated industry environments.
• Contributions to platform reliability or infrastructure scaling initiatives.

🏖️ Benefits

• Build Reliable AI Platform Infrastructure – Support systems powering end-to-end insurance automation.
• High-Impact Engineering – Solve real-world reliability and scaling challenges.
• Global Engineering Team – Work with experienced engineers across multiple countries.
• Fully Remote – Work remotely from China while collaborating with our Malaysia-based teams.
• International Exposure – Build systems used across Southeast Asia markets.
• Learning & Development Budget – Support continuous technical growth and certifications.
• High Ownership Environment – Strong autonomy over infrastructure and reliability strategy.
• Modern Engineering Culture – Focus on stability, observability and engineering excellence.
• Competitive Compensation – Attractive salary package based on experience and impact.
