
51 - 200 employees
Founded 2017
🚗 Transport
🤖 Artificial Intelligence
💰 $30M Venture Round on 2023-08
Serve Robotics is an innovative company focused on revolutionizing the delivery industry with its autonomous delivery robots. The company aims to make delivery services more affordable, sustainable, and convenient by using self-driving robots instead of traditional two-ton vehicles for small deliveries like burritos. Through a commercial deal with Uber, Serve Robotics plans to deploy up to 2,000 robots, marking a significant advancement in the autonomous delivery sector.
🔥 9 minutes ago
🇲🇾 Malaysia – Remote
💵 RM90k - RM110k / year
⏰ Full Time
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)

• Serve as the primary incident lead during your region’s daytime hours, coordinating technical investigations, centralizing communication, and engaging the appropriate engineering and SRE teams when escalation is required.
• Respond to escalations from Tier 1 support, using runbooks, metrics, logs, and system diagnostics to investigate and remediate issues or determine when escalation to Tier 3 is necessary.
• Develop and update runbooks, workflows, and operational documentation to ensure consistent and reliable responses to recurring issues, collaborating with product teams to expand coverage over time.
• Write, maintain, and enhance automation scripts and tools that streamline common remediation steps, improve response times, and reduce manual operational overhead.
• Use metrics, logs, and tracing tools (Grafana/Prometheus, GCP Monitoring, OpenTelemetry) to proactively identify problems, validate system behavior, and support continuous improvement of detection mechanisms.
• Act as the central point of communication during active incidents, ensuring timely updates and clear routing to the correct product engineering and SRE stakeholders.
• Collaborate with reliability and product teams to share insights, recommend improvements, and help refine processes that enhance the stability and operability of our systems.
• Participate in a shared weekend on-call rotation to help maintain operational coverage for production systems, responding to incidents and escalations as needed and coordinating with engineering teams when issues arise.
• Help establish operational best practices, refine workflows, and prepare the foundation for a broader reliability operations function.
• Bachelor’s degree in Computer Science, Information Technology, Engineering, or equivalent practical experience.
• 5+ years of professional experience in Reliability Operations, Site Reliability Engineering, DevOps, IT Operations, or a related technical support function.
• Demonstrated experience owning or participating in Tier 2 or Tier 3 technical investigations, including triage, log analysis, and structured escalation.
• Experience supporting distributed systems, cloud-hosted services, or production operational environments.
• Hands-on experience participating in incident response processes.
• Strong proficiency with Linux, including navigating systems, reviewing logs, and performing diagnostics.
• Experience writing, executing, and maintaining runbooks, automations, and operational workflows.
• Ability to interpret metrics, logs, and traces using tools such as Grafana/Prometheus, Google Cloud Monitoring, and OpenTelemetry.
• Familiarity with modern cloud environments, preferably Google Cloud Platform (GCP), including basic debugging, permissions, and service-level triage.
• Ability to investigate and remediate issues following documented procedures, escalating effectively when needed.
• Understanding of CI/CD pipelines, deployed application behavior, and operational dependencies across microservices.
• Proficiency with Jira or similar platforms for ticketing and structured incident tracking.
• Exceptional communication skills, especially during high-pressure incidents where clear, concise updates are critical.
• Calm and methodical approach to troubleshooting, prioritization, and decision-making.
• Strong collaboration skills when coordinating with product engineering, SRE, and global support teams.
• High level of ownership, reliability, and accountability when handling operational responsibilities and incident leadership.
🕒 2 days ago
51 - 200
Site Reliability Engineer ensuring high availability and performance of production systems at Pave Bank. Collaborating with teams for infrastructure reliability in a fintech environment.
🇲🇾 Malaysia – Remote
🔥 Funding within the last year
💰 $39M Series A - Pave Bank on 2025-10
⏰ Full Time
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)
Cloud
Distributed Systems
Docker
Google Cloud Platform
Grafana
Kubernetes
Microservices
Prometheus
Python
Terraform
Go
🕒 5 days ago
Site Reliability Engineer improving and scaling the reliability of the Pod platform, focusing on incident response and operational tooling.
🇲🇾 Malaysia – Remote
💵 $100k / year
⏰ Full Time
🟡 Mid-level
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)
Cloud
Distributed Systems
Docker
Grafana
Linux
Prometheus
Python
Rust
🕒 June 11
Cloud Operations Engineer at Unit4 solving customer business processing issues and building better solutions with skills in Azure, DevOps, and troubleshooting.
Azure
Cloud
SMTP
SQL
🕒 April 24
Site Reliability Engineer joining LineTen to ensure global coverage of our products. Responsible for engineering support and development experience using Docker and Kubernetes.
🇲🇾 Malaysia – Remote
💰 Seed Round on 2018-02
⏰ Full Time
🟡 Mid-level
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)
Cloud
Docker
Kubernetes
🕒 April 15
Senior DevOps Engineer optimizing infrastructure for SaaS and on-prem AI services at Arize. Collaborates with customers and product teams to enhance performance and reliability.
AWS
Azure
Cloud
Google Cloud Platform
Kubernetes