
51 - 200 employees
Founded 2017
🚗 Transport
🤖 Artificial Intelligence
💰 $30M Venture Round on 2023-08
Serve Robotics is an innovative company focused on revolutionizing the delivery industry with its autonomous delivery robots. The company aims to make delivery services more affordable, sustainable, and convenient by using self-driving robots instead of traditional two-ton vehicles for small deliveries like burritos. Through a commercial deal with Uber, Serve Robotics plans to deploy up to 2,000 robots, marking a significant advancement in the autonomous delivery sector.
🔥 8 minutes ago
🇲🇾 Malaysia – Remote
💵 RM80k - RM100k / year
⏰ Full Time
🟢 Junior
🟡 Mid-level
⛑ DevOps & Site Reliability Engineer (SRE)

• Lead incident investigations during your region’s daytime hours, providing timely updates, escalating appropriately, and supporting senior engineers leading the response.
• Respond to escalations from Tier 1 support using established runbooks, metrics, logs, and diagnostics to remediate issues or escalate to Tier 3 when needed.
• Update runbooks and operational documentation based on new issues, discoveries, and feedback, ensuring clarity and consistency across all procedures.
• Run existing automations and collaborate with senior team members to enhance tooling and scripts that streamline troubleshooting and remediation tasks.
• Use observability tools such as Grafana/Prometheus, GCP Monitoring, and OpenTelemetry to interpret metrics, logs, and traces, helping identify anomalies and validate system performance.
• Provide concise, accurate updates during incidents, ensuring information reaches the correct engineering and SRE contacts and supporting structured incident coordination.
• Participate in discussions around root causes, share operational insights, and contribute to process improvements that enhance system stability and supportability.
• Participate in a shared weekend on-call rotation to help maintain operational coverage for production systems, responding to incidents and escalations as needed and coordinating with engineering teams when issues arise.
• Proactively strengthen workflows, adopt best practices, and build the foundation of the Reliability Operations function as it evolves.
• Bachelor’s degree in Computer Science, Information Technology, Engineering, or equivalent hands-on experience.
• 2–4 years of experience in Reliability Operations, Site Reliability Engineering, DevOps, IT Operations, or a related technical support function.
• Experience participating in Tier 1 or Tier 2 investigations, including log review, basic triage, and structured escalation.
• Exposure to operational environments supporting distributed or cloud-based systems.
• Participation in incident response workflows and/or on-call rotations.
• Proficiency with Linux, including navigating systems, reviewing logs, and performing basic diagnostics.
• Experience using and contributing to runbooks and operational workflows.
• Ability to interpret metrics, logs, and traces using tools such as Grafana/Prometheus, Google Cloud Monitoring, and OpenTelemetry.
• Familiarity with cloud platforms, preferably Google Cloud Platform (GCP).
• Ability to follow documented remediation steps, with good judgment around when to escalate.
• Understanding of CI/CD pipelines and how application deployments affect runtime behavior.
• Experience using Jira or similar ticketing systems.
• Clear and effective communicator, especially when providing updates during time-sensitive operational issues.
• Calm, organized approach to troubleshooting and prioritization.
• Collaborative mindset, working effectively with senior operations engineers, product teams, and SREs.
• Strong sense of ownership and accountability for operational responsibilities.
• Continuous operational coverage
• Weekend on-call rotation shared across the Reliability Operations team
🕒 5 days ago
Site Reliability Engineer improving and scaling the reliability of the Pod platform, focusing on incident response and operational tooling.
🇲🇾 Malaysia – Remote
💵 $100k / year
⏰ Full Time
🟡 Mid-level
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)
Cloud
Distributed Systems
Docker
Grafana
Linux
Prometheus
Python
Rust
🕒 June 11
Cloud Operations Engineer at Unit4 solving customer business processing issues and building better solutions with skills in Azure, DevOps, and troubleshooting.
Azure
Cloud
SMTP
SQL
🕒 April 24
Site Reliability Engineer joining LineTen to ensure global coverage of our products. Responsible for engineering support and development experience using Docker and Kubernetes.
🇲🇾 Malaysia – Remote
💰 Seed Round on 2018-02
⏰ Full Time
🟡 Mid-level
🟠 Senior
⛑ DevOps & Site Reliability Engineer (SRE)
Cloud
Docker
Kubernetes