AccelByte

AccelByte is a battle-tested and extensible game backend supporting cross-platform accounts, matchmaking, in-game store,

ONLINE GAME FEATURES • GAME PUBLISHING • CLOUD TECHNOLOGY • PUBLISHING PLATFORM • GAME BACKEND

201 - 500

Site Reliability Engineer

April 24

🇮🇩 Indonesia – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

🖥 DevOps & Production Engineering

Description

• As an SRE/Cloud Engineer - Observability, your primary responsibility revolves around enhancing the observability of our infrastructure
• You play an important role in strategically optimizing resources and driving initiatives to ensure effective infrastructure management aligned with business objectives
• Your focus lies in implementing tools and practices that enable comprehensive monitoring, logging, and tracing of system components and processes
• By doing so, you contribute to improving system reliability, troubleshooting efficiency, and overall operational transparency

Requirements

• Bachelor's degree, or relevant work experience, certifications, or courses
• At least 3 years of experience in Site Reliability Engineering (SRE) or a similar role, with a particular focus on improving observability within distributed systems
• Experience in designing and implementing log collection, aggregation, and visualization systems using Fluentd, Fluent Bit, Promtail, Loki & LogQL, Logstash, OpenSearch, and AWS Athena
• Experience in designing and implementing metric collection, aggregation, and visualization solutions using technologies like Prometheus & PromQL, Grafana, cAdvisor, metrics-server, and CloudWatch
• Practical knowledge of trace collection, aggregation, and visualization methodologies employing tools and techniques such as Grafana Tempo & TraceQL, tail sampling, and OpenTelemetry
• Basic experience in Kubernetes, including using kubectl, Flux, and other tools for debugging and modifying cluster state, and understanding the limitations and usage of containerization technology within a Kubernetes cluster
• Basic experience in containerization technology, particularly Docker and containers, including its limitations and practical applications within a Kubernetes cluster environment
• Basic experience in using Infrastructure-as-Code (IaC) tools (e.g., Terraform, CloudFormation) for provisioning and configuration management, including the ability to apply, modify, or delete modules and create custom Terraform modules
• Basic experience in performing cloud system operations on AWS infrastructure, including backups, snapshots, and other administrative tasks
• Practical knowledge of defining budgets, forecasting expenses, and building automated tools to identify cost trends and anomalies for cloud infrastructure
• Understanding of distributed systems architecture and best practices
• Experience in using one or more scripting or programming languages (e.g., Python, Go) for automation and tooling development
• Experience in managing and optimizing costs across multiple cloud accounts or subscriptions, with proficiency in cloud account management tools and techniques, is a plus
• Familiarity with multi-cloud or hybrid cloud environments, including the ability to navigate and leverage different cloud platforms simultaneously, is preferred
• Experience at a AAA game studio or a software product company is preferred
• AWS Certified Solutions Architect certification is a big plus
• Experience working in a multinational technology startup is a big plus
• Eagerness to learn new languages and technologies
• Proficiency in written and verbal English
• Flexibility to adjust work routines/schedules, as required, to meet the needs of the company and the expectations of customers