Senior Site Reliability Engineer, SRE

October 23

Authlete

API • Cybersecurity • SaaS

Authlete is a leading provider of OAuth 2.0 and OpenID Connect (OIDC) solutions, offering APIs designed to support identity and access management. Its platform helps developers streamline the implementation of OAuth/OIDC standards, allowing companies to focus on product development while Authlete manages the complexities of token issuance and protocol steps. By complying with international standards, Authlete aims to simplify security and identity assurance across a range of applications.

11 - 50 employees

Founded 2015

📋 Description

• Design, maintain, and optimize Kubernetes-based deployments across Shared Cloud, Dedicated Cloud, and Self-Managed deployment models
• Develop and improve Helm charts as the standard deployment method across all supported environments
• Manage and automate GitLab CI/CD pipelines, including container image packaging and release processes
• Enhance monitoring, alerting, and observability using Google Cloud Monitoring, Prometheus, and Grafana
• Review and improve cloud functions and internal tooling written in Go, Ruby, and Bash
• Troubleshoot infrastructure issues and performance bottlenecks
• Contribute to product reliability by investigating and resolving issues in our Java-based servers
• Participate in on-call rotations to maintain uptime and rapid incident response
• Lead post-incident reviews and drive long-term reliability improvements
• Collaborate with Engineering and Support teams to diagnose customer issues and optimize service quality

🎯 Requirements

• Strong experience operating Kubernetes in production (preferably on GKE)
• Deep understanding of Kubernetes networking, security, Helm charts, and storage management
• Proficiency in one or more programming languages such as Go, Java, Bash, or Ruby
• Experience managing GitLab CI/CD pipelines and container image workflows
• Ability to write PromQL alerting rules and interpret key reliability metrics
• Familiarity with Redis, Liquibase, and TLS/mTLS certificate management
• Strong analytical skills for diagnosing performance bottlenecks across network, cache, and database layers
• Experience with observability, incident management, and performance testing
• Clear communication skills in English; Japanese language proficiency is a plus
• Comfortable working independently in a distributed team across time zones

🏖️ Benefits

• Competitive compensation
• Global collaboration
• Meaningful technical challenges
• Flexibility and autonomy
• Opportunity to shape infrastructure best practices
