Senior Site Reliability Engineer, Kong Konnect

November 6


Kong Inc.

API • SaaS • Enterprise

Kong Inc. provides an API platform for API management, AI integration, and developer productivity. Its products include Kong Gateway and Kong Konnect, along with other tools for managing and optimizing the API lifecycle. The platform supports multi-cloud environments and is built for high performance and security. Gartner recognizes Kong as a leader in API management, and the platform is used across industries such as financial services, healthcare, and technology. The company emphasizes flexibility, security, and speed for enterprises that deliver digital services through APIs. Kong also supports a developer community and provides integrations and plugins to streamline API management and operations.

201 - 500 employees

Founded 2017


💰 $100M Series D on 2021-02

📋 Description

• Operate and scale Kong’s global SaaS platform (Konnect), ensuring reliability, availability, and performance across regions and clouds.
• Build, automate, and maintain Kubernetes-based infrastructure and deployment workflows using Terraform/Terragrunt, Helm, and ArgoCD.
• Design, maintain, and optimize multi-region data and caching layers (including PostgreSQL, Redis, ClickHouse, and Druid) for high availability and low latency.
• Operate and improve Kong Gateway and Kong Mesh environments supporting hybrid and distributed architectures.
• Develop and maintain CI/CD pipelines and GitOps workflows to automate service delivery and ensure consistent infrastructure changes.
• Enhance observability and incident response readiness through systems like Datadog, Prometheus, Grafana, and Thanos, defining and tracking SLOs.
• Collaborate closely with development and security teams to ensure smooth operation of SaaS services in compliance with reliability, security, and regulatory standards.
• Participate in a global 24/7 on-call rotation and drive continuous improvement of operational playbooks and postmortem practices.
• Lead and contribute to scaling initiatives that improve elasticity, reliability, and cost-efficiency across the SaaS platform.

🎯 Requirements

• BS in Computer Science or equivalent practical experience.
• Demonstrated experience running and scaling SaaS platforms in production, ideally across multiple cloud providers.
• Deep expertise in Kubernetes, including debugging cluster/networking issues and designing for fault tolerance and scalability.
• Strong proficiency with Infrastructure as Code tools like Terraform or Terragrunt.
• Experience with CI/CD pipelines and GitOps workflows (ArgoCD, Atlantis, Helm).
• Proficiency in one or more programming languages (Go, Python, Bash) for automation and tooling.
• Solid understanding of Linux/Unix systems, networking (DNS, TLS/SSL, HTTP), and distributed systems.
• Familiarity with streaming systems like Kafka and observability platforms (Datadog, Prometheus, Grafana).
• Experience working in a 24/7/365 production support environment.

🏖️ Benefits

• Health insurance
• Professional development opportunities

