Senior Site Reliability Engineer, Kong Konnect

November 6

Kong Inc.

API • SaaS • Enterprise

Kong Inc. provides an API platform for API management, AI integration, and developer productivity. Its products include Kong Gateway and Kong Konnect, along with other tools for managing and optimizing the API lifecycle. The platform supports multi-cloud environments and is built for high performance and security. Gartner recognizes Kong as a leader in API management, and the platform is used across industries such as financial services, healthcare, and technology. The company emphasizes flexibility, security, and speed for enterprises that want to enhance their digital services through APIs. Kong also supports a large developer community and provides extensive integrations and plugins to streamline API management and operations.

201 - 500 employees

Founded 2017

💰 $100M Series D on 2021-02

📋 Description

• Operate and scale Kong’s global SaaS platform (Konnect), ensuring reliability, availability, and performance across regions and clouds.
• Build, automate, and maintain Kubernetes-based infrastructure and deployment workflows using Terraform/Terragrunt, Helm, and ArgoCD.
• Design, maintain, and optimize multi-region data and caching layers, including PostgreSQL, Redis, ClickHouse, and Druid, for high availability and low latency.
• Operate and improve Kong Gateway and Kong Mesh environments supporting hybrid and distributed architectures.
• Develop and maintain CI/CD pipelines and GitOps workflows to automate service delivery and ensure consistent infrastructure changes.
• Enhance observability and incident response readiness through systems like Datadog, Prometheus, Grafana, and Thanos, defining and tracking SLOs.
• Collaborate closely with development and security teams to ensure smooth operation of SaaS services in compliance with reliability, security, and regulatory standards.
• Participate in a global 24/7 on-call rotation and drive continuous improvement of operational playbooks and postmortem practices.
• Lead and contribute to scaling initiatives that improve elasticity, reliability, and cost-efficiency across the SaaS platform.

🎯 Requirements

• BS in Computer Science or equivalent practical experience.
• Demonstrated experience running and scaling SaaS platforms in production, ideally across multiple cloud providers.
• Deep expertise in Kubernetes, including debugging cluster/networking issues and designing for fault tolerance and scalability.
• Strong proficiency with Infrastructure as Code tools like Terraform or Terragrunt.
• Experience with CI/CD pipelines and GitOps workflows (ArgoCD, Atlantis, Helm).
• Proficiency in one or more programming languages (Go, Python, Bash) for automation and tooling.
• Solid understanding of Linux/Unix systems, networking (DNS, TLS/SSL, HTTP), and distributed systems.
• Familiarity with streaming systems like Kafka and observability platforms (Datadog, Prometheus, Grafana).
• Experience working in a 24/7/365 production support environment.

🏖️ Benefits

• Health insurance
• Professional development opportunities
