SRE – Infra

11 - 50 employees

Founded 2020

☁️ SaaS

⚡ Productivity

🏢 Enterprise

PostHog is a comprehensive platform that empowers developers to build successful products by providing tools for product analytics, web analytics, session replay, feature flags, experiments, and surveys. It integrates seamlessly into existing workflows, offering data pipelines and warehousing solutions that synchronize with popular platforms like Stripe, HubSpot, Zendesk, and more. With PostHog, teams can safely roll out new features, run experiments with statistical significance, and gather in-depth insights with AI and LLM products. The platform is built with full API access, enabling complete control over customer data. PostHog scales with businesses from startups to growth stages, making it a versatile tool for engineering teams seeking to streamline their data operations while focusing on product development.

🕒 April 9

🇺🇸 United States – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

AWS

Cloud

Kubernetes

Linux

Node.js

Terraform

PostHog

📋 Description

• You won’t be in a typical “keep the lights on” SRE role. The work is about turning a fast-growing, stateful system into a predictable, well-automated platform.
• Operating EKS clusters across several environments with Karpenter autoscaling, Cilium networking, and ArgoCD-driven GitOps deployments
• Managing and evolving a multi-account AWS organization: provisioning, networking, access control, and cross-account connectivity
• Maintaining the Terraform/Terragrunt IaC platform: modules, automated plan-on-PR / apply-on-merge pipelines, and safe patterns for shared infrastructure
• Improving operational tooling around deploys, schema changes, backups, restores, and incident response
• Reducing operational load by identifying repeat pain points and eliminating them through code and self-healing automation
• Optimizing cloud spend as you go
• Participating in on-call and incident response, with a strong focus on making incidents rarer over time

🎯 Requirements

• Deep hands-on experience with Kubernetes in production (EKS preferred). You've debugged node pressure, networking issues, and deployment failures at scale (thousands of nodes)
• Strong experience operating production infrastructure on AWS. Not just one account, but understanding organizational boundaries, IAM, and networking between many
• Experience automating infrastructure using Terraform or Terragrunt at scale, including module design and state management
• Solid understanding of Linux systems (disk, memory, networking, failure modes)
• Experience supporting stateful systems (databases, queues, storage systems, etc.)
• Ability to debug and reason about performance and reliability issues in production
• You're comfortable owning systems end-to-end, including on-call responsibilities

🏖️ Benefits

• Transparency: Everyone can read about our roadmap, how we pay (or even let go of) people, our strategy, and how we work, in our public company handbook. Internally, we share revenue, notes and slides from board meetings, and fundraising plans, so everyone has the context they need to make good decisions.
• Autonomy: We don’t tell anyone what to do. Everyone chooses what to work on next based on what's going to have the biggest impact on our customers, and what they find interesting and motivating to work on.
• Shipping fast: Why not now? We want to build a lot of products; we can't do that shipping at a normal pace. We prioritize heads down building time over perfect coordination. This will be the most productive job you've ever had.
• Time for building: Nothing gets shipped in a meeting. We're a natively remote company. We default to async communication – PRs > Issues > Slack. Tuesdays and Thursdays are meeting-free days.
• Ambition: We want to solve big problems. We strongly believe that aiming for the best possible upside, and sometimes missing, is better than never trying. We're optimistic about what's possible and our ability to get there.
• Being weird: Doing weird stuff is a competitive advantage. And it's fun.
