Staff Site Reliability Engineer

🔥 21 hours ago

AlphaSense

1001 - 5000 employees

Founded 2011

🤖 Artificial Intelligence

💸 Finance

🏢 Enterprise

💰 Debt Financing on 2022-06

AlphaSense is a market intelligence and search platform that empowers companies to unlock critical insights across an extensive universe of public and private content, including company filings, broker research, expert calls, and market trends. Its AI-driven technology lets users conduct comprehensive due diligence with speed and accuracy, reducing uncertainty and blind spots in decision-making. Trusted by major corporations, financial institutions, and asset management firms globally, AlphaSense serves sectors such as financial services, health care, and technology by providing generative AI-powered solutions that integrate internal proprietary data with external premium content.

📋 Description

• Architect Reliability Paved Paths: Build frameworks and self-service tooling that let teams own the reliability of their services in a “You Build It, You Run It” culture.
• Lead AI-Driven Reliability: Drive our AIOps strategy, automating diagnostics, remediation, and proactive failure prevention.
• Champion Reliability Culture: Embed SRE practices across engineering via design reviews, production readiness, and operational standards.
• Incident Leadership: Act as Incident Commander during critical events, modeling operational excellence and ensuring blameless postmortems lead to lasting improvements.
• Advance Observability: Deliver end-to-end monitoring, tracing, and profiling (Prometheus, Grafana, OTEL, Continuous Profiling) to optimize performance proactively.
• Mentor & Multiply: Elevate engineers across SRE and product teams through mentorship, technical guidance, and knowledge sharing.

🎯 Requirements

• 8+ years of experience in Site Reliability Engineering, DevOps, or a similar role, with at least 3 of those years in a Senior+ SRE position.
• Strong background in running production SaaS systems at scale.
• Proficiency in at least one programming/scripting language (Python, Go, or similar).
• Hands-on expertise with cloud platforms (AWS, GCP, or Azure) and Kubernetes.
• Deep understanding of networking fundamentals (TCP/IP, DNS, HTTP/S, load balancing).
• Experience with monitoring & alerting (Prometheus, Grafana, Datadog, ELK).
• Familiarity with advanced observability (OTEL, continuous profiling).
• Proven incident management experience, including leading high-severity incidents and postmortems.
• Strong troubleshooting skills across the full stack.
• Excellent communication and collaboration skills.

🏖️ Benefits

• You may also be offered equity
• A generous benefits program
