Lead SRE – Observability

🔥 0 minutes ago

🍂 Massachusetts – Remote

💵 $143k - $243k / year

⏰ Full Time

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🦅 H1B Visa Sponsor

athenahealth

5001 - 10000 employees

Founded 1997

⚕️ Healthcare Insurance

☁️ SaaS

🤖 Artificial Intelligence

💰 Post-IPO Equity on 2017-05

athenahealth is a provider of healthcare software solutions focused on enhancing clinical effectiveness, patient experience, and financial performance. Its flagship product, athenaOne, is an all-in-one solution that includes electronic health records (EHR), revenue cycle management (RCM), and patient engagement tools. The company offers tailor-made solutions for various healthcare providers, ranging from small practices to large health systems and specialties like behavioral health and pediatrics. athenahealth also provides payer solutions, advisory services, and platform services to improve patient outcomes and reduce costs, while ensuring a highly reliable service with 99.98% uptime. Its solutions also incorporate AI-powered features like Ambient Notes to streamline clinical documentation. With a focus on interoperability and efficiency, athenahealth aims to simplify healthcare delivery and improve outcomes for both providers and patients.

📋 Description

• Build and operate scalable observability and telemetry platforms that process logs, metrics, traces, and events across production environments
• Support monitoring, alerting, and instrumentation strategies that improve service visibility and operational insight
• Partner with engineering teams to strengthen telemetry collection and overall observability
• Design resilient, automated infrastructure and platform services that improve reliability, scalability, and efficiency
• Develop Infrastructure as Code and automation solutions that reduce toil and improve consistency
• Lead technical initiatives from architecture through implementation with attention to performance, reliability, security, and maintainability
• Troubleshoot complex production issues involving distributed systems, Linux infrastructure, networking, cloud services, and telemetry pipelines
• Participate in incident response and on-call processes
• Help drive operational excellence, root cause analysis, and continuous improvement
• Mentor engineers on SRE best practices, observability strategy, and scalable systems design
• Contribute to long-term platform strategy and reliability improvements

🎯 Requirements

• 7+ years of experience operating and engineering large-scale production infrastructure and distributed systems
• Strong expertise in Linux systems engineering, cloud infrastructure, and SRE practices
• Proven experience designing and operating observability and telemetry platforms
• Hands-on experience with tools such as OpenSearch/Elasticsearch, Kafka, Prometheus, Grafana, Vector, Fluentd, OpenTelemetry, ClickHouse, or similar
• Experience building Infrastructure as Code solutions using Terraform, CloudFormation, or equivalent tooling
• Strong automation and software engineering skills using Python, Golang, or Bash
• Experience troubleshooting large-scale distributed systems in production with a focus on availability, performance, scalability, and resiliency
• Experience operating services in cloud-native environments, including AWS and containerized platforms
• Strong understanding of monitoring strategy, telemetry pipelines, incident response, root cause analysis, and operational excellence
• Ability to communicate effectively across engineering organizations and influence technical decision-making

🏖️ Benefits

• Health and financial benefits
• Tuition assistance
• Employee resource groups
• Collaborative workspaces
• Flexible work-life balance
