Lead Site Reliability Engineer


🔥 1 minute ago



Accela

201 - 500 employees

Founded 2000

đŸ›ïž Government

☁ SaaS

🏢 Enterprise


Accela is a leading provider of cloud-based solutions designed to modernize government services. Its unified suite of applications aims to build more connected communities through greater efficiency and secure data management. Accela helps local and state governments streamline processes such as building permits, licensing, cannabis regulation, and environmental health. With a focus on civic solutions, it aims to improve public sector operations by eliminating data silos and making interactions with residents easier. With its SaaS platform, governments can improve service delivery, increase transparency, and reduce operational costs, saving both time and money.

📋 Description

• Serve as a technical leader for reliability engineering, operational excellence, and platform modernization across the Civic Platform.
• Drive platform modernization initiatives, including the continued evolution from VM-based architectures toward containerized and cloud-native services, in partnership with DevOps Engineering, Database Engineering, Security, and Development teams.
• Lead efforts that improve and sustain the availability, performance, scalability, security, and cost efficiency of Accela's SaaS offerings.
• Define, implement, and operate service level objectives (SLOs), service level agreements (SLAs), and error budgets for critical platform services, using data to drive prioritization and risk-based decision making.
• Lead observability initiatives across metrics, distributed tracing, logging, and monitoring platforms to improve system visibility and accelerate issue detection and resolution.
• Drive Root Cause Analysis (RCA) efforts for complex production incidents, facilitate blameless postmortems, and ensure corrective actions are implemented and tracked to completion.
• Design, develop, and maintain automation, tooling, and software solutions that improve reliability, operational efficiency, scalability, and developer productivity.
• Serve as a senior technical escalation point during production incidents and for platform changes that impact availability, performance, security, or compliance.
• Partner with Security and Compliance teams to ensure platform operations meet regulatory and compliance requirements, including SOC 2, HIPAA, FedRAMP, StateRAMP, and PCI-DSS.
• Translate operational metrics, reliability trends, and platform health data into actionable insights for engineering leadership and executive stakeholders.
• Mentor engineers across the Cloud Engineering organization and influence engineering best practices through technical leadership and collaboration.

🎯 Requirements

• 8+ years of experience in Site Reliability Engineering, Software Engineering, Cloud Infrastructure, or related disciplines within a SaaS environment, including experience leading complex technical initiatives.
• Demonstrated technical leadership driving platform modernization in containerized and orchestrated environments, including Kubernetes or equivalent technologies.
• Hands-on experience operating and supporting large-scale SaaS platforms on Microsoft Azure.
• Experience developing automation and operational tooling using Python, PowerShell, Bash, or similar scripting languages.
• Deep expertise designing, operating, analyzing, and troubleshooting complex distributed systems across the application, infrastructure, networking, and operating system layers.
• Strong experience with modern observability platforms, including monitoring, logging, metrics, and distributed tracing.
• Demonstrated success leading incident response, Root Cause Analysis, and continuous improvement initiatives.
• Experience establishing and maturing Incident, Problem, and Change Management practices.
• Strong written and verbal communication skills with the ability to effectively communicate technical concepts to engineering leadership and executive stakeholders.
• Experience using Git and GitHub-based development workflows.

đŸ–ïž Benefits

• flexible time off
• comprehensive medical, dental, and vision plans
• family planning benefits
• 401(k) retirement savings plan with company match
• health savings account with company contributions
• flexible spending account
• life, accident, and disability coverage
• business travel insurance
• employee assistance programs
• other well-being benefits


Similar Jobs

🔥 1 hour ago

Bellese Technologies

51 - 200

⚕ Healthcare Insurance

Engineer II in DevOps at Bellese Technologies, enhancing healthcare systems through innovative software solutions. Collaborate with teams to manage cloud infrastructure and CI/CD processes.

AWS

Cloud

DNS

Docker

Firewalls

Jenkins

Microservices

Terraform

🔥 6 hours ago

Clinician Nexus

51 - 200

⚕ Healthcare Insurance

📚 Education

☁ SaaS

Senior Manager, DevOps enabling health organizations with technology while leading a team of engineers. Overseeing CI/CD, infrastructure automation, and collaborating with multiple stakeholders.

AWS

Cloud

DNS

Flux

Grafana

Kubernetes

Prometheus

Python

Terraform

Vault

🔥 6 hours ago

Cognitive Medical Systems, Inc.

51 - 200

☁ SaaS

DevOps Engineer supporting the VA CCN contract for TriWest Healthcare Alliance. Ensuring reliability, stability, and continuous improvement of enterprise applications in healthcare delivery.

JavaScript

MS SQL Server

SQL

🔥 11 hours ago

ClassWallet

11 - 50

💳 Fintech

📚 Education

đŸ›ïž Government

DevOps Engineer optimizing cloud infrastructure and deployment pipelines for fintech company. Redefining public funds management and ensuring system reliability with high compliance standards.

AWS

Cloud

Docker

EC2

Grafana

Kubernetes

Node.js

Prometheus

Terraform

🔥 11 hours ago

CACI International Inc

10,000+ employees

🔒 Cybersecurity

Senior DevSecOps Cyber Engineer supporting DOD/AF customer with cloud native solutions and Zero Trust objectives. Collaborating in agile development to enhance enterprise cyber systems.

Cloud

Cyber Security

Java

Kubernetes

Python

Realm