Abarca Health

A different kind of PBM on a mission to make healthcare seamless & personalized for everyone. #PBMAwesome

Pharmacy Benefit Management (PBM) • Health Business Intelligence • Health Information Technology (HIT) • Medicare • Medicaid

501 - 1000

Site Reliability Engineer

March 30

🇺🇸 United States – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

🖥 DevOps & Production Engineering

Description

• Our Site Reliability Engineering team leverages software engineering and infrastructure operations to create highly reliable and scalable software systems. The team is responsible for ensuring that Abarca’s infrastructure operates efficiently by assisting with the design, build, and maintenance of software systems that automate and optimize the deployment, monitoring, and performance of Abarca’s systems. By focusing on improving the reliability and availability of software systems through engineering best practices and tools, we manage complex distributed systems to meet our external Service Level Agreements and internal Operating Level Agreements.
• As our Site Reliability Engineer, you will be responsible for collaborating on the design, build, and maintenance of reliable and scalable infrastructure and software systems. This will be accomplished by tracking error budgets against service level agreements in order to meet and maintain compliance. You will also be collaborating with our Infrastructure, Software Engineering, and Security teams to identify and implement reliability and performance improvements across our systems.

Requirements

• Bachelor’s or Master’s Degree in Information Technology, Computer Science, or a related field. (In lieu of a degree, equivalent experience may be considered.)
• 3+ years of experience as a site reliability engineer or within related areas.
• Experience managing error budgets as well as service level agreements.
• Experience programming with, but not limited to: .Net, C#, JavaScript, PyScript, T-SQL/SQL.
• Experience with containerization technologies (e.g. Docker and Kubernetes).
• Experience with cloud infrastructure platforms (e.g. AWS, Azure, or GCP).
• Experience with monitoring and alerting tools (e.g. DataDog, AppDynamics, Dynatrace, Prometheus, SolarWinds, Grafana, or Nagios).
• Participation in an on-call rotation to provide 24/7 support for critical systems. Availability to work rotating or irregular shifts, including weekends and certain holidays, per business or operational needs.
• Some travel to the Puerto Rico location required (15-20%).
• Excellent oral and written communication skills.
• Experience with automation tools (e.g. Ansible, PowerShell scripting).
• Certified SRE Foundation (SREF).

Responsibilities

• Manage error budgets and ensure that service level agreements are met, keeping our stakeholders satisfied and reducing penalties associated with performance issues.
• Monitor systems for potential performance and reliability issues, proactively taking measures to prevent their occurrence and minimize service disruption.
• Promptly troubleshoot and resolve production issues, and identify reliability improvements that mitigate future occurrences.
• Collaborate with Software Development, among other teams, to continuously improve systems and processes, increasing efficiency, minimizing downtime, and optimizing overall system reliability.
• Develop and maintain automation tools to improve system observability, reliability, and performance.
• Design and implement disaster recovery plans to ensure business continuity.
