Senior Machine Learning Site Reliability Engineer

🕒 January 13

Prima

1001 - 5000 employees

Founded 2015

💸 Finance

👥 B2C

💰 $115.8M Series A on 2018-11

Prima is a digitally native insurance company that redesigns and digitizes the insurance value chain to deliver fast, customer-friendly policies and claims online. Founded in 2015, Prima has grown to serve over 5 million customers across Europe and reported €1.8 billion in gross written premiums in 2025, operating in Italy, Spain, and the UK through partnerships with established carriers and brokers. The company builds its own tech platforms and data stack to power pricing, distribution, agent/broker management, and claims handling, and joined the AXA Group in November 2025 to support further growth.

📋 Description

• Design, build, and operate reliable and scalable systems by defining and monitoring SLOs/SLIs
• Work directly on production infrastructure
• Collaborate closely with software engineers on system design and reliability improvements
• Actively develop automation for infrastructure and operational workflows to eliminate toil and reduce MTTR
• Participate in and lead incident response
• Drive blameless post-incident reviews, with concrete follow-ups implemented in code and tooling
• Continuously analyze and optimize system performance and cost
• Provide data, insights, and recommendations to inform capacity planning
• Support security best practices through hands-on vulnerability remediation and threat mitigation

🎯 Requirements

• Hands-on experience with SRE practices in production
• Strong AWS expertise
• Experience with Kubernetes, networking, DNS, and Infrastructure as Code (Pulumi preferred, Terraform a plus)
• Strong software engineering fundamentals, with an emphasis on code quality and maintainability
• Solid Python proficiency and deep knowledge of the Python ecosystem (testing, debugging, packaging)
• Hands-on experience with PySpark
• Familiarity with MLOps practices such as model registries, model versioning, retraining workflows, and end-to-end deployment lifecycles
• Stakeholder engagement and mentoring, e.g. leading incident response and RCAs, improving system reliability, engaging stakeholders to propose solutions, sharing learnings, and mentoring others

🏖️ Benefits

• Private healthcare
• Gym discounts
• Wellbeing programs
• Mental health support
• Learning resources
• Mentorship
• Tailored growth plan