Senior Site Reliability Engineer

🕒 February 4

Vantage

51 - 200 employees

Founded 2013

☁️ SaaS

🤝 B2B

🛍️ eCommerce

💰 Funding round (series unknown) in February 2016

Vantage is a unified retail media platform that helps retailers and advertisers simplify and scale retail advertising operations. The platform centralizes campaign management, targeting and optimization, media planning, billing and analytics, and supports on-site, off-site and in-store advertising through integrations with ad servers, demand sources, CMS and billing/CRM systems. Vantage offers a self-serve interface, role-based advertiser access, AI-powered campaign intelligence, and enterprise-grade reliability to orchestrate retail media workflows and provide full spend visibility.

📋 Description

• Collaborate with a diverse team of software engineers, engaging in iterative processes and effective task planning to drive our projects forward.
• Take ownership of the availability, scalability, and performance of our services: proactively identify issues and implement automation to prevent problems from recurring.
• Participate in the on-call rotation, responding to incidents and working with the team to restore service and prevent recurrence.
• Contribute to automating infrastructure provisioning, configuration, and management using IaC principles with tools like Terragrunt and Ansible.
• Help design and enhance monitoring, logging, and alerting systems to improve observability and ensure system health.
• Participate in blameless post-mortems, documenting issues and following up on action items to foster a culture of learning and continuous improvement.
• Foster collaboration with other engineering teams, promoting the reuse of existing frameworks and gaining insight into how those frameworks operate.
• Stay current with industry trends, emerging technologies, and best practices in SRE, DevOps, and automation.

🎯 Requirements

• 6+ years of experience as a Site Reliability Engineer, DevOps Engineer, or in a similar role working with software and infrastructure.
• Proficiency in Python or Bash.
• Hands-on experience with Azure or AWS.
• Familiarity with CI/CD pipelines and infrastructure as code (IaC) tooling such as Terraform and Ansible.
• Demonstrated ability to triage and prioritize effectively when troubleshooting incidents.
• A history of engaging effectively with cross-functional teams during events such as incident response and post-mortems.
• A track record of proactively tailoring infrastructure to meet the unique needs of the product it supports.

🏖️ Benefits

• Remote-friendly setup
• Home office support
• Annual company retreats

Similar Jobs

🕒 January 13

Cohere

11 - 50

🤖 Artificial Intelligence

🏢 Enterprise

☁️ SaaS

Site Reliability Engineer joining Cohere to build and operate high-performance AI platforms for NLP applications. Collaborating with teams to deploy optimized models in production environments.

AWS

Azure

Cloud

Distributed Systems

Google Cloud Platform

Kubernetes

Linux

Go

🕒 December 16, 2025

Veeva Systems

1001 - 5000

☁️ SaaS

⚕️ Healthcare Insurance

💊 Pharmaceuticals

Release Engineering Manager overseeing deployment activities and managing release engineers for Veeva's SaaS products across different environments. Coordinating software releases, ensuring smooth delivery to clients while supporting the life sciences industry.

Ansible

AWS

Cloud

Jenkins

Python

SDLC

🕒 November 11, 2025

Lazer Technologies

51 - 200

🛍️ eCommerce

💳 Fintech

☁️ SaaS

Senior DevOps Engineer for a remote-first product studio helping clients with cloud solutions. Delivering robust CI/CD pipelines and secure infrastructure with modern tools.

AWS

Cloud

Docker

Firewalls

Google Cloud Platform

JavaScript

Kubernetes

Node.js

Python

Terraform

Go

🕒 November 6, 2025

Kong Inc.

201 - 500

🔌 API

☁️ SaaS

🏢 Enterprise

Site Reliability Engineer responsible for operating and scaling Kong’s multi-region SaaS platform. Collaborating on infrastructure, automation, and ensuring service reliability across global regions.

Cloud

Distributed Systems

DNS

Grafana

Kafka

Kubernetes

Linux

Postgres

Prometheus

Python

Redis

Terraform

Unix

Go

🕒 October 14, 2025

Cerebras Systems

201 - 500

🤖 Artificial Intelligence

🔧 Hardware

⚕️ Healthcare Insurance

Sr. Deployment Engineer building and operating AI inference clusters for Cerebras Systems. Working with the world's largest AI chip to ensure scalable delivery of AI workloads.

AWS

Docker

Grafana

Kubernetes

Linux

Prometheus

Python