AI DevOps, Reliability Engineer

🕒 2 days ago

Branch

501 - 1000 employees

Founded 2014

🔌 API

🤝 B2B

☁️ SaaS

💰 $282M Series F on 2022-02

Branch is a mobile growth company whose platform is designed to maximize the value of digital strategies. Its services focus on improving customer engagement, optimizing advertising performance through sophisticated attribution, and ensuring compliance with data protection regulations. Serving over 100,000 companies, from startups to Fortune 500 brands, Branch helps businesses create seamless user experiences across channels, drive conversions, and grow their mobile apps and engagement metrics.

📋 Description

• Design and expand deployment automation
• Establish release practices and standards
• Extend automation deeper into production paths
• Enable verification through automation
• Own CI/CD standards across teams
• Build pipeline tooling for safe engineering paths
• Design and build out environments that mirror production
• Bring AI tooling into operations
• Champion Infrastructure as Code for provisioning
• Operate and tune high-volume data infrastructure
• Embed with an assigned engineering team day-to-day
• Stand up DORA metrics and use them for improvements

🎯 Requirements

• Hands-on experience adopting AI into DevOps and SRE practices (Claude Code, Cursor, agents, or similar)
• 7+ years in DevOps, platform, infrastructure, or related engineering roles
• Strong hands-on Kubernetes and AWS experience
• Deep IaC experience (Terraform and/or CloudFormation)
• Proven CI/CD architecture experience: pipelines, quality gates, release automation
• GitOps experience with Argo CD (or Flux) for Kubernetes delivery
• Hands-on experience operating streaming infrastructure (Kafka) in production
• Experience managing SQL and NoSQL datastores at high volume
• Solid scripting/automation skills (Python, Bash, or similar)
• Working knowledge of observability stacks: Prometheus, Grafana, PagerDuty
• Familiarity with on-call, incident response, SLI/SLO definition, and runbooks

🏖️ Benefits

• Comprehensive benefits package
• Health and wellness programs
• Paid time off
• Retirement planning options
• 10% annual bonus tied to company goals
