Senior Production Engineer

November 14


Upbound

Cloud Computing • SaaS • Enterprise

Upbound is a company that specializes in cloud native infrastructure management aimed at developers and platform teams. By leveraging Crossplane, a cloud native control plane framework, Upbound provides tools and services that simplify the management of cloud infrastructure resources through APIs. These solutions empower organizations to adopt self-service deployment strategies, enhance developer productivity, and ensure secure and compliant cloud operations. Upbound manages Crossplane operations, allowing platform teams to focus on building cloud native platforms rather than managing infrastructure life cycles.

11 - 50 employees

Founded 2017


📋 Description

• Contribute to the production engineering strategy for Upbound Cloud, ensuring high availability, scalability, and efficiency of all customer-facing systems.
• Own reliability metrics, including uptime, latency, and error budgets, and champion service-level objectives (SLOs) across teams.
• Design and implement automation for provisioning, observability, and incident response to minimize human intervention and increase operational maturity.
• Collaborate with development teams to build reliability into the software lifecycle through proactive architectural reviews, chaos testing, and performance profiling.
• Operate and improve multi-tenant Kubernetes-based systems, leveraging Crossplane and other cloud-native tooling.
• Drive incident management, leading blameless postmortems, root cause analyses, and systemic remediation efforts.
• Mentor engineers in production engineering practices, fostering a culture of ownership, reliability, and continuous improvement.
• Contribute to the evolution of our cloud platform through design input, tool selection, and scalable systems thinking.

🎯 Requirements

• 5+ years of experience in software, infrastructure, or site reliability engineering roles.
• Strong background in distributed systems, service-oriented architectures, and cloud-native technologies.
• Proficiency in Kubernetes, Go, and Infrastructure-as-Code strategies.
• Expertise in observability and monitoring, preferably with Honeycomb and OpenTelemetry.
• Experience managing large-scale SaaS systems in production with multi-region and high-availability requirements.
• Strong understanding of incident response, capacity planning, and change management.
• Excellent communication skills and ability to collaborate across functions.
• A plus if you have:
  • Experience with Crossplane, multi-cloud infrastructure, or control-plane architectures.
  • Prior leadership experience driving reliability initiatives at scale.

🏖️ Benefits

• Competitive salary
• Remote work options


