Staff Site Reliability Engineer, Streaming

November 21


Alpaca

API • Fintech • Crypto

Alpaca is a fintech company that provides a comprehensive brokerage and trading platform through a suite of APIs. These APIs enable developers and businesses to integrate algorithmic trading, app development, and embedded investing into their services. Alpaca offers services like trading in US stocks, ETFs, and cryptocurrency with options for local currency transactions. The company is recognized for its cyber security practices and is a member of FINRA and SIPC. Alpaca is ideal for fintech startups, broker-dealers, hedge funds, and other financial services looking to build sophisticated trading applications and platforms with minimal friction through their well-documented Broker API.

📋 Description

• Triage difficult technical problems and implement solutions.
• Enhance our RabbitMQ and Redpanda observability stack by defining Service Level Objectives (SLOs) and alerts, and by implementing profiling and logging.
• Improve the reliability of our RabbitMQ and Redpanda clients.
• Incident Management: Respond to and resolve incidents in a timely manner, conducting post-incident reviews to identify and implement improvements.
• Collaboration: Work closely with development teams to ensure new features and services are designed with reliability and scalability in mind.
• Capacity Planning: Monitor system capacity and performance, making recommendations and implementing changes to handle future growth.

🎯 Requirements

• 5+ years of experience in Site Reliability Engineering, Performance Engineering, or similar roles.
• 5+ years of experience with message brokers similar to Kafka, RabbitMQ, and Redpanda.
• Proven track record of managing and maintaining large-scale, high-availability, and high-performance distributed systems.
• Experience designing and implementing SLIs, SLOs, and SLAs for internal and third-party systems with comprehensive alerting and monitoring.
• Strong ability to work independently, lead and deliver on large tasks, and collaborate with other members of the organization or external partners.
• Significant production experience with Kubernetes.
• Proficient with Go.
• Proficient with Prometheus.
• Proficient with Linux.
• Experience with troubleshooting message broker performance issues.

🏖️ Benefits

• Competitive Salary & Stock Options
• Health Benefits
• New Hire Home-Office Setup: One-time USD $500
• Monthly Stipend: USD $150 per month via a Brex Card


