Site Reliability Engineer

🕒 November 11, 2025

Aalyria

51 - 200 employees

📡 Telecommunications

🏢 Enterprise

☁️ SaaS

Aalyria is a company dedicated to creating, organizing, and managing the world's most advanced networks to enable ubiquitous connectivity at the speed of discovery. It uses atmospheric laser communications technology and a software platform originally developed at Alphabet. Aalyria's platform orchestrates networks across land, sea, air, space, and beyond. Its key technologies include Tightbeam, a free-space optics technology, and Spacetime, a software platform for network orchestration. Aalyria is backed by significant investors and has taken on various high-profile projects, including work with NASA and the development of 5G/6G networking platforms.

📋 Description

• Help design and build Aalyria's centralized observability platform, integrating and scaling tools for metrics (e.g., Prometheus), logging (e.g., Loki), and distributed tracing (e.g., Tempo/OpenTelemetry).
• Define, implement, and manage a robust framework of Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets for our core products, ensuring we are launch-ready.
• Partner with software engineers (SWEs) to implement observability best practices, develop standard templates and documentation, and configure tooling (e.g., OpenTelemetry libraries).
• Automate the deployment, scaling, and management of the entire observability stack using Infrastructure as Code (e.g., Terraform) and GitOps principles (e.g., ArgoCD).
• Partner closely with the core infrastructure team to ensure deep visibility into our Kubernetes clusters and the underlying GCP and AWS environments.
• Develop and lead the company's monitoring, alerting, and incident response strategy, driving a culture of proactive reliability and blameless post-mortems.

🎯 Requirements

• 4+ years of experience in an SRE or platform engineering role, with a focus on observability for large-scale, distributed compute or network systems.
• Deep, hands-on expertise building, scaling, and managing observability platforms (e.g., Prometheus, Grafana, Loki/ELK, OpenTelemetry, Tempo/Jaeger, Honeycomb).
• Proven experience using these tools to support performance analysis and debugging of complex distributed systems.
• Strong production-level experience with Google Cloud Platform (GCP) and Kubernetes.
• Experience using Infrastructure as Code (IaC) and GitOps principles (e.g., ArgoCD).
• Proficiency in a systems programming language, with a strong preference for Go and Python for debugging and writing tooling.
• Demonstrable experience defining, implementing, and managing SLOs, SLIs, and error budgets for production services in high-availability distributed systems.

🏖️ Benefits

• Innovative Environment: Work at a cutting-edge company shaping the future of aerospace communications.
• Impactful Work: Directly contribute to critical national security programs and initiatives.
• Growth Opportunities: Expand your career with opportunities for professional development and advancement.
• Inclusive Culture: Be part of a collaborative, supportive, and inclusive workplace where your contributions matter.
• Flexibility: Flexible working arrangements, including hybrid remote/in-office schedules.
• Competitive salary, comprehensive benefits (401(k), dental, vision, health, life insurance), paid time off, and equity options.

Similar Jobs

🕒 November 9, 2025

AGENTIC

11 - 50

🤖 Artificial Intelligence

🤝 B2B

🏢 Enterprise

Senior DevOps Engineer / Cloud Architect designing multi-account architectures for the Apex program. Requires strong command of AWS and full-stack development, with a focus on cloud-native solutions.

AWS

Azure

Cloud

Postgres

Python

React

TypeScript

🕒 November 7, 2025

Senior DevOps Engineer leading infrastructure development for Trax Technologies’ logistics solutions. Collaborating with diverse teams to optimize scalability, resilience, and reliability in cloud-based logistics management.

AWS

Cloud

Distributed Systems

DNS

Docker

Kubernetes

TCP/IP

🕒 November 6, 2025

Stormlight Capital

1 - 10

💸 Finance

💳 Fintech

DevOps Engineer at Stormlight Capital optimizing infrastructure for derivatives trading operations, ensuring systems process market data and execute trades with high performance.

AWS

Cloud

Google Cloud Platform

Grafana

Prometheus

Python

Go

🕒 November 5, 2025

CloudScouts

11 - 50

🤝 B2B

🏢 Enterprise

💸 Finance

AWS DevOps Engineer designing cloud-native applications for SAP S/4HANA processes and optimizing AWS cost and performance in a fully remote work environment.

AWS

Cloud

DynamoDB

Kafka

🕒 November 4, 2025

TaxAct

51 - 200

💸 Finance

💳 Fintech

🛍️ eCommerce

Consultant role at Taxwell helping clients with tax preparation and advocating for their needs while maintaining an inclusive atmosphere.