Site Reliability Engineer

Telecommunications • Enterprise • SaaS

Aalyria is dedicated to creating, organizing, and managing the world's most advanced networks to enable ubiquitous connectivity at the speed of discovery. It uses atmospheric laser communications technology and a software platform originally developed by Alphabet. The platform orchestrates networks across land, sea, air, space, and beyond. Its key technologies are Tightbeam, a free-space optics technology, and Spacetime, a software platform for network orchestration. Aalyria is backed by significant investors and has taken on high-profile projects, including work with NASA and the development of 5G/6G networking platforms.

51 - 200 employees

November 11

🇺🇸 United States – Remote

💵 $115k - $135k / year

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

AWS

Cloud

Distributed Systems

Google Cloud Platform

Grafana

Kubernetes

Prometheus

Python

Terraform

📋 Description

• Help design and build Aalyria's centralized observability platform, integrating and scaling tools for metrics (e.g., Prometheus), logging (e.g., Loki), and distributed tracing (e.g., Tempo/OpenTelemetry).
• Define, implement, and manage a robust framework of Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets for our core products, ensuring we are launch-ready.
• Partner with SWEs to implement observability best practices, develop standard templates and documentation, and configure tooling (e.g., OpenTelemetry libraries).
• Automate the deployment, scaling, and management of the entire observability stack using Infrastructure as Code (e.g., Terraform) and GitOps principles (e.g., ArgoCD).
• Partner closely with the core infrastructure team to ensure deep visibility into our Kubernetes clusters and underlying GCP and AWS environments.
• Develop and lead the company's monitoring, alerting, and incident response strategy, driving a culture of proactive reliability and blameless post-mortems.

🎯 Requirements

• 4+ years of experience in an SRE or platform engineering role, with a focus on observability for large-scale distributed compute or network systems.
• Deep, hands-on expertise building, scaling, and managing observability platforms (e.g., Prometheus, Grafana, Loki/ELK, OpenTelemetry, Tempo/Jaeger, Honeycomb).
• Proven experience using these tools to support performance analysis and debugging of complex distributed systems.
• Strong production-level experience with Google Cloud Platform (GCP) and Kubernetes.
• Experience using Infrastructure as Code (IaC) and GitOps principles (e.g., ArgoCD).
• Proficiency in a systems programming language, with a strong preference for Go and Python, for debugging and writing tooling.
• Demonstrable experience defining, implementing, and managing SLOs, SLIs, and error budgets for production services in high-availability distributed systems.

🏖️ Benefits

• Innovative Environment: Work at a cutting-edge company shaping the future of aerospace communications.
• Impactful Work: Directly contribute to critical national security programs and initiatives.
• Growth Opportunities: Expand your career with opportunities for professional development and advancement.
• Inclusive Culture: Be part of a collaborative, supportive, and inclusive workplace where your contributions matter.
• Flexibility: Flexible working arrangements including hybrid remote/in-office schedules.
• Competitive salary, comprehensive benefits (401(k), dental, vision, health, life insurance), paid time off, and equity options.
