Site Reliability Engineer

🕒 7 months ago

🇺🇸 United States – Remote

💵 $115,000 - $135,000 / year

⏰ Full-time

🟡 Mid-level

🟠 Senior

⛑ DevOps and Site Reliability Engineer (SRE)

🗣️🇺🇸🇬🇧 English required

Aalyria

51 - 200 employees

📡 Telecommunications

🏢 Enterprise

☁️ SaaS

Aalyria is a space and communications technology company that builds, orchestrates, and manages planetary-scale networks by combining atmospheric coherent free-space laser communication (Tightbeam) with an AI-driven network orchestration software platform (Spacetime). The company enables multi-domain, multi-orbit connectivity across land, sea, air, and space, supporting satellite constellations, 5G/NTN architectures, and hybrid networks, and works with commercial and government partners to deliver hardware and software for resilient, high-capacity communications.

Description

• Help design and build Aalyria's centralized observability platform, integrating and scaling tools for metrics (e.g., Prometheus), logging (e.g., Loki), and distributed tracing (e.g., Tempo/OpenTelemetry).
• Define, implement, and manage a robust framework of Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets for our core products, ensuring we are launch-ready.
• Partner with software engineers (SWEs) to implement observability best practices, develop standard templates and documentation, and configure tooling (e.g., OpenTelemetry libraries).
• Automate the deployment, scaling, and management of the entire observability stack using Infrastructure as Code (e.g., Terraform) and GitOps principles (e.g., ArgoCD).
• Partner closely with the core infrastructure team to ensure deep visibility into our Kubernetes clusters and the underlying GCP and AWS environments.
• Develop and lead the company's monitoring, alerting, and incident response strategy, driving a culture of proactive reliability and blameless post-mortems.

🎯 Requirements

• 4+ years of experience in an SRE or platform engineering role, with a focus on observability for large-scale, distributed compute or network systems.
• Deep, hands-on expertise building, scaling, and managing observability platforms (e.g., Prometheus, Grafana, Loki/ELK, OpenTelemetry, Tempo/Jaeger, Honeycomb).
• Proven experience using these tools to support performance analysis and debugging of complex distributed systems.
• Strong production-level experience with Google Cloud Platform (GCP) and Kubernetes.
• Experience using Infrastructure as Code (IaC) and GitOps principles (e.g., ArgoCD).
• Proficiency in a systems programming language for debugging and writing tooling, with a strong preference for Go and Python.
• Demonstrable experience defining, implementing, and managing SLOs, SLIs, and error budgets for production services in high-availability distributed systems.

🏖️ Benefits

• Innovative Environment: Work at a cutting-edge company shaping the future of aerospace communications.
• Impactful Work: Directly contribute to critical national security programs and initiatives.
• Growth Opportunities: Expand your career with opportunities for professional development and advancement.
• Inclusive Culture: Be part of a collaborative, supportive, and inclusive workplace where your contributions matter.
• Flexibility: Flexible working arrangements, including hybrid remote/in-office schedules.
• Competitive salary, comprehensive benefits (401(k), dental, vision, health, and life insurance), paid time off, and equity options.

Similar Jobs

🕒 7 months ago

AGENTIC

11 - 50

🤖 Artificial Intelligence

🤝 B2B

🏢 Enterprise

Senior DevOps Engineer / Cloud Architect designing multi-account architectures for the Apex program. Requires mastery of AWS and full-stack development, with a focus on cloud-native solutions.

🇺🇸 United States – Remote

⏰ Full-time

🟠 Senior

⛑ DevOps and Site Reliability Engineer (SRE)

🗣️🇺🇸🇬🇧 English required

🕒 7 months ago

Stormlight Capital

1 - 10

💸 Finance

💳 Fintech

DevOps Engineer at Stormlight Capital optimizing infrastructure for derivatives trading operations. Ensures that systems process market data and execute trades with high performance.

🇺🇸 United States – Remote

💵 $225,000 - $325,000 / year

⏰ Full-time

🟡 Mid-level

🟠 Senior

⛑ DevOps and Site Reliability Engineer (SRE)

🗣️🇺🇸🇬🇧 English required

🕒 7 months ago

CloudScouts

11 - 50

🤝 B2B

🏢 Enterprise

💸 Finance

AWS DevOps Engineer designing cloud-native applications for SAP S/4HANA processes. Optimizing AWS cost and performance in a fully remote work environment.

🇺🇸 United States – Remote

⏰ Full-time

🟠 Senior

🔴 Expert

⛑ DevOps and Site Reliability Engineer (SRE)

🗣️🇺🇸🇬🇧 English required

🕒 7 months ago

TaxAct

51 - 200

💸 Finance

💳 Fintech

🛍️ eCommerce

Consultant role at Taxwell helping clients with tax preparation and advocating for their needs while maintaining an inclusive atmosphere.

🗣️🇺🇸🇬🇧 English required

🕒 7 months ago

Hydra Host

11 - 50

🔧 Hardware

🏢 Enterprise

🤖 Artificial Intelligence

Site Reliability Engineer ensuring high uptime and performance for cloud systems at Hydra Host. Collaborating with teams to integrate monitoring and QA tools for reliability and observability.

🇺🇸 United States – Remote

💵 $140,000 - $200,000 / year

💰 €10,000,000 Seed Round in April 2022

⏰ Full-time

🟡 Mid-level

🟠 Senior

⛑ DevOps and Site Reliability Engineer (SRE)

🗣️🇺🇸🇬🇧 English required