VP, Site Reliability Engineer


November 6


Galaxy

Crypto • Finance • Blockchain

Galaxy is a digital asset and blockchain leader helping institutions, startups, and qualified individuals shape a changing economy through innovative crypto solutions. Galaxy provides a wide array of services, including asset management, trading, lending, custodial technology, and blockchain infrastructure solutions. With a focus on both traditional finance integration and digital asset expertise, Galaxy is committed to advancing the adoption and functionality of cryptocurrencies and blockchain technologies across the globe.

201 - 500 employees

Founded 2018

₿ Crypto

💸 Finance

📋 Description

• Architect, deploy, and maintain robust, scalable, secure AWS-based infrastructure.
• Drive adoption and optimization of EKS and Kubernetes for containerized workloads.
• Support migration initiatives, moving workloads from legacy VMs to containers in AWS.
• Implement and fine-tune SLOs, SLAs, and error budgets to balance innovation and stability.
• Collaborate on best practices with Security and Engineering teams for workload reliability.
• Build Infrastructure as Code (IaC) with Terraform; maintain compliant, repeatable environments.
• Enhance CI/CD pipelines for efficient, secure, and reliable cloud delivery.
• Develop and refine automated solutions for autoscaling, failover, and disaster recovery.
• Design and implement metrics, logging, and tracing tools (Datadog, OpenTelemetry).
• Set up robust monitoring and alerting to proactively detect and address failures.
• Lead incident analysis and post-mortems; drive improvements in operational playbooks.
• Serve as a subject matter expert for AWS, EKS, and cloud-native tooling within the SRE team.
• Optimize AWS resources, cost management, and resiliency best practices.
• Ensure secure key management and regulatory compliance for decentralized workloads.

🎯 Requirements

• 8+ years in SRE, DevOps, or Infrastructure Engineering (IC capacity preferred).
• Deep hands-on expertise in AWS, Kubernetes/EKS, and containerization.
• Extensive IaC experience (Terraform) and cloud-native automation.
• Proven track record migrating VM-based workloads to containers in AWS at scale.
• Strong experience with observability stacks (Datadog, Prometheus, Grafana, OpenTelemetry).
• Excellent analytical, problem-solving, and incident management abilities.
• Clear communicator who thrives in team environments, collaborating cross-functionally.

🏖️ Benefits

• Galaxy respects diversity and seeks to provide equal employment opportunities to all employees and job applicants.
• We will endeavor to make a reasonable accommodation for the known limitations of a qualified applicant with a disability.


Similar Jobs

November 5

CloudScouts

11 - 50

🤝 B2B

🏢 Enterprise

💸 Finance

AWS DevOps Engineer designing cloud-native applications for SAP S/4HANA processes. Optimizing AWS cost and performance in a fully remote work environment.

🇺🇸 United States – Remote

⏰ Full Time

🟠 Senior

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)

November 5

Second Front Systems

51 - 200

☁️ SaaS

🏛️ Government

DevSecOps Engineer leading customer onboarding to the Game Warden platform for national security. Working in a collaborative environment to enhance secure deployments for government and defense.

🇺🇸 United States – Remote

💵 $135k - $160k / year

⏰ Full Time

🟠 Senior

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)

October 31

RTX

10,000+ employees

🚀 Aerospace

AI Cloud Engineer at Raytheon Technologies leading design and optimization of scalable AI solutions on cloud platforms. Collaborating with teams to drive innovation and support mission objectives.

🇺🇸 United States – Remote

💵 $124k - $250k / year

⏰ Full Time

🟠 Senior

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)

October 29

DDN

1001 - 5000

🤖 Artificial Intelligence

Director of DevOps and Product Security at DDN leading operational excellence across the Infinia platform. Ensuring security and compliance while driving automation and scalability for AI workloads.

🇺🇸 United States – Remote

💰 $10M Funding Round on 2011-06

⏰ Full Time

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)

🦅 H1B Visa Sponsor

October 29

Creyos (formerly Cambridge Brain Sciences)

51 - 200

⚕️ Healthcare Insurance

☁️ SaaS

🔬 Science

DevOps Engineer focusing on software development efficiency and reliability at Creyos. Joining a diverse team to innovate healthtech solutions with automated processes.

🇺🇸 United States – Remote

⏰ Full Time

🟠 Senior

🔴 Lead

⛑ DevOps & Site Reliability Engineer (SRE)
