Senior Site Reliability Engineer


July 3


MetaRouter

SaaS • Enterprise • Marketing

MetaRouter is a leading technology company that specializes in server-side tag management to enhance data collection, privacy compliance, and marketing efficiency. Their platform allows for first-party data collection and management, providing marketers and developers with tools to boost marketing ROI, data governance, and customer experience. With features like Sync Injector™, MetaRouter offers solutions for role-based permissions, cross-domain identity, consent integration, and data replay. They help organizations tackle challenges related to cookie deprecation, data sovereignty, and enhanced analytics, enabling businesses to maintain data privacy, performance, and compliance.

11 - 50 employees

📋 Description

• Architect the creation, maintenance, and removal of cloud infrastructure that supports our applications and internal operations.
• Manage deployment of our applications on cloud infrastructure.
• Manage upgrades of infrastructure and a wide variety of intermediate software that supports our applications.
• Set up and maintain dashboards, logs, metrics, and alerting mechanisms, with a focus on creating alerts that provide high signal and low noise.
• Continuously improve observability by enhancing logging, metrics, and tracing systems to provide deeper insights into system performance, reduce time to resolution, and support proactive incident detection.
• Lead the investigation and resolution of complex infrastructure and application issues, identifying root causes, driving systemic fixes, and mentoring others in effective troubleshooting practices.
• Ensure that cloud infrastructure and our applications meet or exceed compliance requirements.
• Establish and drive standards for infrastructure and process documentation, ensuring clarity, consistency, and long-term maintainability across teams and systems.
• Drive best practices through code reviews by setting high standards for infrastructure, application, and service reliability, while mentoring engineers and influencing architecture and deployment patterns across teams.
• Work with customers to determine and implement custom infrastructure requirements in a way that balances flexibility with repeatable, scalable patterns.
• Lead design and architectural decisions for infrastructure and applications, driving improvements in automation, performance, reliability, and security at scale.
• Provide technical leadership and mentorship to SRE team members, fostering a culture of growth, ownership, and continuous learning.
• Partner cross-functionally with platform engineering and other stakeholders to define and deliver scalable infrastructure solutions for internal and customer-facing systems.
• Apply business and technical acumen to prioritize and guide engineering efforts that maximize impact in resource-constrained environments.
• Champion a culture of continuous improvement by identifying and implementing strategic process, system, and collaboration enhancements across teams.
• Proactively identify and address technical and procedural risks before they escalate, exercising sound judgment and autonomy to drive long-term resilience and operational excellence.
• Lead by example in the on-call rotation, setting standards for incident response, postmortems, and systemic resiliency improvements.
• Design and implement scalable playbooks and alerting systems that reduce Mean Time To Repair (MTTR) by enabling rapid, consistent, and effective incident response.

🎯 Requirements

• 8+ years of experience in SRE or DevOps roles, with a strong track record of owning and scaling infrastructure on at least one major cloud provider (preferably GCP).
• Deep expertise in configuring, maintaining, and troubleshooting Kubernetes clusters in production environments, including cluster architecture, security, and performance tuning.
• Advanced proficiency with infrastructure and automation tools such as Bash, CI/CD pipelines, Docker, Git, Helm, Prometheus, Terraform, and YAML, with the ability to evaluate and implement tooling at scale.
• Demonstrated experience architecting and managing identity and access management (IAM) and single sign-on (SSO) across complex, multi-platform environments.
• Operational expertise with observability platforms such as New Relic (including NRQL), using telemetry to guide performance optimization, reliability improvements, and incident response strategies.
• Familiarity with the operational aspects of modern application stacks, including Go and React/Node.js, with the ability to collaborate effectively across application and infrastructure domains.
• Strong understanding of agile methodologies, with experience leading infrastructure initiatives within iterative development cycles.
• Proven ability to prioritize and execute across a diverse set of responsibilities in a fast-paced, evolving environment, balancing tactical needs with long-term technical strategy.

🏖️ Benefits

• Health/Dental/Vision/Insurance
• 401(k)
• Unlimited Vacation Policy
• Fully Remote (US)


