Principal Observability, Reliability Architect

Thinkahead Consultant Psychologist Pty Ltd

1 - 10 employees

Thinkahead is a privately owned psychology firm working across both clinical private practice and the corporate consulting space.

📋 Description

• Lead client discovery, architecture workshops, and solution design across observability, telemetry, reliability, and operational intelligence initiatives.
• Design enterprise observability architectures spanning monitoring, logging, metrics, tracing, telemetry pipelines, alerting, event correlation, service visibility, and platform integrations.
• Define scalable standards for telemetry onboarding, naming, tagging, RBAC, service ownership, dashboards, alert governance, runbooks, and operational handoff.
• Advise on telemetry governance, including data quality, retention, access control, sampling, cardinality, and cost optimization.
• Lead modernization initiatives including tool rationalization, dashboard and alert rationalization, telemetry strategy, and migration from legacy monitoring platforms.
• Guide SRE practices including SLIs, SLOs, error budgets, production readiness, and incident response maturity.
• Design integration patterns across ITSM, CMDB, event management, and automation platforms.
• Support pursuits by shaping solution strategy, validating scope, informing estimates, and building client-facing technical narratives.
• Serve as a senior escalation point and provide architecture governance during delivery.
• Build reusable reference architectures, playbooks, and accelerators while mentoring architects, consultants, and offshore teams.

🎯 Requirements

• 10+ years in observability, monitoring, APM, platform operations, SRE, or related enterprise technology domains, including 5+ years leading architecture and delivery strategy for enterprise observability or reliability initiatives.
• Deep, hands-on experience designing and implementing solutions across monitoring, logging, metrics, tracing, telemetry collection, and pipeline patterns in hybrid and multi-cloud environments.
• Strong knowledge of telemetry governance, including routing, transformation, normalization, enrichment, retention, access control, and cost management.
• Experience defining enterprise standards for dashboards, alerts, tagging, naming, service ownership, RBAC, and operating model adoption.
• Strong command of incident response, event correlation, alert strategy, service health, and business-service visibility, plus applied SRE concepts including SLIs, SLOs, error budgets, and production readiness.
• Ability to lead executive and technical workshops and translate business needs into actionable architecture and delivery plans.
• Consulting or professional services experience with strong client-facing communication, estimation, risk management, and cross-functional leadership.

🏖️ Benefits

• Medical, dental, and vision insurance
• 401(k)
• Paid company holidays
• Paid time off
• Paid parental and caregiver leave
• Plus more! See https://www.aheadbenefits.com/ for additional details.
