Principal Architect – Cloud and Observability


🔥 17 minutes ago



CVS Health

10,000+ employees

Founded 1963

⚕️ Healthcare Insurance

🛒 Retail

🧘 Wellness


CVS Health is a leading American healthcare company dedicated to improving health access and affordability. The company focuses on a comprehensive approach that includes health services, health insurance, and pharmacy benefits management. Through its subsidiaries, such as Aetna and CVS Caremark, CVS Health offers a range of services that facilitate wellness, condition management, and affordable prescription drug coverage. CVS Health operates neighborhood pharmacies, provides mail-order pharmacy services, and manages specialty medication programs, aiming to make healthcare convenient and accessible for everyone. Driven by a mission to connect people with essential care services, CVS Health is committed to fostering healthier communities and supporting the wellbeing of all individuals.

📋 Description

• Own the enterprise observability reference architecture covering metrics, logs, traces, and events across all environments (cloud and on-prem)
• Drive the OpenTelemetry-first instrumentation strategy -- standard libraries, semantic conventions, collector topologies (DaemonSet, gateway, sidecar), and pipeline design
• Build and operate telemetry pipelines on Grafana Mimir, Loki, and Tempo, including multi-tenant configurations, retention policies, and capacity planning
• Define how we measure reliability: SLOs, SLIs, error budgets, and alerting frameworks -- consistently across all lines of business
• Own the integration between observability tooling and incident management (ServiceNow ITOM, xMatters)
• Drive telemetry schema standards to ensure teams emit data that is useful downstream, not just technically compliant
• Build and maintain reference architectures for our hybrid footprint: OpenShift on-prem with KVM/libvirt and Dell PowerFlex storage, plus Azure, AWS, and GCP
• Lead standards work around workload identity and federation using SPIFFE/SPIRE and cloud-native IAM patterns to move away from static secrets
• Provide guidance on compute runtime selection -- containers vs. VMs vs. bare metal vs. serverless -- with a clear decision framework for teams
• Push FinOps maturity forward by integrating cost data into the observability stack, establishing unit economics, and working toward open billing standards like FOCUS
• Identify where AI/ML adds practical value in our observability stack -- anomaly detection, root cause analysis, log clustering, and smarter alerting
• Define observability standards for AI-powered systems (agents, RAG pipelines) -- covering latency, token costs, model drift, and related signals
• Ensure new AI-powered platforms are instrumented correctly from day one
• Participate in cross-functional architecture working groups focused on observability and hybrid cloud standards
• Publish architecture decision records and reference implementations that teams can actually use
• Mentor architects and platform engineers; conduct architecture reviews to raise the bar across the org
• Work with security and compliance on HIPAA, SOX, and PCI requirements as they apply to telemetry and cloud infrastructure
• Represent CVS Health in vendor evaluations and stay connected to the open-source ecosystem (CNCF, OpenTelemetry, Grafana Labs)

🎯 Requirements

• 10+ years in infrastructure, cloud architecture, platform engineering, or SRE
• 8+ years of architecture work in observability, cloud infrastructure, or both at a large enterprise
• Solid experience with at least two of Azure, AWS, or GCP -- including networking, identity, compute, and storage
• 5+ years with Kubernetes in production (OpenShift, EKS, AKS, or GKE)
• 5+ years with OpenTelemetry or similar frameworks (collectors, SDKs, semantic conventions, pipeline design)
• 5+ years with observability platforms: Grafana/Mimir/Loki/Tempo, Prometheus, Datadog, Splunk, Dynatrace, or comparable tools
• Experience defining SLOs/SLIs and building alerting strategies at an organizational level
• Proven track record writing architecture standards that other teams adopted and followed
• Able to communicate clearly with both engineers and senior leadership

🏖️ Benefits

• Medical, dental, and vision coverage
• Paid time off
• Retirement savings options
• Wellness programs
• Comprehensive benefits package designed to support the physical, emotional, and financial well-being of colleagues and their families

