Principal Architect – Cloud and Observability

CVS Health

10,000+ employees

Founded 1963

⚕️ Healthcare Insurance

🛒 Retail

🧘 Wellness

CVS Health is a leading American healthcare company dedicated to improving health access and affordability. The company focuses on a comprehensive approach that includes health services, health insurance, and pharmacy benefits management. Through its subsidiaries, such as Aetna and CVS Caremark, CVS Health offers a range of services that facilitate wellness, condition management, and affordable prescription drug coverage. CVS Health operates neighborhood pharmacies, provides mail-order pharmacy services, and manages specialty medication programs, aiming to make healthcare convenient and accessible for everyone. Driven by a mission to connect people with essential care services, CVS Health is committed to fostering healthier communities and supporting the wellbeing of all individuals.

📋 Description

• Own the enterprise observability reference architecture covering metrics, logs, traces, and events across all environments (cloud and on-prem)
• Drive the OpenTelemetry-first instrumentation strategy -- standard libraries, semantic conventions, collector topologies (DaemonSet, gateway, sidecar), and pipeline design
• Build and operate telemetry pipelines on Grafana Mimir, Loki, and Tempo, including multi-tenant configurations, retention policies, and capacity planning
• Define how we measure reliability: SLOs, SLIs, error budgets, and alerting frameworks -- consistently across all lines of business
• Own the integration between observability tooling and incident management (ServiceNow ITOM, xMatters)
• Drive telemetry schema standards to ensure teams emit data that is useful downstream, not just technically compliant
• Build and maintain reference architectures for our hybrid footprint: OpenShift on-prem with KVM/libvirt and Dell PowerFlex storage, plus Azure, AWS, and GCP
• Lead standards work around workload identity and federation using SPIFFE/SPIRE and cloud-native IAM patterns to move away from static secrets
• Provide guidance on compute runtime selection -- containers vs. VMs vs. bare metal vs. serverless -- with a clear decision framework for teams
• Push FinOps maturity forward by integrating cost data into the observability stack, establishing unit economics, and working toward open billing standards like FOCUS
• Identify where AI/ML adds practical value in our observability stack -- anomaly detection, root cause analysis, log clustering, and smarter alerting
• Define observability standards for AI-powered systems (agents, RAG pipelines) -- covering latency, token costs, model drift, and related signals
• Ensure new AI-powered platforms are instrumented correctly from day one
• Participate in cross-functional architecture working groups focused on observability and hybrid cloud standards
• Publish architecture decision records and reference implementations that teams can actually use
• Mentor architects and platform engineers; conduct architecture reviews to raise the bar across the org
• Work with security and compliance on HIPAA, SOX, and PCI requirements as they apply to telemetry and cloud infrastructure
• Represent CVS Health in vendor evaluations and stay connected to the open-source ecosystem (CNCF, OpenTelemetry, Grafana Labs)

🎯 Requirements

• 10+ years in infrastructure, cloud architecture, platform engineering, or SRE
• 8+ years of architecture work in observability, cloud infrastructure, or both at a large enterprise
• Solid experience with at least two of Azure, AWS, or GCP -- including networking, identity, compute, and storage
• 5+ years with Kubernetes in production (OpenShift, EKS, AKS, or GKE)
• 5+ years with OpenTelemetry or similar frameworks (collectors, SDKs, semantic conventions, pipeline design)
• 5+ years with observability platforms: Grafana/Mimir/Loki/Tempo, Prometheus, Datadog, Splunk, Dynatrace, or comparable tools
• Experience defining SLOs/SLIs and building alerting strategies at an organizational level
• Proven track record writing architecture standards that other teams adopted and followed
• Able to communicate clearly with both engineers and senior leadership

🏖️ Benefits

• Medical, dental, and vision coverage
• Paid time off
• Retirement savings options
• Wellness programs
• Comprehensive benefits package designed to support the physical, emotional, and financial well-being of colleagues and their families
