
1001 - 5000 employees
Founded 2007
☁️ SaaS
🤖 Artificial Intelligence
🤝 B2B
💰 Post-IPO Debt on 2024-09
Zeta Global is an AI-powered marketing cloud that uses proprietary AI and trillions of consumer signals to help brands acquire, grow, and retain customers more efficiently. The Zeta Marketing Platform (ZMP) offers a suite of tools, including data management, a customer data platform (CDP), an email service provider (ESP), and a demand-side platform (DSP), to create individualized customer experiences and improve marketing outcomes. Zeta emphasizes omnichannel marketing, customer intelligence, and data-driven marketing strategies, partnering with brands, agencies, and publishers worldwide to accelerate brand growth and engagement. Its platform is designed to tackle complex marketing challenges with solutions for customer acquisition, growth, and retention through predictive AI and actionable consumer data.
🕒 May 8
🇺🇸 United States – Remote
💵 $180k - $210k / year
⏰ Full Time
🔴 Lead
⛑ DevOps & Site Reliability Engineer (SRE)
🦅 H1B Visa Sponsor
Apache
AWS
DNS
Docker
DynamoDB
EC2
Grafana
Java
JavaScript
Kafka
Kubernetes
MySQL
Node.js
Prometheus
Python
React
Ruby
TCP/IP
Terraform

• Design, build, and operate production-grade CI/CD pipelines enabling multiple developers on multiple teams to deploy concurrently to production, multiple times daily, with zero-downtime guarantees.
• Implement and optimize advanced deployment strategies including canary releases, blue/green deployments, rolling updates, incremental rollouts, and feature flag-gated releases via Statsig.
• Build self-service deployment tooling that empowers developers to own their release process while enforcing safety guardrails, automated rollback triggers, and automated compliance gates.
• Establish deployment observability with real-time canary analysis, automated health scoring, and progressive delivery metrics integrated with Grafana, Prometheus, and Honeycomb.
• Champion CI/CD workflows using GitLab CI/CD, Helm charts, and Terraform to ensure infrastructure and application deployments are version-controlled, auditable, and reproducible.
• Define and enforce SLOs/SLIs/SLAs across services, establishing error budgets that balance velocity with reliability.
• Lead incident response processes, including on-call rotations, runbook development, blameless postmortems, and incident command structure.
• Design and implement robust observability stacks leveraging Grafana, Prometheus, Loki, and Honeycomb for metrics, logging, tracing, and alerting at scale.
• Proactively identify and eliminate reliability risks through chaos engineering, load testing, capacity planning, and failure mode analysis.
• Reduce operational toil through automation, self-healing infrastructure patterns, and intelligent alerting to minimize mean time to detection (MTTD) and mean time to recovery (MTTR).
• Manage and optimize AWS infrastructure spanning EC2, SQS, DynamoDB, and related services with Infrastructure as Code (Terraform) best practices.
• Design and operate Kafka-based event streaming infrastructure for high-throughput, low-latency data pipelines supporting real-time marketing and analytics workloads.
• Ensure robust networking across the platform, including DNS management, service mesh configuration, load balancing, TCP/IP optimization, routing policies, and VPC architecture.
• Manage containerization strategy using Docker, ensuring efficient image builds, vulnerability scanning, registry management, and runtime security.
• Support data infrastructure operations across Snowflake, MySQL, and other database platforms, collaborating with data engineering teams on reliability and performance.
• 10+ years of progressive experience in DevOps, SRE, Platform Engineering, or Infrastructure Engineering roles, with demonstrated impact at staff or principal level.
• Expert-level Kubernetes knowledge, including cluster administration, Helm chart authoring, custom controllers/operators, network policies, RBAC, and multi-cluster management on AWS EKS.
• Deep expertise in CI/CD pipeline architecture and advanced deployment strategies (canary, blue/green, progressive delivery, feature flag integration) at scale.
• Strong proficiency with Infrastructure as Code using Terraform, including module design, state management, and multi-environment orchestration.
• Expert knowledge of Docker containerization, including multi-stage builds, security hardening, image optimization, and container runtime management.
• Production experience with Apache Kafka, including cluster management, topic design, consumer group strategies, and operational monitoring for high-throughput streaming workloads.
• Strong networking fundamentals: DNS (Route 53, internal DNS), TCP/IP, routing, API Gateway, load balancing (ALB/NLB), service mesh, VPC peering, transit gateways, and network troubleshooting.
• Extensive AWS experience spanning EKS, EC2, SQS, DynamoDB, IAM, VPC, CloudWatch, and related services in production environments.
• Hands-on experience with observability platforms: Grafana (dashboards, alerting), Prometheus (metrics, PromQL), Loki (log aggregation), and Honeycomb (distributed tracing, BubbleUp analysis).
• Working familiarity with multiple language stacks including Node.js, React, Python, Java, and Ruby, sufficient to understand build systems, dependency management, and runtime characteristics.
• Experience operating within regulated environments, with practical knowledge of GDPR, CCPA, SOC 2, and compliance automation in MarTech or AdTech domains.
• Proven ability to influence engineering culture, drive adoption of new practices, and communicate complex technical strategies clearly to both technical and non-technical stakeholders.
• Demonstrated experience with GitLab CI/CD pipelines, including advanced pipeline features such as parent-child pipelines, dynamic environments, and security scanning integration.
• Unlimited PTO
• Excellent medical, dental, and vision coverage
• Employee equity
• Employee discounts, virtual wellness classes, and pet insurance
• And more
🕒 May 7
Staff Database Reliability Engineer managing data infrastructure and leading database initiatives at Scribe. Ensuring operational excellence and driving observability across database systems.
🇺🇸 United States – Remote
💵 $225k - $250k / year
⏰ Full Time
🔴 Lead
⛑ DevOps & Site Reliability Engineer (SRE)
Amazon Redshift
AWS
BigQuery
Django
Kafka
Postgres
Python
RabbitMQ
Redis
SQL
Terraform
Go
🕒 May 4
Site Reliability Engineer at NVIDIA designing and maintaining large scale Kubernetes clusters. Ensuring system reliability and operational efficiency through automation and monitoring practices.
🇺🇸 United States – Remote
💵 $320k - $488.8k / year
⏰ Full Time
🟠 Senior
🔴 Lead
⛑ DevOps & Site Reliability Engineer (SRE)
🦅 H1B Visa Sponsor
Cloud
Distributed Systems
Kubernetes
Linux
Perl
Python
Ruby
Go
🕒 May 3
Staff Security Engineer leading DevSecOps within Corporate Security team at 1Password. Responsible for securing developer environments and overseeing GitHub security.
🇺🇸 United States – Remote
💵 $192k - $278k / year
💰 $620M Series C on 2022-01
⏰ Full Time
🔴 Lead
⛑ DevOps & Site Reliability Engineer (SRE)
🦅 H1B Visa Sponsor
Python
Terraform
🕒 May 2
Staff DevOps Engineer responsible for leading and improving cloud infrastructure for VA services. Collaborating with stakeholders and mentoring team members in software engineering best practices.
🇺🇸 United States – Remote
💵 $120k - $135k / year
⏰ Full Time
🔴 Lead
⛑ DevOps & Site Reliability Engineer (SRE)
Ansible
Terraform
🕒 May 2
Manager, DevOps responsible for software delivery practices and cloud platform oversight at NRMP. Leading release management and cross-functional team coordination in a complex environment.
🇺🇸 United States – Remote
💵 $157.6k - $173.7k / year
⏰ Full Time
🟠 Senior
🔴 Lead
⛑ DevOps & Site Reliability Engineer (SRE)
AWS
Cloud
SDLC