Senior DevOps – Platform Reliability Engineer

🕒 1 month ago

🗽 New York – Remote

⏰ Full-time

🟠 Senior

⛑ DevOps and Site Reliability Engineer (SRE)

🦅 H1B visa sponsor

🗣️🇺🇸🇬🇧 English required


Zingtree

11-50 employees

🤝 B2B

☁️ SaaS

🤖 Artificial Intelligence

💰 €15,000,000 Series A in 2022-01

Zingtree is a company that optimizes customer support through AI process automation, enabling businesses to streamline and simplify complex support processes. It offers tools for building, managing, and automating support workflows, making it easier for customer service agents and customers to resolve issues efficiently. Zingtree integrates with various CRM systems and supports multiple industries, including contact centers, healthcare, insurance, and home services. Through its dynamic workflows, Author Assist AI, and compliance automation, it helps businesses improve the customer experience with faster resolution times and stronger compliance controls.

Description

• Own and evolve CI/CD pipelines using GitHub Actions and OIDC-based authentication for microservices and agentic workloads, with safe, fast, and reversible deployments.
• Automate infrastructure provisioning using Infrastructure as Code (IaC) tools such as Terraform and CloudFormation.
• Operate and scale our Kubernetes platform (EKS + Argo CD), including autoscaling, ingress, external-dns, cert-manager, External Secrets Operator, backups, runtime guardrails, and multi-tenant isolation for enterprise customers.
• Manage the edge and network perimeter, including Cloudflare (CDN, WAF, Bot Management, DDoS protection, Zero Trust / Access), CloudFront, API Gateway, ALB/NLB, Route 53, and network security controls.
• Operate the data and event tier, including Aurora MySQL, ElastiCache/Redis, S3, and MSK (Kafka), with responsibility for backups, point-in-time recovery (PITR), and multi-AZ disaster recovery aligned to defined RTO/RPO objectives.
• Build and maintain Lambda workloads where event-driven or serverless architectures are the right fit.
• Build observability as a product using Prometheus, Grafana, and OpenTelemetry, including telemetry for LLM and agentic systems such as token cost, tool-call latency, evaluation signals, and prompt/version tracking.
• Strengthen our security and compliance posture for SOC 2 and HIPAA, including least-privilege IAM, SCPs, secrets management, SAST/DAST, dependency and container scanning, image signing, AWS Config, Security Hub, GuardDuty, Inspector, and evidence automation.
• Drive FinOps initiatives, including tagging standards, Savings Plans and Reserved Instances, per-tenant and per-workload cost attribution, and LLM cost controls.
• Build and evolve our AI-native DevOps capabilities.

🎯 Requirements

• 5+ years of experience in DevOps, SRE, or Platform Engineering operating production systems on AWS.
• Strong experience with CI/CD pipelines and tools such as GitHub Actions, GitLab CI, Jenkins, or CircleCI.
• Hands-on experience operating production EKS environments, including autoscaling, ingress, secrets management, and cluster upgrades.
• Strong AWS networking experience, including multi-account VPC design, subnets, routing, security groups, NACLs, Route 53, ACM, and load balancers.
• Deep experience with Terraform and GitHub Actions, ideally using OIDC-based cloud authentication.
• Experience with Aurora/RDS MySQL, Redis (ElastiCache), and S3, including backups, PITR, migrations, and lifecycle management.
• Strong observability experience using Prometheus, Grafana, and OpenTelemetry.
• Experience operating Argo CD at scale.
• Experience with Infrastructure as Code tools such as Terraform, CloudFormation, or Ansible.
• Experience managing Cloudflare services including WAF, Bot Management, Rate Limiting, and Zero Trust / Access, along with CloudFront.
• Experience operating Kafka/MSK at scale, including topics, consumer groups, and schema registries.
• Experience with Lambda and event-driven architectures.
• Comfortable working with Python, Bash, and Linux systems.
• Strong understanding of security best practices across IAM, KMS, secrets management, networking, and software supply chain security.
• Familiarity with vulnerability scanning and compliance tooling.

🏖️ Benefits

• Competitive compensation packages
• Comprehensive health benefits:
  • 100% of employee premiums covered
  • 75%–80% of dependent premiums covered for most health, dental, and vision plans
• 401(k) plans to support retirement planning (no employer matching currently)
• Paid parental leave
• Unlimited PTO
• Flexible remote work from anywhere
• Up to $200/month co-working reimbursement
• Home office stipend:
  • Up to $500 for home office setup
  • $100/month for internet, phone, and related expenses
