Senior DevOps – Platform Reliability Engineer

🕒 1 month ago

🗽 New York – Remote

⏰ Full Time

🟠 Senior

⛑ DevOps & SRE Engineer

🦅 H1B Visa Sponsor

🗣️🇺🇸🇬🇧 English required

Zingtree

11 - 50 employees

🤝 B2B

☁️ SaaS

🤖 Artificial Intelligence

💰 €15,000,000 Series A in 2022-01

Zingtree is a company that improves customer support through AI-based process automation, helping businesses streamline and simplify complex support processes. It offers tools to create, manage, and automate support workflows, making it easier for customer service agents and customers to resolve issues efficiently. Zingtree integrates with various CRM systems and supports several industries, including contact centers, healthcare, insurance, and home services. With its dynamic workflows, AI writing assistance, and compliance automation, it helps businesses improve their customer experience with faster resolution times and stronger compliance controls.

Description

• Own and evolve CI/CD pipelines using GitHub Actions and OIDC-based authentication for microservices and agentic workloads, with safe, fast, and reversible deployments.
• Automate infrastructure provisioning using Infrastructure as Code (IaC) tools such as Terraform and CloudFormation.
• Operate and scale our Kubernetes platform (EKS + Argo CD), including autoscaling, ingress, external-dns, cert-manager, External Secrets Operator, backups, runtime guardrails, and multi-tenant isolation for enterprise customers.
• Manage the edge and network perimeter, including Cloudflare (CDN, WAF, Bot Management, DDoS protection, Zero Trust / Access), CloudFront, API Gateway, ALB/NLB, Route 53, and network security controls.
• Operate the data and event tier, including Aurora MySQL, ElastiCache/Redis, S3, and MSK (Kafka), with responsibility for backups, point-in-time recovery (PITR), and multi-AZ disaster recovery aligned to defined RTO/RPO objectives.
• Build and maintain Lambda workloads where event-driven or serverless architectures are the right fit.
• Build observability as a product using Prometheus, Grafana, and OpenTelemetry, including telemetry for LLM and agentic systems such as token cost, tool-call latency, evaluation signals, and prompt/version tracking.
• Strengthen our security and compliance posture for SOC 2 and HIPAA, including least-privilege IAM, SCPs, secrets management, SAST/DAST, dependency and container scanning, image signing, AWS Config, Security Hub, GuardDuty, Inspector, and evidence automation.
• Drive FinOps initiatives, including tagging standards, Savings Plans and Reserved Instances, per-tenant and per-workload cost attribution, and LLM cost controls.
• Build and evolve our AI-native DevOps capabilities.

🎯 Requirements

• 5+ years of experience in DevOps, SRE, or Platform Engineering operating production systems on AWS.
• Strong experience with CI/CD pipelines and tools such as GitHub Actions, GitLab CI, Jenkins, or CircleCI.
• Hands-on experience operating production EKS environments, including autoscaling, ingress, secrets management, and cluster upgrades.
• Strong AWS networking experience, including multi-account VPC design, subnets, routing, security groups, NACLs, Route 53, ACM, and load balancers.
• Deep experience with Terraform and GitHub Actions, ideally using OIDC-based cloud authentication.
• Experience with Aurora/RDS MySQL, Redis (ElastiCache), and S3, including backups, PITR, migrations, and lifecycle management.
• Strong observability experience using Prometheus, Grafana, and OpenTelemetry.
• Experience operating Argo CD at scale.
• Experience with Infrastructure as Code tools such as Terraform, CloudFormation, or Ansible.
• Experience managing Cloudflare services including WAF, Bot Management, Rate Limiting, and Zero Trust / Access, along with CloudFront.
• Experience operating Kafka/MSK at scale, including topics, consumer groups, and schema registries.
• Experience with Lambda and event-driven architectures.
• Comfortable working with Python, Bash, and Linux systems.
• Strong understanding of security best practices across IAM, KMS, secrets management, networking, and software supply chain security.
• Familiarity with vulnerability scanning and compliance tooling.

🏖️ Benefits

• Competitive compensation packages
• Comprehensive health benefits:
  • 100% of employee premiums covered
  • 75%–80% of dependent premiums covered for most health, dental, and vision plans
• 401(k) plans to support retirement planning (no employer matching currently)
• Paid parental leave
• Unlimited PTO
• Flexible remote work from anywhere
• Up to $200/month co-working reimbursement
• Home office stipend:
  • Up to $500 for home office setup
  • $100/month for internet, phone, and related expenses

Similar Jobs

🕒 1 month ago

Flywire

1001 - 5000

💸 Finance

💳 Fintech

Manager II, Site Reliability Engineering at Flywire driving infrastructure reliability and performance. Leading SRE teams for global cloud-based systems and initiatives for production excellence.

🇺🇸 United States – Remote

💵 $160,000 - $200,000 / year

💰 €60,000,000 Series F in 2021-03

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & SRE Engineer

🦅 H1B Visa Sponsor

🗣️🇺🇸🇬🇧 English required

🕒 1 month ago

Group 1001

501 - 1000

💸 Finance

📚 Education

Senior Network Reliability Engineer in the Platform Engineering Services team focusing on site reliability and network platform engineering at a consumer-centric insurance company.

🗣️🇺🇸🇬🇧 English required

🕒 1 month ago

Hotel Engine

201 - 500

🛍️ eCommerce

🚗 Transport

Senior Software Engineer on the Control Plane team at Engine managing production cloud infrastructure. Leading technical direction and mentoring engineers for optimized performance.

🇺🇸 United States – Remote

💵 $121,400 - $168,000 / year

💰 €65,000,000 Series B in 2021-12

⏰ Full Time

🟠 Senior

⛑ DevOps & SRE Engineer

🗣️🇺🇸🇬🇧 English required

🕒 1 month ago

O'Reilly

201 - 500

📚 Education

☁️ SaaS

🤖 Artificial Intelligence

Cloud Operations Engineer driving infrastructure and tooling for O'Reilly's learning platform. Managing Kubernetes, Terraform, and developer tooling to enhance internal processes.

🇺🇸 United States – Remote

💵 $128,000 - $174,000 / year

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & SRE Engineer

🗣️🇺🇸🇬🇧 English required

🕒 1 month ago

NextGen IT Services

51 - 200

🤝 B2B

🏢 Enterprise

🎯 Recruitment

DevOps Engineer at NextGen IT Services responsible for building/maintaining CI/CD pipelines and cloud infrastructure. Focusing on operational efficiencies and security controls implementation.

🇺🇸 United States – Remote

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & SRE Engineer

🗣️🇺🇸🇬🇧 English required