Senior Platform Engineer

🔥 0 minutes ago

Flexential

501 - 1000 employees

Founded 2000

🤝 B2B

📡 Telecommunications

🏢 Enterprise

Flexential is a provider of purpose-built data center colocation, interconnection, cloud, and data protection services delivered through its FlexAnywhere® platform. The company operates 40+ highly connected data centers across 18 U.S. markets, offering high-density power, liquid-ready cooling, carrier and cloud interconnection, and managed and professional services to support hybrid IT, AI/ML workloads, disaster recovery, and compliance needs for enterprise customers.

📋 Description

• Design, develop, and operationally manage automated, resilient, highly available, self-healing, secure platforms with native AI capabilities for IT needs, serving both internal and customer business capabilities.
• Develop and manage the Observability OpenTelemetry Central Backend Stack: Grafana Enterprise, Mimir, Loki, Tempo, and Alertmanager on Kubernetes/RKE2 via Helm and GitLab CI/CD.
• Build and manage IaC and CI/CD for automated provisioning and deployment, including Terraform modules for Infra/VM/storage provisioning, Ansible AWX playbooks for OS/app bootstrap, and ArgoCD and Helm for Kubernetes configuration.
• Develop and manage the OpenTelemetry Prometheus scrape profile library, including SNMP exporters, REST API exporters, and cloud provider exporters (CloudWatch, Azure Monitor, GCP) for multiple device classes.
• Develop AIOps capabilities on platforms, for example for observability use cases: anomaly detection integrations, event correlation rules in Alertmanager, and synthetic monitoring patterns to reduce alert noise.
• Configure and maintain Zabbix auto-discovery: network range scanning, device classification, and Prometheus service discovery integration.
• Build and harden Edge Stack deployments (Prometheus + OTel collector) per data center site using GitOps templates.
• Integrate Alertmanager with ServiceNow: webhook routing, ticket enrichment, auto-close logic, and escalation policy configuration.
• Maintain platform security: Conjur/CyberArk secret injection at runtime, mTLS between stack components, and RBAC in Grafana Enterprise.
• Author and maintain Grafana dashboards in JSON/GitLab: facility overview, network health, RED metrics, and application telemetry.
• Mentor mid-level engineers, lead code reviews, and establish engineering standards for the team.
• Represent platform engineering in cross-functional architecture reviews and executive-level program updates.
• Perform other duties as required and assigned.

🎯 Requirements

• DevOps / Automation - 5+ years in a production environment
• Kubernetes (RKE2/k3s), Helm chart deployment, system services, Docker/containers
• LGTM Stack Development and Configuration - 4+ years: Grafana, Mimir, Loki, and Tempo configuration, tuning, dashboarding, and production operations; Prometheus required
• Senior-level Python / Scripting frameworks - 5+ years: automation scripts, exporter development, GitLab pipeline scripting, REST API integrations
• GitOps / CI/CD - 5+ years: GitLab CI/CD pipeline authoring; Terraform and Ansible as primary IaC tools; ArgoCD or Flux preferred
• AIOps / Observability Engineering - 2+ years: Alertmanager rule authoring, anomaly detection integration, event correlation, noise reduction techniques
• Working infrastructure (Linux/VM) management knowledge - 5+ years: Linux administration, VMware vCenter/VCF experience, NetApp storage management, network fundamentals (SNMP, TCP/IP)
• Secrets Management - 2+ years: CyberArk/Conjur, HashiCorp Vault, or equivalent; runtime secret injection patterns
• Minimal travel may be required

🏖️ Benefits

• Medical, Telehealth, Dental and Vision
• 401(k)
• Health Savings Accounts (HSA) and Flexible Spending Accounts (FSA)
• Life and AD&D
• Short-Term and Long-Term Disability
• Flex Paid Time Off (PTO)
• Leave of Absence
• Employee Assistance Program
• Wellness Program
• Rewards and Recognition Program
