Data Center Reliability Engineer

51 - 200 employees

🤖 Artificial Intelligence

⚡ Energy

☁️ SaaS

Phaidra is a company that provides artificial intelligence controls to optimize mission-critical facilities such as data centers and industrial plants. Its closed-loop AI control service improves plant stability, energy efficiency, and sustainability by reducing downtime, increasing productivity, and lowering CO2 emissions. Unlike traditional control systems, Phaidra's AI-driven controls learn and improve continuously over time without requiring new hardware. The system provides real-time optimization and integrates with existing control systems, improving safety and operational stability while giving full transparency into performance data. Phaidra uses state-of-the-art deep reinforcement learning techniques to achieve exceptional results on some of the world's biggest challenges, including significant energy savings in Google's data centers.

🕒 11 days ago

☕ Washington – Remote

💵 $101,320 - $163,900 / year

⏰ Full-time

🟢 Junior

🟡 Mid-level

⛑ DevOps and Site Reliability Engineer (SRE)

🗣️🇺🇸🇬🇧 English required

NumPy

Pandas

Python

Description

• Utilize existing data ingestion and delivery platforms to "teach" models to understand the physical world, filling a critical expertise gap in the data center industry.
• Use telemetry tools to analyze sensor data across mechanical (chillers, pumps) and electrical (UPS, switchgear, power feeds) systems to identify "failure signatures" for an LLM-driven monitoring tool.
• Act as a primary user of the platforms, identifying gaps in current mechanisms and collaborating with Engineering to influence future features and data quality.
• Translate raw telemetry into "SME-level" logic and directions that the LLM tool uses to guide data center operators in real time.
• Cultivate deep domain expertise in all facets of data center infrastructure.
• Move from shadowing peers to directly supporting customers, using the platform to provide clear, data-backed direction on complex problems.
• Oversee pilot projects to test how the AI-driven SME tool interprets real-world stressors, ensuring the output is operationally realistic, accurate, and actionable.
• Remain agile and proactive in a fast-moving team environment.

🎯 Requirements

• 2–3 years of relevant professional experience.
• Bachelor’s degree in Mechanical Engineering, Electrical Engineering, Control Theory, or a related field that provides a foundation in physical systems and thermodynamics.
• A deep, innate interest in using data to diagnose how and why systems fail. You are a "tinkerer" who prefers solving real-world problems over theoretical research.
• Strong Python skills and experience with data manipulation libraries (Pandas/NumPy) to perform custom analysis outside of standard tooling.
• Ability to explain complex diagnostic findings clearly and persuasively to both technical peers and non-domain stakeholders.
• A proven ability to look at a problem without preconceived notions and figure out solutions either independently or via team collaboration.
• Demonstrated commitment to Transparency, Collaboration, and Ownership, especially in environments where reliability and learning from failure are paramount.

🏖️ Benefits

• Fast-paced, team-oriented environment where your work directly shapes the company’s direction.
• We are a 100% remote company.
• Competitive compensation & meaningful equity.
• Outsized responsibilities & professional development.
• Training is foundational; functional, customer immersion, and development training.
• Medical, dental, and vision insurance (exact benefits vary by region).
• Unlimited paid time off, with a required minimum of 20 days per year.
• Paid parental leave (exact benefits vary by region).
• Flexible stipends to support your workspace, well-being, and continued professional development.
• Company MacBook.

Similar Jobs

DevOps Engineer – Secret

🕒 11 days ago

Xcelerate Solutions

1001 - 5000

Senior DevOps Engineer automating and optimizing the delivery pipeline for defense systems. Leveraging CI/CD, IaC, and cloud technologies to enhance operational efficiency.

🇺🇸 United States – Remote

⏰ Full-time

🟡 Mid-level

🟠 Senior

⛑ DevOps and Site Reliability Engineer (SRE)

🗣️🇺🇸🇬🇧 English required

Ansible

AWS

Azure

Chef

Cloud

Docker

Microservices

OpenShift

OpenStack

Puppet

SaltStack

TFS

DevOps Engineer

🕒 11 days ago

Torq

51 - 200

🤖 Artificial Intelligence

🔒 Cybersecurity

DevOps Engineer managing AI-native autonomous SecOps platform processes and collaborating with global teams. Identifying efficiencies and delivering automation in a fast-paced environment.

🇺🇸 United States – Remote

⏰ Full-time

🟡 Mid-level

🟠 Senior

⛑ DevOps and Site Reliability Engineer (SRE)

🗣️🇺🇸🇬🇧 English required

AWS

Cloud

Docker

Google Cloud Platform

Grafana

Jenkins

Kubernetes

Microservices

Prometheus

Python

Terraform

DevOps Engineer

🕒 11 days ago

Torq

51 - 200

🤖 Artificial Intelligence

🔒 Cybersecurity

DevOps Engineer managing production environments and collaborating with global teams at a fast-growing cybersecurity company. Championing automation and optimizing reliability in a cutting-edge tech environment.

🇺🇸 United States – Remote

⏰ Full-time

🟡 Mid-level

🟠 Senior

⛑ DevOps and Site Reliability Engineer (SRE)

🗣️🇺🇸🇬🇧 English required

AWS

Cloud

Docker

Google Cloud Platform

Grafana

Jenkins

Kubernetes

Microservices

Prometheus

Python

Terraform

DevOps Engineer

🕒 11 days ago

Torq

51 - 200

🤖 Artificial Intelligence

🔒 Cybersecurity

DevOps Engineer automating and optimizing software development processes for a cybersecurity firm. Collaborating with global teams to enhance production environments and streamline workflows.

🇺🇸 United States – Remote

⏰ Full-time

🟡 Mid-level

🟠 Senior

⛑ DevOps and Site Reliability Engineer (SRE)

🗣️🇺🇸🇬🇧 English required

AWS

Docker

Google Cloud Platform

Grafana

Jenkins

Kubernetes

Microservices

Prometheus

Python

Terraform

DevOps Engineer

🕒 11 days ago

Torq

51 - 200

🤖 Artificial Intelligence

🔒 Cybersecurity

DevOps Engineer responsible for automation and optimization at a cybersecurity startup. Collaborating globally and empowering development teams in a fast-moving environment.

🇺🇸 United States – Remote

⏰ Full-time

🟡 Mid-level

🟠 Senior

⛑ DevOps and Site Reliability Engineer (SRE)

🗣️🇺🇸🇬🇧 English required

AWS

Cloud

Docker

Google Cloud Platform

Grafana

Jenkins

Kubernetes

Microservices

Prometheus

Python

Terraform