Network Reliability Engineer

🔥 0 minutes ago

MARGO

201 - 500 employees

Founded 2005

🤖 Artificial Intelligence

💳 Fintech

Artificial Intelligence • Fintech • Digital Transformation

MARGO is a digital consulting firm specializing in high-complexity IT missions across various sectors. As data, AI, and digital experts, MARGO partners with ambitious clients to drive progress through digital transformation. With a focus on industries such as finance, insurance, energy, and technology, MARGO provides expertise in artificial intelligence, cloud transformation, data architecture, and software engineering.

📋 Description

• Build a large AI infrastructure with monitoring, diagnosis, and remediation of production incidents
• Troubleshoot high-impact production issues in collaboration with other engineering teams
• Participate in an on-call rotation to handle incidents and ensure service continuity
• Implement and maintain observability solutions to monitor AI infrastructure and application health
• Contribute to AI infrastructure lifecycle management across different environments and countries
• Promote and apply best practices in terms of stability, resiliency, scalability, and security
• Maintain clear technical documentation for tools and procedures
• Contribute to system and tool evolution based on production feedback
• Collaborate closely with development teams to ensure infrastructure readiness
• Participate in team rituals and knowledge-sharing initiatives

🎯 Requirements

• Experience with Go or Python
• Strong scripting skills (Bash, Python)
• Hands-on experience with Linux systems (Ubuntu/Debian)
• Hands-on experience with GPU and HPC infrastructure (preferred)
• Knowledge of networking (VLAN/LAN, TCP/IP, DNS, BGP, load balancing, IPv6, etc.)
• Familiarity with monitoring and logging tools (Prometheus, Grafana, Elastic, etc.)
• Comfortable with Infrastructure-as-Code (Ansible, Salt, AWX, etc.)
• Experience managing relational databases (MariaDB)
• Understanding of CI/CD pipelines (GitLab)
• Comfortable with English (written and spoken)

Similar Jobs

🔥 21 hours ago

Netguru

501 - 1000

☁ SaaS

🏱 Enterprise

đŸ€ B2B

Senior DevOps Engineer at Netguru managing diverse projects remotely. Collaborating as part of an experienced team with flexibility over hours and tasks.

Grafana

Kafka

Kubernetes

Postgres

🔥 21 hours ago

Netguru

501 - 1000

☁ SaaS

🏱 Enterprise

đŸ€ B2B

Regular DevOps Engineer working remotely on projects for various industries. Collaborating with experienced developers at Netguru to modernize digital commerce solutions.

Grafana

Kafka

Kubernetes

Postgres

🕒 4 days ago

GoReel

51 - 200

🎼 Gaming

🎲 Gambling

SRE Lead responsible for designing, implementing, and maintaining cloud infrastructure in the iGaming industry. Collaborating with development teams to ensure system reliability and streamline deployment processes.

AWS

Cloud

Docker

EC2

ElasticSearch

Grafana

Jenkins

Kubernetes

Prometheus

Python

🕒 5 days ago

SOFTSWISS

1001 - 5000

🎼 Gaming

Senior System Engineer focused on designing and securing distributed environments for SOFTSWISS Game Aggregator. Bridging software development and core infrastructure using cloud-native best practices.

đŸ—ŁïžđŸ‡·đŸ‡ș Russian Required

Cloud

Distributed Systems

Kubernetes

Linux

Python

Go

🕒 5 days ago

Software Mind

1001 - 5000

🤖 Artificial Intelligence

☁ SaaS

📡 Telecommunications

Join Software Mind as a DevOps Engineer focusing on Azure solutions and CI/CD automation for global clients. Collaborate on designing and implementing Azure Landing Zones and CI/CD pipelines.

Azure

Docker

Kubernetes

Python

Terraform