DevOps Reliability Engineer

Advanced Solutions International, Inc.

201 - 500 employees

Founded 1991

💰 Venture Round on 2022-01

B2B • Non-profit • Software

Advanced Solutions International, Inc. is a company dedicated to providing innovative solutions tailored for non-profit organizations. Their flagship product, iMIS, is recognized as a top association management software, facilitating effective member engagement and organizational performance. ASI also offers various platforms like Clowder and TopClass, aimed at enhancing mobile engagement and learning management for associations.

📋 Description

• Monitor and improve the health, availability, performance, and cost efficiency of Azure-based production systems.
• Use application, database, and infrastructure telemetry to identify performance issues, bottlenecks, and reliability risks.
• Tune Azure services and platform configurations to maximize performance, resilience, and resource efficiency.
• Partner with engineering teams to recommend and implement practical, data-driven improvements to reliability, scalability, and operational effectiveness.
• Create and maintain operational documentation, runbooks, and troubleshooting guides to support consistent incident response and ongoing operations.
• Support Tech Support and Sustained Engineering by executing approved SQL queries and completing database backups and restores for troubleshooting purposes.
• Analyze how partner integrations and customer usage patterns impact system performance and cloud spend.
• Investigate complex production issues, perform root cause analysis, and drive resolution of reliability and performance problems.
• Contribute to continuous improvement across deployment processes, system stability, and operational readiness.
• Perform other job-related duties and responsibilities as assigned.

🎯 Requirements

• Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent relevant experience.
• 8+ years of experience in DevOps, Site Reliability Engineering, Cloud Engineering, or similar roles.
• Strong hands-on experience with Microsoft Azure, especially Azure SQL, Azure Functions, Azure App Services, and Azure Containers (AKS, Container Apps, or similar).
• Ability to read and interpret telemetry, logs, metrics, and resource usage data, and to explain what is wrong and how to fix it.
• Experience working with production systems that require high availability and reliability.
• Comfort owning work end-to-end, from identifying issues to executing improvements.
• Experience adjusting pipelines, hosting configurations, and deployment processes.
• Solid understanding of cloud cost drivers and usage optimization.
• Strong problem-solving skills and the ability to work collaboratively across engineering and support teams.
• Ability to read and interpret application code to support troubleshooting, root cause analysis, and identification of performance improvement opportunities.

🏖️ Benefits

• Wellness Benefits
• Opportunities for Professional Growth and Development
• Flexible Remote Work
• Volunteer Time Off
• Study Leave
• Employee Assistance Program
