Site Reliability Engineer


MKS2 Technologies

201 - 500 employees

Founded 2008

B2B • Cybersecurity • IT Services

MKS2 Technologies is a technology business established in 2008 that provides services to the Federal Government and commercial clients. The company focuses on defining missions, employing domain knowledge, formulating strategies, and implementing solutions. Built on values instilled by its founder, a former combat veteran, MKS2 emphasizes long-term relationships and effective communication to ensure project success. They are known for their expertise in IT enterprise solutions and cybersecurity initiatives, serving various clients across the science and technology sectors.

📋 Description

• Utilize your skills in enterprise-level triage and incident resolution while gaining experience in VA system infrastructure.
• Use modern system monitoring tools to improve VA enterprise reliability and improve the quality of services provided to veterans.
• Work with system and application owners to obtain existing design and functionality, leverage comprehension of workflow systems and applications processes within multiple system environments and work across technology and development teams to diagnose outages and recommend changes to increase reliability.
• Use your hardware and software experience to help strengthen the systems the VA relies on. Your primary focus will be investigation, working with event management, application owners, DevOps teams, and system and network administrators to examine issues across enterprise applications and technology stacks.
• Partner with system and application owners to understand their platform designs and how they operate across different environments. This insight will help you diagnose outages, trace workflow issues, and recommend changes that enhance stability.
• Collaborate with developers and identity and access teams when deeper technical investigations are needed.
• You’ll gain hands‑on experience with enterprise‑level triage and incident analysis, which will deepen your understanding of the VA’s infrastructure. Tools like SolarWinds, Dynatrace, and Splunk will be part of your daily workflow, giving you the visibility needed to identify reliability concerns and support improvements to the services delivered to veterans.

🎯 Requirements

• Deep expertise (3+ years) in two or more of the following tools used for troubleshooting application logging in an enterprise environment (Dynatrace, Splunk, SolarWinds, ServiceNow Operator Workspace)
• Extensive experience in one or more Technology Areas (Network, Windows, Desktop, Unix/Linux, AWS or Azure Cloud, WebSphere Middleware, Java/JS Development, Microsoft or Oracle Database)
• 8+ years of experience working with key indicators for IT system operability, reliability, application performance, and code quality
• 8+ years of experience deploying, maintaining, and troubleshooting complex applications at an enterprise scale while working with cross-functional teams
• 1+ years of experience in service virtualization, AWS or Azure Cloud technologies, and SaaS and PaaS implementation
• Experience with using Microsoft Office, including Word, Excel, and PowerPoint
• 2+ years independently leading a team to solve difficult technical challenges
• HS diploma or GED and 20+ years of relevant professional experience or MA or MS degree in computer science, electronics engineering, or other engineering or technical discipline with 10+ years of relevant professional experience

🏖️ Benefits

• Health insurance
• Retirement plans
• Paid time off
• Flexible work arrangements
• Professional development
