Senior Site Reliability Engineer

November 18

ScienceLogic

Artificial Intelligence • Enterprise • SaaS

ScienceLogic is a leading provider of AI-powered IT operations management solutions designed to improve IT efficiency and business visibility. Their AI platform, SL1, offers comprehensive solutions for monitoring, assuring, automating, and providing insights into hybrid IT infrastructure. ScienceLogic aims to empower businesses by enabling autonomic IT and workflow automation, significantly reducing mean time to repair (MTTR) and driving digital transformation. By consolidating IT tools and providing automated root cause analysis, ScienceLogic enhances problem-solving capabilities and ensures real-time management of IT environments at scale. Their platform is trusted by top-tier organizations across multiple industries, including government and public sector, banking, and financial services.

501 - 1000 employees

Founded 2010

💰 $21.2M Venture Round on 2022-10

📋 Description

• Lead design reviews and the buildout of secure systems for delivering a new Artificial Intelligence product in SaaS, aiming for 99.99% uptime.
• Design, automate, test, and monitor the use of cloud-native technologies as the foundation for a service platform.
• Spend 75% of your time on forward-looking priorities, designing and building SaaS systems, with the remainder supporting the operations and maintenance of the current SaaS infrastructure.
• Investigate and resolve customer and operational issues with a mentality of fixing issues, not just mitigating them.
• Identify and automate the measurement of operations SLAs and SLOs.
• Triage incident response, document SOPs and runbooks, and train NOC team members.
• Write automation that can be easily supported and extended by others.
• Collaborate across the organization to design, build, and operationalize SaaS services conforming to security standards such as FedRAMP, SOC2, and ISO.
• Participate in the on-call rotation as assigned.
• Take full responsibility for the availability and performance of the platform.
• Work on special projects as assigned.

🎯 Requirements

• 8-12 years of site reliability engineering, cloud operations, or equivalent experience.
• Proven experience managing complex Kubernetes environments across multiple production systems.
• Experience with cloud automation tools such as CloudFormation, Terraform, and aws-cli/CDK.
• Proficiency in scripting languages such as Python, Bash, or Perl.
• Linux administration skills.
• Proven track record of operating production SaaS environments within security standards such as FedRAMP, SOC2, ISO, and PCI.
• Skilled at problem solving, algorithms, and data structures, applied in line with modern SaaS security requirements.
• Experience building tools and scripting frameworks from scratch.
• Familiarity with basic networking, security, and cloud engineering concepts.
• Highly collaborative, with effective written and verbal communication skills.
• Ability to work against tight deadlines, occasionally work off-hours, and participate in the weekly on-call schedule.
• Bachelor's or Master's degree in Computer Science, Information Systems, or a similar field.

🏖️ Benefits

• A remote, flexible workplace.
• Comprehensive medical, dental, and vision plans.
• 401(k) plan with employer match.
• Flexible Paid Time Off (FTO) so that you can take the time you need to re-energize.
• Volunteer Time Off (VTO): take two days off per calendar year to volunteer with your preferred charitable organization.
• 5-year Service Milestone Sabbatical.
• Paid parental leave.
• Generous employee referral bonus program.
• Pet insurance.
• HQ office centrally located in Reston Town Center, featuring a well-stocked kitchen with rotating snacks and beverages, and catered lunch on Thursdays.
• Regular virtual company-wide events, including cooking classes, yoga, meditation, and more.
• The opportunity to learn and develop from some of the best and brightest minds in the industry!
