Senior Site Reliability Engineer

🔥 0 minutes ago

Autodesk

10,000+ employees

Founded 1982

📱 Media

Architecture • Engineering • Media

Autodesk is a global leader in software for designers, engineers, builders, and creators. The company provides a comprehensive suite of design and engineering applications including popular products like AutoCAD, Revit, and 3ds Max. Through its Design and Make Platform, Autodesk empowers professionals across various industries to design, visualize, and manage projects efficiently, facilitating innovation and sustainability in architecture, engineering, construction, and manufacturing.

📋 Description

• Serve as a primary owner for the reliability, availability, performance, operability, and capacity of one or more production services
• Deploy, operate, maintain, and continuously improve production services running in Autodesk GovCloud environments
• Partner with engineering teams to ensure services are designed with reliability, scalability, security, and operability in mind
• Define and operate reliability practices such as SLOs/SLIs, error budgets, production readiness reviews, service reviews, and operational health reviews
• Build automation to improve deployment safety, operational efficiency, incident response, and service recovery
• Design, develop, and maintain software, automation, and tooling that improve the reliability, scalability, and efficiency of production systems
• Implement and improve monitoring, alerting, logging, tracing, and observability capabilities across supported services
• Lead and participate in incident response, troubleshooting, and post-incident reviews focused on learning and continuous improvement
• Develop and maintain operational documentation, runbooks, and recovery procedures
• Scale and enhance resilience testing and Gameday practices to validate system behavior, recovery capabilities, and operational readiness
• Continuously identify and eliminate operational toil through software engineering, automation, and process improvement
• Ensure supported services remain compliant with Autodesk security, privacy, and regulatory requirements, including FedRAMP and related controls where applicable
• Participate in a 24x7 on-call rotation for production services

🎯 Requirements

• B.S. or higher in Computer Science, Engineering, or a related technical discipline, or equivalent practical experience
• 7+ years of experience in Site Reliability Engineering, Software Engineering, Platform Engineering, Cloud Infrastructure, or Production Operations
• Experience operating and supporting customer-facing production services in large-scale cloud environments
• Strong understanding of reliability engineering principles, including SLOs/SLIs, observability, incident management, capacity planning, production readiness, and automation
• Experience with AWS, Azure, or other public cloud platforms
• Experience developing automation using languages such as Python, Go, Java, PowerShell, Bash, or similar
• Experience with Infrastructure as Code, CI/CD pipelines, deployment automation, and modern cloud operations practices
• Understanding of security, compliance, and operational risk management in production environments
• Strong written and verbal communication skills

đŸ–ïž Benefits

‱ Health and financial benefits ‱ Time away and everyday wellness

Similar Jobs

🔥 36 minutes ago

Coupa Software

1001 - 5000

☁️ SaaS

💸 Finance

🛍️ eCommerce

Senior Database Reliability Engineer overseeing cloud-based SQL Server infrastructure at Coupa. Leading database architecture and ensuring reliable, high-performance data solutions.

🔥 4 hours ago

Zigsaw

11 - 50

Site Reliability Engineer enhancing AWS-based platform reliability at Pinterest and scaling Kubernetes workloads. Operating and improving cloud-native infrastructure with a focus on automation and resilience.

🔥 4 hours ago

YipitData

201 - 500

💸 Finance

🏢 Enterprise

DevSecOps Lead managing secure software development lifecycle at YipitData. Collaborating across departments to strengthen security practices within engineering operations.

🔥 5 hours ago

YipitData

201 - 500

💸 Finance

🏢 Enterprise

DevSecOps Lead building secure software development lifecycle and vulnerability management at YipitData. Leading cross-functional collaboration to implement security standards across software development.

🔥 16 hours ago

Guidehouse

10,000+ employees

Site Reliability Engineer collaborating with teams to establish SRE practices and participate in system design reviews at Guidehouse. Focused on AWS cloud infrastructure and promoting automation.