Cluster & Systems Capacity Engineer

Backblaze

201 - 500 employees

Founded 2007

🛍️ eCommerce

🏢 Enterprise

💰 $5M Series A on 2012-07

Cloud Storage • eCommerce • Enterprise

Backblaze is a cloud storage company that provides scalable, secure data backup for businesses and individuals. Its B2 Cloud Storage service offers S3-compatible object storage with transparent pricing, so users can protect and manage their data easily. Backblaze also specializes in automatic, unlimited backup for computers, with data protection and recovery options and support for application integrations.

📋 Description

• Develop and maintain short-, medium-, and long-term capacity demand and hardware deployment forecasts across storage, compute, and network domains within the platform
• Build predictive models that translate business demand signals into infrastructure requirements using historical utilization, growth trends, product sales plans, hardware lifecycle roadmaps, and other key business inputs
• Partner with Infrastructure, Production, and Network Engineering teams to align capacity plans with system design and scaling initiatives
• Develop and automate forecasting pipelines, simulation calculators and tools, and capacity dashboards to improve data quality, reduce manual analysis, and provide stakeholders clear visibility into platform usage and cluster health metrics
• Monitor and analyze cluster and system-level utilization and performance across CPU, memory, IOPS, and network resources
• Adjust deployment plans and recommended configurations in real time to maintain adequate headroom and system stability in support of delivering a world-class customer experience
• Partner with service and platform owners to develop headroom and live buffer policies, optimize hardware BoMs, leverage virtualized orchestration, and reduce product cost
• Work in lockstep with Operations and Finance peers to align capacity plans and hardware requirements with capital budgets, cost targets, and financial outcomes
• Support strategic optimization initiatives across infrastructure investments, engineering development, and operations processes, contributing to long-term infrastructure strategy and capital planning
• Lead efforts to evaluate, procure, and provision requests for new or additional hardware, working with Systems and Network Engineering, SRE, NOC, and Data Center Operations teams to identify and deliver optimal solutions
• Maintain alignment with Product and Sales to support customer onboarding, growth, and demand variability
• Communicate complex capacity and infrastructure insights clearly to technical and non-technical stakeholders

🎯 Requirements

• Bachelor’s degree in Computer Science, Engineering, Mathematics, Data Science, Information Systems, Statistics, or a related technical field (or equivalent experience)
• 3-6+ years of experience in Site Reliability Engineering, Infrastructure Capacity Planning, Systems/Infrastructure Engineering, Production Engineering, Data Center Operations, or a similar Cloud Operations role
• Familiarity and experience working with Cloud Storage infrastructure, particularly highly available, large-scale distributed systems supporting large amounts of data with high throughput and complex performance requirements
• Background in capacity modeling, performance analysis, scenario modeling, and/or infrastructure cost optimization, with an ability to quantify ideas within financial frameworks and forecasts
• Proficiency in database and data analysis tools (preferably Snowflake, Metabase, Grafana, Python, SQL, Prometheus, VictoriaMetrics, and Excel/Google Sheets)
• Demonstrated deep, creative, and logical thinking complemented by a strong data analysis skillset
• Excellent communication and documentation skills, with the ability to share knowledge and explain concepts accurately and concisely
• Desire to work on a highly autonomous team that cares deeply about quality, cost, and the customer experience

🏖️ Benefits

• Healthcare for family, including dental and vision
• Competitive compensation and 401K
• RSU grants for full-time employees
• ESPP program
• Flexible vacation policy
• Maternity & paternity leave
• MacBook Pro to use for work, plus a generous stipend to personalize your workstation
• Childcare bonus (human children only)
• Fertility treatment and support
• Learning & development program
• Commuter benefits
• Culture that supports a healthy work-life balance
