Staff Production Engineer

Canva

1001 - 5000 employees

Founded 2013

☁️ SaaS

📱 Media

📚 Education

💰 $200M Venture Round on 2021-09

Canva is a versatile online design platform that empowers users to create a wide range of professional designs with ease. From social media posts and presentations to business cards and posters, Canva provides thousands of templates and design tools to help users bring their creative ideas to life. The platform also offers a suite of AI-powered features to enhance creativity and productivity, including tools like Magic Write for copy generation and Magic Edit for photo transformations. Canva caters to individuals, teams, and enterprises, making it an ideal solution for collaborative design and workflow management. It is also committed to sustainability and social impact, offering free educational and nonprofit access to its premium features.

📋 Description

• Owning an engagement area: taking long-term accountability for one of Canva's highest-risk technical domains
• Writing production software: the work is code, not process
• Instrumenting, refactoring, and rebuilding the pieces that cause problems at scale
• You're a software engineer first; the reliability outcome at scale is what you're optimising for
• Opportunity to pair with, mentor, and learn from fellow production engineers
• Striving for fewer incidents, faster recovery, lower severity, and latency that bends in the right direction
• Taking pride in moving the needle on metrics that positively impact the quality of the customer experience

🎯 Requirements

• Owned reliability work within large-scale distributed systems
• Previously worked as an engineer embedded in or partnering closely with a product or feature team, not siloed in a platform org that throws tools over the fence
• You've built real things in Java, Go, Rust, C++, or a comparable systems language at production scale; commercial depth, not academic familiarity
• Navigated sharding, replication, failure modes, and consistency tradeoffs in real systems
• Ability to parachute into an unfamiliar codebase, orient quickly, find where the problem actually lives, and fix it
• A proven record of making systems better through wisdom and trust
• You know the network stack and what traffic looks like at scale
• Enough kernel-level understanding to reason about what's actually happening when a system misbehaves: process scheduling, memory, I/O, network stack
• Consistent hashing, leader election, consensus, backpressure, circuit breakers
• You've instrumented systems for real: built the tracing, the dashboards, and the alerting that actually tells you what's wrong
• You've profiled JVM applications or systems-level processes, found the thing nobody was looking at, and fixed it in a way that lasted
• AWS at meaningful depth, so you understand how its services behave under load and at the edges
• You've been on-call in a serious production environment and have opinions about what good incident management actually looks like

🏖️ Benefits

• Equity packages: we want our success to be yours too
• Inclusive parental leave policy that supports all parents & carers
• An annual Vibe & Thrive allowance to support your wellbeing, social connection, office setup & more
• Flexible leave options that empower you to be a force for good, take time to recharge, and support you personally
