Site Reliability Engineering Manager

Non-profit • Education • Media

Wikimedia Foundation is a nonprofit charitable organization dedicated to the growth, development, and distribution of free, multilingual content. It provides the essential infrastructure for free knowledge, including hosting Wikipedia, the free online encyclopedia that is created, edited, and verified by a global community of volunteers. Supported primarily through donations, Wikimedia Foundation promotes collaborative projects that aim to share knowledge reflecting human diversity and strives to protect everyone's right to access free and open knowledge.

501 - 1000 employees

Founded 2003

🤝 Non-profit

📚 Education

📱 Media

💰 $2.5M Grant on 2019-09

July 10

🇺🇸 United States – Remote

💵 US$132.4k - US$208.4k / year

⏰ Full Time

🟡 Mid-level

🟠 Senior

⛑ DevOps & Site Reliability Engineer (SRE)

🦅 H1B Visa Sponsor

Ansible

Cloud

Docker

Kubernetes

Linux

Open Source

Terraform

📋 Description

•Managing one to two globally distributed teams within Wikimedia’s Site Reliability Engineering organization
•Providing guidance, mentorship, and support to ensure the team’s effectiveness and growth
•Working with team members to set individual performance goals, and supporting them in meeting and evolving their goals and career path
•Recruiting, hiring, and helping onboard new team members
•Triaging incoming workload, maintaining focus on priorities, and setting realistic expectations for both peers and team members
•Coordinating and communicating with other members of the Wikimedia product & engineering teams on relevant projects, executing complex projects, and contributing to the organizational strategy
•Continuously developing the roadmap of the team in alignment with other SRE and Product & Technology teams, and helping to draft and execute the team’s annual and quarterly plans
•Project managing new and existing initiatives
•Leading the definition, refinement, and execution of the processes through which the team manages and performs work
•Leading incident response, diagnosis, and follow-up on system alerts and outages across Wikimedia’s production infrastructure
•Being part of a 24/7 on-call rotation to handle escalations and support teams in resolving issues
•Facilitating the definition and establishment of Service Level Indicators and Objectives with service owners and stakeholders

🎯 Requirements

•Prior experience managing teams
•Prior hands-on experience with software or reliability engineering (within the last 3 years preferred)
•Ability to analyze complex systems, troubleshoot issues, and devise effective solutions under pressure
•Proficiency in project management methodologies to effectively plan, execute, and track new and existing initiatives
•Strong understanding of cloud computing, networking, Linux systems administration, containerization (e.g., Docker, Kubernetes), and infrastructure as code (e.g., Terraform, Ansible), sufficient to provide technical support to the team
•Aptitude for automation and streamlining of tasks
•Ability to communicate effectively in both spoken and written English
•Ability to work independently and as an effective part of a globally distributed team
•Ability to travel several times a year for occasional in-person meetings
•B.S. or M.S. in Computer Science, or equivalent related work experience

🏖️ Benefits

•U.S. Benefits & Perks
