SRE bridge the gap: Feature development to Core API (SRE as a bridge between feature development teams and Core API teams)

May 18, 22

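Slide overview

I will talk from two perspectives about how SRE and Production Engineering practices have grown at Shopify, a company that is growing on a global scale.

The first is the history of the company as a whole: when the Production Engineering model was introduced, and what Resiliency, the SRE team I currently belong to, is responsible for within it.

The second is the career development of one software engineer: how "actively changing positions across disciplines" helps both the organization and the individual grow as the company grows.

As a company gets bigger, division of roles and turnover advance, and the "knowledge and wisdom about the whole system" that a single team originally held becomes fragmented. If SRE is the antithesis to that, how should an individual act to make good use of it and level up?

I am a newcomer to SRE, but I would like to think this through together with you.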

https://note.com/kenzan100/m/m4c4484e9eadf

Text of each page
1.

SRE bridge the gap: Feature development to Core API (SRE as a bridge between feature development teams and Core API teams) Yuta Miyama, Apr 15th 2022

2.

Who I am Yuta Miyama Student - Entrepreneur Maker - Self-taught programmer Now - Around-the-world migrants

3.

Around the world migrants 2010 - Started programming career in Japan 2016 - Moved to Berlin 2020 - Moved to Toronto 2022 - Back to Japan Photo by Amy Humphries on Unsplash

4.

What I want to talk about Introduce you to Shopify’s production engineering practice (an introduction to Shopify's production engineering organization) Encourage cross-discipline moves between feature development and production engineering (how changing teams across disciplines promotes the growth of both the organization and the individual)

5.

Shopify’s history 2004 - https://snowdevil.ca 2006 - Shopify was born on Rails 1.x 2022 - Becoming a “Retail Operating System” Size - $175.4 billion GMV (Gross Merchandise Volume) in 2021 Entrepreneurship - $3 billion in “Shopify Capital” funding since 2016 Global - “Shopify Market” Cross border commerce from day one

6.

Production Engineering at Shopify 2015 ~ 2016 - Shopify adopted the Production Engineering model Problem Misalignments among distinct teams Feature dev Scale Monitor Maintenance Outcome Self-service tooling for feature dev, esp. monitoring and alerting Infra components ownership centralized Feature dev Prod Eng 3x deploy speed and frequency (150 / day) Self service Monitoring / Alerting Next-gen Infrastructure

7.

Resiliency at Production Engineering 2020 - The need for a specialized team on Resiliency Incident Manager On Call a.k.a. IMOC Core incident handling Follow the sun model Deep dive into “cracks” of distributed systems Edge, Ingress, Routing, Application, … Photo by Alexas_Fotos on Unsplash

8.

“We learned to absorb these shocks and become stronger as a result. [..] The school of hard knocks has taught us well.” — Tobi Lütke, CEO in internal essay on why we optimize for flash sales https://speakerdeck.com/sirupsen/goto-copenhagen-2017-shopifys-architecture-to-handle-80k-rps-sales?slide=3

9.

Complexities of Shopify Highly dynamic traffic BFCM Flash sales / bots Highly configurable shops Liquid Script API endpoints (Headless, ... https://shopify.engineering/cloud-load-modular-code-shopify-2022

10.

Taming the large distributed systems Semian Load Shedder Toxiproxy / Game day Photo by Omar Flores on Unsplash
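
Semian, named on this slide, is a Ruby library that includes circuit breakers. As an illustration of that general pattern only, here is a minimal Python sketch. It is not Semian's API or implementation: `CircuitBreaker`, `CircuitOpenError`, `error_threshold`, `reset_timeout`, and the `flaky` function are hypothetical names invented for this example. The injected `clock` lets a test simulate time passing, loosely in the spirit of game-day fault injection.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the dependency."""


class CircuitBreaker:
    """Hypothetical, simplified circuit breaker (not Semian's implementation)."""

    def __init__(self, error_threshold=3, reset_timeout=5.0, clock=time.monotonic):
        self.error_threshold = error_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout      # seconds to wait before a trial call
        self.clock = clock                      # injectable so tests can control time
        self.failures = 0
        self.opened_at = None                   # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: fail fast instead of piling load onto a sick dependency.
                raise CircuitOpenError("dependency marked unhealthy; failing fast")
            # Timeout elapsed: half-open, so let this one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.error_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def flaky():
    """Stand-in for a failing downstream dependency (assumption for the demo)."""
    raise ConnectionError("simulated outage")


fake_now = [0.0]
breaker = CircuitBreaker(error_threshold=2, reset_timeout=10.0, clock=lambda: fake_now[0])

for _ in range(2):
    try:
        breaker.call(flaky)
    except ConnectionError:
        pass  # two real failures trip the breaker

try:
    breaker.call(flaky)
except CircuitOpenError as exc:
    print("rejected without calling the dependency:", exc)
```

The point of the design is that once the breaker opens, callers stop spending time on a dependency that is known to be failing, which limits how far one failure spreads through the system.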

11.

Culture and process Follow the sun model also applies to Root Cause Analysis Autonomy based on “trust batteries” Lean on ChatOps enabling async learning Photo by Jay Heike on Unsplash

12.

Developing a “Journey Map” Observing “three different paths” for ICs 1. Feature dev 2. Core API maker 3. SRE The analogy to “Swordsman” Photo by Javier Allegue Barros on Unsplash

13.

Feature development teams Deliver high impact product features to the merchants quickly Aim -> Scope -> Execute “How can we iterate quickly, so that we can learn?” Product development is the main battlefield "Lean" Iterate quickly with limited resources until reaching market fit Deliver added value to a maturing product Photo by Krys Amon on Unsplash

14.

Core API makers Long term bets on fundamental components 1. Backbones of web application architecture 2. Investing in “Commerce Primitive” components Domain experts People who have risen to the top of their field of interest Photo by Jonny Gios on Unsplash

15.

SRE We connect the dots when a distributed system fails • IMOC • Investigate the “seams” of a running system • Collaborate / communicate to drive resolution on “cracks” Experts in the failure patterns of distributed systems Photo purchased from iStock

16.

Multiplication brings value App dev and SRE • Brings high velocity project scoping • Distributed system 101 Core API dev and App dev • User and Maker feedback Core API dev and SRE • High-level overview vs. investing in your core interests

17.

We are all one team Growth brings specialization and operational efficiency Imagine the dysfunctional feedback loop: • Highly scalable system without the user growth • Growing features without resiliency toolkit • Exponential domain onboarding cost without simple interface to Core API Photo by Kier In Sight on Unsplash

18.

Chaos Engineer your org Hybrid (bridging) developers can disrupt specialization • Early adoption is quicker and better than an afterthought • It's easily adaptable, since the underlying failure is common across multiple applications • ICs usually have an appetite for resiliency toolkits More bridging developers lead to organic early planning: a key to both speed and quality Photo by Olivier Guillard on Unsplash

19.

Feature Development, Core API architects, SREs Ensuring mobility between roles contributes to the competitiveness of both the company and the individual Shopify's Jungle Gym

20.

What’s next? Shopify is attracting talent from all over the world. • APAC is growing strong! • We embrace a fully distributed environment Develop products that change the livelihoods of millions of entrepreneurs • Huge potential in cross border commerce (my former team) Contribute to one of the most powerful web app stacks • Ruby, (not only) Rails, MySQL (KateSQL), k8s

21.

Thank you! @kenzan100 @jp_miyama

22.

Bonus track - How hard was the transition? Shopify managers accept its “Jungle Gym” 1. Charge your “trust battery” 2. Look for opportunities 3. Probe with the managers