WalB: Real-time and Incremental Backup System for Block Devices


September 26, 2017

Slide overview

WalB is an open-source backup system that consists of block devices, called WalB devices, and userland utilities, called WalB tools. A WalB device records write I/Os. WalB tools extract them to create restorable snapshots incrementally.

Compared with dm-snap and dm-thin, WalB is designed to achieve small I/O latency overhead and short backup time. We conducted an experiment to take an incremental backup of a volume under random write workload. The result confirms those advantages of WalB.

The Cybozu cloud platform, which has 500 TB of volumes and processes 25 TB of write I/Os per day, must achieve (1) stable workload performance without I/O spikes that may affect application user experience and (2) the short backup interval specified in our service level objective. WalB satisfies these requirements, whereas dm-snap does not and dm-thin is not expected to.


I do research and development on operating systems, CPUs, compilers, and the like for education at Cybozu Labs, Inc.


Text of each slide
1.

WalB: A Fast and Low Latency Backup System for Block Devices
Cybozu Meetup #8 SRE
Kota Uchida
September 25, 2017

2.

About me
▌Kota Uchida
▌SRE team at Cybozu, Inc.
▌A WalB developer

3.

About Cybozu
▌A large cloud service vendor in Japan.
▌Largest market share in the field of collaborative software.
▌We serve web applications on our own cloud platform.
  - kintone: a low-code business app platform
  - and more

4.

Number of customer companies: 20,000+
Accesses per day: 210 million
Write I/Os per day: 24.5 TiB

5.

Service Level Objective
▌24/7 nonstop service
▌99.99% availability (4 min / month)
▌Daily backup (retention period is 14 days)
▌Disaster recovery: copy data to a remote site once a day
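The "4 min / month" figure follows from the 99.99% target. A minimal check, assuming a 30-day month (the slide does not state the month length, and the function name is only illustrative):

```python
def downtime_budget_minutes(availability: float, days: int = 30) -> float:
    """Return the downtime allowed over `days` days, in minutes."""
    minutes_in_period = days * 24 * 60
    return (1.0 - availability) * minutes_in_period


budget = downtime_budget_minutes(0.9999)
print(f"{budget:.2f} minutes per 30-day month")  # about 4.32
```

Rounded down, this matches the 4 minutes quoted on the slide.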

6.

Architecture of our platform
[Diagram. Components: L7LB, Application Server, Database Server, Blob Server, Storage Server (RAID 1, dm-snap), Backup Server (diffs), and a Storage Server at a remote site (dm-snap, diffs). A marked region shows the scope of this talk.]

7.

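Snapshot Management with dm-snap
[Diagram. Logical structure: a snapshot image holding blocks A and B, and the latest image holding A' and B' after the writes. Physical structure: an original volume area, and a snapshot area with mapping info. On each write, (1) the old block is copied to the snapshot area (CoW), then (2) the new block is written to the original volume area.]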
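The two-step write path in the diagram can be sketched with a toy in-memory model. This is not dm-snap's implementation; the class and method names are hypothetical, and blocks are plain Python values:

```python
class CowSnapshotVolume:
    """Toy model of an origin volume with one copy-on-write snapshot."""

    def __init__(self, blocks):
        self.origin = list(blocks)  # original volume area (latest image)
        self.snapshot_area = {}     # mapping info: block index -> preserved old data
        self.copies = 0             # number of CoW copies performed

    def write(self, index, data):
        # (1) CoW: preserve the old block the first time it is overwritten.
        if index not in self.snapshot_area:
            self.snapshot_area[index] = self.origin[index]
            self.copies += 1
        # (2) Write the new data to the original volume area.
        self.origin[index] = data

    def read_latest(self, index):
        return self.origin[index]

    def read_snapshot(self, index):
        # Unmodified blocks are still read from the original volume area.
        return self.snapshot_area.get(index, self.origin[index])


volume = CowSnapshotVolume(["A", "B"])
volume.write(0, "A'")
print(volume.read_latest(0), volume.read_snapshot(0))  # A' A
```

In this model the first write to each block costs an extra read and write, which is consistent with the CoW latency overhead reported later in the deck.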

8.

Backup using dm-snap
[Diagram. Logical structure: Snapshot0 (A, B) and Snapshot1 (A', B').]
(1) Full-scan an old snapshot
(2) Full-scan a new snapshot
(3) Generate a diff image by comparing two snapshots
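The three steps above amount to a block-by-block comparison of two full images. A minimal sketch, assuming both snapshots are lists of equal length (the function name and the read counter are illustrative only):

```python
def diff_by_full_scan(old_image, new_image):
    """Compare two images block by block; return (diff, blocks_read)."""
    if len(old_image) != len(new_image):
        raise ValueError("snapshots must have the same number of blocks")
    diff = {}
    blocks_read = 0
    for index, (old_block, new_block) in enumerate(zip(old_image, new_image)):
        blocks_read += 2  # one read from each snapshot
        if old_block != new_block:
            diff[index] = new_block
    return diff, blocks_read


snapshot0 = ["A", "B", "C", "D"]
snapshot1 = ["A'", "B", "C", "D'"]
diff, blocks_read = diff_by_full_scan(snapshot0, snapshot1)
print(diff)         # {0: "A'", 3: "D'"}
print(blocks_read)  # 8
```

The number of reads depends on the volume size, not on how much data changed, which is why this approach needs a long full scan.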

9.

Full-scan at night
[Chart. Labels: Backup processing time, Daytime; horizontal axis: o'clock.]

10.

UX degradation during a full-scan
[Chart with an annotated region: Full-scanning.]

11.

We have no more “nights”
▌Until now: A full scan was allowed only when the access rate was low, i.e., at night.
▌From now on: We have to handle accesses from multiple time zones.
▌We must be able to back up at any time without UX degradation.

12.

New Solution
▌We need a new solution with:
  - No IO spikes
  - Short backup time
▌We compared dm-thin with WalB

13.

What is dm-thin?
▌dm-thin provides thin-provisioning volume management to
  - share the same data among volumes
  - reduce disk usage using snapshots
▌In the mainline Linux kernel

14.

Snapshot Management with dm-thin
[Diagram. Logical structure: the latest image holds block A. Physical structure: the latest tree points to A.]

15.

Snapshot Management with dm-thin
[Diagram. Logical structure: a snapshot and the latest image both hold block A. Physical structure: the snapshot tree and the latest tree both point to A.]

16.

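Snapshot Management with dm-thin
[Diagram. Logical structure: the snapshot holds A; after writing A', the latest image holds A'. Physical structure: the snapshot tree still points to A. For the latest tree: (1) CoW of the tree and of the data block, then (2) the tree is updated and A' is written.]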

17.

Backup using dm-thin
[Diagram. Logical structure: Snapshot0 (A, B) and Snapshot1 (A', B'). Physical structure: the Snapshot0 tree points to A and B; the Snapshot1 tree points to A' and B'.]
Generate a diff image using dm-thin metadata
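Generating a diff from metadata means comparing the two snapshots' block mappings instead of their data. A minimal sketch, assuming each snapshot tree is flattened into a dictionary from logical block to physical block (a simplification of dm-thin's on-disk trees; all names are hypothetical):

```python
def changed_logical_blocks(mapping_old, mapping_new):
    """Return logical blocks whose physical mapping differs between snapshots."""
    logical_blocks = set(mapping_old) | set(mapping_new)
    return sorted(
        block
        for block in logical_blocks
        if mapping_old.get(block) != mapping_new.get(block)
    )


# Logical block -> physical block. Block 2 is still shared by both snapshots.
snapshot0_map = {0: 100, 1: 101, 2: 102}  # A, B, C
snapshot1_map = {0: 200, 1: 201, 2: 102}  # A', B' after CoW; C unchanged
print(changed_logical_blocks(snapshot0_map, snapshot1_map))  # [0, 1]
```

Only the blocks in the returned list need to be read from the newer snapshot, so the data scan shrinks to the changed region, at the cost of traversing the metadata.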

18.

What is WalB?
[Charts: dm-snap (full scanning) vs. WalB (no spikes).]
▌A real-time and incremental backup system
  - developed at Cybozu Labs
▌Can back up block devices without IO spikes

19.

Special Block Devices for WalB
[Diagram. Any application (file system, DBMS, etc.) reads from and writes to a WalB device, which is backed by a data device (linear mapped) and a log device (ring buffer).]
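The pairing of a linearly mapped data device with a ring-buffer log device can be sketched as a toy model. This is not the WalB kernel driver: the class, its methods, and the overflow behavior (raising an error) are assumptions made for illustration only.

```python
class WalbDeviceSketch:
    """Toy model: every write goes to a ring-buffer log and to the data device."""

    def __init__(self, data_blocks, log_capacity):
        self.data = [None] * data_blocks  # data device, linear mapped
        self.log = [None] * log_capacity  # log device, ring buffer
        self.next_seq = 0                 # sequence number of the next log record
        self.oldest_seq = 0               # oldest record not yet extracted

    def write(self, index, payload):
        if self.next_seq - self.oldest_seq >= len(self.log):
            # Simplification: a real system needs a recovery path here.
            raise RuntimeError("log device full; extract logs first")
        self.log[self.next_seq % len(self.log)] = (self.next_seq, index, payload)
        self.next_seq += 1
        self.data[index] = payload

    def read(self, index):
        return self.data[index]  # reads never touch the log device

    def extract_logs(self):
        """Return unextracted records in write order and free their slots."""
        records = [
            self.log[seq % len(self.log)]
            for seq in range(self.oldest_seq, self.next_seq)
        ]
        self.oldest_seq = self.next_seq
        return records


device = WalbDeviceSketch(data_blocks=5, log_capacity=2)
device.write(1, "A")
device.write(4, "B")
records = device.extract_logs()
print(records)  # [(0, 1, 'A'), (1, 4, 'B')]
```

In this model a write only appends to the log, with no copy of old data, so the log must be drained continuously to keep the ring buffer from filling.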

20.

Write IO Logging and Backup with WalB
[Diagram: time series of write I/Os. The data device has blocks 0 to 4, with A at block 1 and B at block 4; the log device is shown alongside a time axis.]

21.

Write IO Logging and Backup with WalB
[Diagram: time series of write I/Os. Writing A' updates block 1 of the data device and appends the record (1, A') to the log device.]
Scan the log device and generate a diff image

22.

Write IO Logging and Backup with WalB
[Diagram: time series of write I/Os. Writing A' and then B' updates blocks 1 and 4 of the data device and appends the records (1, A') and (4, B') to the log device.]
Scan the log device and generate a diff image
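Turning a scanned log into a diff image can be sketched as a single pass in which later writes to the same block replace earlier ones. The record layout `(sequence, block index, payload)` and the function names are assumptions for this example, not WalB's actual log format:

```python
def diff_from_logs(records):
    """Collapse write records (in write order) into block index -> latest payload."""
    diff = {}
    for _sequence, index, payload in records:
        diff[index] = payload  # a later write to the same block wins
    return diff


def apply_diff(image, diff):
    """Return a copy of `image` with the diff applied."""
    updated = list(image)
    for index, payload in diff.items():
        updated[index] = payload
    return updated


log_records = [(0, 1, "A"), (1, 4, "B"), (2, 1, "A'"), (3, 4, "B'")]
diff = diff_from_logs(log_records)
print(diff)  # {1: "A'", 4: "B'"}
print(apply_diff([None] * 5, diff))  # [None, "A'", None, None, "B'"]
```

The work is proportional to the number of logged writes rather than to the volume size, so no full scan is needed.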

23.

Performance test
▌Compared dm-snap, dm-thin, and WalB
▌Executed a workload during a backup
  - The workload and the backup will affect each other
▌Measured the following metrics:
  - Latencies of the workload
  - Backup time

24.

Environment & Settings
▌Test environment:
  - CPU: 2.40 GHz x 12 cores
  - MEM: 192 GiB
  - HDD: 4 TB HDD, RAID 6 (8D2P)
  - NIC: 10 Gbps x 2
  - Kernel: 4.11 (latest upstream)
▌Test settings:
  - 100 GiB volumes
  - Workload: 4 KiB random writes over a 5 GiB range

25.

Measuring the Backup Time (dm-snap, dm-thin)
[Diagram: 4 KiB random writes hit a 5 GiB range; the remaining 95 GiB is unchanged. dm-snap scans the full image; dm-thin scans the changed chunks (tree traversal).]
▌dm-snap: take a snapshot and scan the full image
▌dm-thin: get the structure of the snapshot trees, find the modified blocks, and read those blocks

26.

Measuring the Backup Time (WalB)
[Diagram: 4 KiB random writes hit a 5 GiB range of the WalB device; the remaining 95 GiB is unchanged. Write IO logs from the log device go over the network to a backup server, which holds the diffs. WalB scans only the logs.]
▌WalB: scan logs from a log device and send them to a backup server continuously

27.

Write I/O latency
[Chart of write I/O latency. Annotations: dm-thin: IO spikes due to CoW, worse than dm-snap! dm-snap: large due to CoW. WalB: small overhead. Baseline: no-backup.]

28.

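Backup time
[Bar chart of backup time. Values shown: 2260, 1146, and 1.2 (units and bar labels are not present in the extracted text). Annotations: "slower than dm-snap" and "so fast!".]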

29.

Conclusion
▌dm-snap & dm-thin
  - High I/O latency during a backup
  - Long backup time
▌WalB
  - Stable and low I/O latency (no spikes)
  - Short backup time
WalB satisfies our requirements for production use.

30.

Try WalB!
▌Project page
  - https://walb-linux.github.io/
▌Tutorial
  - https://github.com/walb-linux/walbtools/tree/master/misc/vagrant/
  - Vagrantfile for Ubuntu 16.04 and CentOS 7

31.

Incremental backup
[Diagram. Labels: Remote Host, Volume, Backup Host, diff files for 14 days, Base; diffs are applied to the base every day.]
▌Daily backup (retention period is 14 days)
▌The WalB worker daemon selects diff files older than 14 days and applies them to a base image.
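The retention rule can be sketched as follows. This is a hypothetical model, not the WalB worker daemon: images and diffs are dictionaries from block index to data, each diff is tagged with a date, and "older than 14 days" is read as strictly before `today - 14 days`.

```python
from datetime import date, timedelta


def consolidate(base, dated_diffs, today, retention_days=14):
    """Apply diffs older than the retention period to the base image.

    Returns (new_base, kept_diffs); the inputs are left unchanged.
    """
    cutoff = today - timedelta(days=retention_days)
    new_base = dict(base)
    kept = []
    for day, diff in sorted(dated_diffs, key=lambda item: item[0]):
        if day < cutoff:
            new_base.update(diff)  # fold the expired diff into the base
        else:
            kept.append((day, diff))
    return new_base, kept


base = {0: "old0", 1: "old1"}
dated_diffs = [
    (date(2017, 9, 10), {0: "x"}),
    (date(2017, 9, 11), {0: "y"}),
    (date(2017, 9, 24), {1: "z"}),
]
new_base, kept = consolidate(base, dated_diffs, today=date(2017, 9, 25))
print(new_base)   # {0: 'x', 1: 'old1'}
print(len(kept))  # 2
```

Applying expired diffs in date order keeps the base image equal to the volume's state at the start of the retention window.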

32.

Restoring a volume
[Diagram. Remote host: all diffs are applied to a writable snapshot (Base') of the base image.]
▌To restore the latest state of a volume:
  - take a snapshot of a base image, and
  - apply all diff files to it.
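The two restore steps can be sketched with the same dictionary model of images and diffs (an illustration only; a plain copy stands in for the writable snapshot Base'):

```python
def restore_latest(base_image, diffs_in_order):
    """Snapshot the base image, then apply every diff, oldest first."""
    restored = dict(base_image)  # stands in for the writable snapshot Base'
    for diff in diffs_in_order:
        restored.update(diff)
    return restored


base_image = {0: "A", 1: "B"}
diffs_in_order = [{0: "A'"}, {1: "B'"}, {0: "A''"}]
print(restore_latest(base_image, diffs_in_order))  # {0: "A''", 1: "B'"}
```

Because the diffs are applied to a snapshot, the base image itself stays untouched and can serve later restores.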

33.

Make restoration faster 1/2
[Diagram. Remote host: the base image, diffs, and 14 dm-thin snapshots, one for each day.]
▌Fast restoration by preparing read-only snapshots for each day

34.

Make restoration faster 2/2
[Diagram. Remote host: the base image, diffs, and daily snapshots 1 to 14.]
▌Apply some diffs to the appropriate snapshot.
▌At most 24 hours of diffs need to be applied. Faster!
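Choosing the "appropriate snapshot" can be sketched as a search for the latest daily snapshot at or before the restore target, followed by selecting the diffs between that snapshot and the target. The function, the timestamps, and the convention that a diff's timestamp marks the state it brings the image up to are all assumptions for illustration:

```python
from bisect import bisect_right
from datetime import datetime


def plan_restore(snapshot_times, diff_times, target):
    """Return (starting snapshot time, diff times to apply in order)."""
    ordered = sorted(snapshot_times)
    position = bisect_right(ordered, target)
    if position == 0:
        raise ValueError("no snapshot at or before the target time")
    start = ordered[position - 1]
    needed = [t for t in sorted(diff_times) if start < t <= target]
    return start, needed


snapshots = [datetime(2017, 9, day) for day in (23, 24, 25)]
diffs = [datetime(2017, 9, 24, hour) for hour in (3, 9, 15)]
start, needed = plan_restore(snapshots, diffs, datetime(2017, 9, 24, 10))
print(start)        # 2017-09-24 00:00:00
print(len(needed))  # 2
```

With one snapshot per day, the selected diffs never span more than 24 hours, which is the bound stated on the slide.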

35.

Worldline: restoring a whole environment
▌"Worldline" means a parallel world.
▌We back up configurations in addition to user data.
  - Configurations: definitions for each customer (ID, FQDN, Apps, …), application version definitions, host definitions, etc.
▌It is important to use applications whose versions are consistent with the user data backed up earlier.

36.

Worldline: restoring a whole environment
▌A daily script takes a snapshot of the whole environment.
▌A weekly script restores the latest backup, so we can use it to investigate failures or to develop our services.
[Diagram: user data and the config DB are each backed up as diffs; the Worldline is restored from a snapshot onto spare hosts, including a restored config DB (Config DB').]

37.

Q&A
email: kota-uchida@cybozu.co.jp
twitter: @uchan_nos