Cassandra: Now and the Future @ Yahoo! JAPAN


October 10, 2017


An introduction to NoSQL trends and analysis surrounding Cassandra, and to Cassandra use cases at Yahoo! JAPAN.








Cassandra: Now and the Future @ Yahoo! JAPAN
10 October 2017
Satoshi Konno


whoami
Satoshi Konno
• Engineering Manager of NoSQL and NewSQL Teams @ Yahoo! Japan
• Open Source Software Developer for Virtual Reality, IoT and Cloud Computing
• Doctor's Course Student @ JAIST, Défago Lab: The φ accrual failure detector


Agenda
• What is the NoSQL Team?
• Why did we choose Cassandra?
• What is NewSQL?
• NewSQL with Cassandra


NoSQL Team
Copyright © 2017 Yahoo Japan Corporation. All Rights Reserved.


What is Yahoo! JAPAN?
• Many Strong Services
[Figure: Yahoo! services in the US and JP by category. Labels: Media, Search, Video, Answer, Mail, News, Membership, C2C, Payment, Knowledge search, C2C EC, B2C EC, Local, Premium, Wallet, YAHUOKU!, Loco.]


What is the NoSQL Team?
• NoSQL Team: 100+ Services, 300+ Systems


Cassandra @ Yahoo! JAPAN
[Figure: timeline from 2010 to 2018 of the Cassandra versions used by the Service Departments and the NoSQL Team: 0.5, 0.8, 1.x, 2.x, 3.x.]


Cassandra @ Yahoo! JAPAN 2017
• DCs: 3
• Clusters: 50
• Nodes: 2000+
• Usages: 50TB
• Read/sec: 500,000
• Write/sec: 500,000
• Shared: 1 Cluster, 50 Systems
• Special: 50 Clusters, 50 Systems
• Cluster size: 10 Nodes / Cluster … 200 Nodes / Cluster


NoSQL Team with Cassandra


Before Cassandra (2012)
• Problems: inappropriate usage of internal platforms, and new demand for storing bigger data.
• Services: "We store the data of our services on your platforms."
• Services: "We want to store more big data for our new services easily."
• Key Value Store Team: "Don't store any big key data!!"
• Search Engine Team: "Don't use it like a key value store!!"


NoSQL Team
• Launched the NoSQL Team in 2012.
• The Key Value Store Team, the Search Engine Team and new members joined the NoSQL Team, which serves the Service Departments.
• "We should build a new centralized platform for more big data!! However, many open source NoSQL databases have already been released, so we have to evaluate them."


NoSQL Team (2012)
• The NoSQL Team selected Cassandra as our first centralized NoSQL database.
• Evaluation criteria for services:
  • High Availability
  • Performance
  • Persistence
  • Scalability
  • …..
  • Maintainability
  • Appropriate Open Source License
  • …..
• Function Point Analysis: Cassandra ranked No. 1.


State of NoSQL Databases
[Figure: major-version timeline of NoSQL databases from 2012 to 2018, with releases ranging from 0.x to 4.x.]


NewSQL


NewSQL Trends
• NewSQL = NoSQL (Scalability) + RDBMS (SQL, ACID) = Scalable RDBMS like NoSQL


Requests for NewSQL
• Options: Private Cloud, Public Cloud, OSS
• Services: "We have big on-premises data centers, and we can't use the NewSQL platforms in our private cloud."
• NoSQL Team: "We want to make use of our knowledge and experience with Cassandra."


NewSQL with Cassandra
• NewSQL databases: Google Spanner, Amazon Aurora, MariaDB, CockroachDB
• Functions: Logging, Query Engine, Transaction, Schema, Store, Storage
• NoSQL Team: "Could we use Cassandra for the storage layer of NewSQL databases?"


Trial for NewSQL with Cassandra (OSS SQL Engines with Cassandra)


Trial Concept
• OSS SQL Engine + Distributed Storage = PostgreSQL + Cassandra or SQLite + Cassandra
• Functions: Logging, Query Engine, Transaction, Schema, Store, Storage
• NoSQL Team: "Could we replace the storage layer of SQL databases with Cassandra?"


Study Implementation
• Storage Manager: PostgreSQL's storage is abstracted as the storage manager, but …..
• Virtual File System: SQLite's storage is abstracted as the virtual file system too, but …..
• Implementing the abstract functions directly is hard to debug ….


POSIX Emulation
• The storage layers of PostgreSQL and SQLite are implemented using POSIX file I/O functions:
  • Storage Manager → POSIX file I/O functions
  • Virtual File System → POSIX file I/O functions
• Develop a Cassandra-backed library that is compliant with the POSIX file I/O functions, and replace the POSIX functions with the Cassandra-compliant functions:

    #define open(path, flags, mode) posix_vfs_cassandra_open(path, flags, mode)
    #define close(fd) posix_vfs_cassandra_close(fd)
    #define read(fd, buf, nbytes) posix_vfs_cassandra_read(fd, buf, nbytes)
    #define write(fd, buf, nbytes) posix_vfs_cassandra_write(fd, buf, nbytes)
    #define access(path, mode) posix_vfs_cassandra_access(path, mode)
    #define unlink(path) posix_vfs_cassandra_unlink(path)
    #define fstat(fd, buf) posix_vfs_cassandra_fstat(fd, buf)
    #define fsync(fd) posix_vfs_cassandra_fsync(fd)
    #define lseek(fd, offset, whence) posix_vfs_cassandra_lseek(fd, offset, whence)

• NoSQL Team: With this implementation method it is easy to write unit tests, and it is easy to debug too.
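
The testability claim above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory stand-in for the Cassandra-backed functions: the bodies of the `posix_vfs_cassandra_*` functions, the single fake descriptor, and the helper `store_and_reload` are all invented for illustration and are not the implementation from the talk. Only the macro names and signatures come from the slide.

```c
#include <assert.h>
#include <fcntl.h>      /* O_RDWR, O_CREAT */
#include <string.h>
#include <sys/types.h>  /* ssize_t, off_t, mode_t */
#include <unistd.h>     /* SEEK_SET; included before the macros are defined */

/* Hypothetical in-memory backend standing in for Cassandra (one file only). */
#define FAKE_CAPACITY 4096
#define FAKE_FD 3
static char   fake_data[FAKE_CAPACITY];
static size_t fake_size = 0;     /* bytes written so far */
static size_t fake_pos = 0;      /* current file position */
static int    fake_is_open = 0;

int posix_vfs_cassandra_open(const char *path, int flags, mode_t mode) {
    (void)path; (void)flags; (void)mode;   /* ignored by the stand-in */
    if (fake_is_open) return -1;
    fake_is_open = 1;
    fake_pos = 0;
    return FAKE_FD;
}

int posix_vfs_cassandra_close(int fd) {
    if (fd != FAKE_FD || !fake_is_open) return -1;
    fake_is_open = 0;
    return 0;
}

ssize_t posix_vfs_cassandra_write(int fd, const void *buf, size_t nbytes) {
    if (fd != FAKE_FD || !fake_is_open) return -1;
    if (nbytes > FAKE_CAPACITY - fake_pos) return -1;   /* would overflow */
    memcpy(fake_data + fake_pos, buf, nbytes);
    fake_pos += nbytes;
    if (fake_pos > fake_size) fake_size = fake_pos;
    return (ssize_t)nbytes;
}

ssize_t posix_vfs_cassandra_read(int fd, void *buf, size_t nbytes) {
    if (fd != FAKE_FD || !fake_is_open) return -1;
    size_t avail = fake_size > fake_pos ? fake_size - fake_pos : 0;
    if (nbytes > avail) nbytes = avail;                 /* short read at EOF */
    memcpy(buf, fake_data + fake_pos, nbytes);
    fake_pos += nbytes;
    return (ssize_t)nbytes;
}

off_t posix_vfs_cassandra_lseek(int fd, off_t offset, int whence) {
    /* The stand-in supports SEEK_SET within the fake capacity only. */
    if (fd != FAKE_FD || !fake_is_open || whence != SEEK_SET) return (off_t)-1;
    if (offset < 0 || offset > FAKE_CAPACITY) return (off_t)-1;
    fake_pos = (size_t)offset;
    return offset;
}

/* Redirection in the style of the slide; #undef guards against libc macros. */
#undef open
#undef close
#undef read
#undef write
#undef lseek
#define open(path, flags, mode) posix_vfs_cassandra_open(path, flags, mode)
#define close(fd) posix_vfs_cassandra_close(fd)
#define read(fd, buf, nbytes) posix_vfs_cassandra_read(fd, buf, nbytes)
#define write(fd, buf, nbytes) posix_vfs_cassandra_write(fd, buf, nbytes)
#define lseek(fd, offset, whence) posix_vfs_cassandra_lseek(fd, offset, whence)

/* Storage-layer style code: written against POSIX names, served by the stand-in. */
int store_and_reload(const char *path, const char *text, char *out, size_t out_len) {
    assert(path != NULL && text != NULL && out != NULL);
    if (out_len == 0) return -1;
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) return -1;
    size_t len = strlen(text);
    if (write(fd, text, len) != (ssize_t)len) { close(fd); return -1; }
    if (lseek(fd, 0, SEEK_SET) != 0) { close(fd); return -1; }
    ssize_t n = read(fd, out, out_len - 1);
    close(fd);
    if (n < 0) return -1;
    out[n] = '\0';
    return (int)n;
}
```

Because the calling code only ever sees POSIX names, a unit test can call `store_and_reload` and check the returned bytes without touching the SQL engine or a running cluster. That is one plausible reading of why this method was found easier to test and debug than implementing the storage manager or VFS interfaces directly.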


File Management
• SQL Engines → Storage Manager / Virtual File System → File A, File B, ….., File N
• Each file is stored in Cassandra as blocks: Block 0, Block 1, Block 2, …..

    CREATE TABLE IF NOT EXISTS (
      path varchar,
      block_no bigint,
      block blob,
      PRIMARY KEY (path, block_no));
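
With a `(path, block_no)` primary key, a byte offset in a file has to be translated into a block number plus an offset inside that block's blob. The following is a minimal sketch of that arithmetic. The names `block_pos`, `locate` and `blocks_touched` are hypothetical, and the block size is passed as a parameter (the benchmark slide mentions 1KB, 4KB and 8KB); the talk does not show how the actual library does this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Position of one byte within the (path, block_no) layout. */
typedef struct {
    int64_t block_no;  /* value for the block_no column (bigint) */
    size_t  offset;    /* byte position inside that block's blob */
} block_pos;

/* Map a non-negative file offset to the block that holds it. */
block_pos locate(int64_t file_offset, size_t block_size) {
    assert(file_offset >= 0 && block_size > 0);
    block_pos p;
    p.block_no = file_offset / (int64_t)block_size;
    p.offset = (size_t)(file_offset % (int64_t)block_size);
    return p;
}

/* Number of block rows touched by an I/O of nbytes starting at file_offset. */
int64_t blocks_touched(int64_t file_offset, size_t nbytes, size_t block_size) {
    assert(file_offset >= 0 && block_size > 0);
    if (nbytes == 0) return 0;
    int64_t first = file_offset / (int64_t)block_size;
    int64_t last = (file_offset + (int64_t)nbytes - 1) / (int64_t)block_size;
    return last - first + 1;
}
```

For example, a 4096-byte page read at offset 0 touches one row with 4KB blocks but four rows with 1KB blocks, so the choice of block size directly changes how many rows each page access needs.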


Benchmark (v3.20.1 + speedtest.tcl)
• Naive Implementation
  X : Multi-threads
  X : Async Requests
• Don't care
  X : Only Storage Layer
  X : Access Conflict
• NoSQL Team: This is a very naive and rough implementation of a distributed database now, but ….
[Figure: bar chart with a vertical axis from 0 to 35, comparing SQLite (Disk), SQLite+C* (1KB), SQLite+C* (4KB) and SQLite+C* (8KB) for INSERT, SELECT and UPDATE.]


Yahoo! JAPAN Booth A
Contact: [email protected]


References