
sungsoo.github.io

Explore In-Memory Data Store Tachyon

Sung-Soo Kim's Blog, 30 September 2015
Tags: big data, spark

Article Source Title: Explore In-Memory Data Store Tachyon

Memory is the key to fast Big Data processing. Many have recognized this, and frameworks such as Spark already leverage memory performance. As data sets continue to grow, storage is increasingly becoming a critical bottleneck in many workloads.

To address this need, we have developed Tachyon, a memory-centric, fault-tolerant distributed file system that enables reliable file sharing at memory speed across cluster frameworks such as Spark and MapReduce. The result of over two years of research, Tachyon achieves memory speed and fault tolerance by using memory aggressively and leveraging lineage information. Tachyon caches working-set files in memory and lets different jobs, queries, and frameworks access the cached files at memory speed. Thus, Tachyon avoids going to disk to load datasets that are read frequently.

Tachyon is Hadoop compatible. Existing Spark and MapReduce programs can run on top of it without any code changes. Tachyon is the default off-heap option in Spark, which means that RDDs can automatically be stored inside Tachyon to make Spark more resilient and to avoid GC overheads.

The project is open source and is already deployed at multiple companies. In addition, Tachyon has more than 50 contributors from over 20 institutions, including Yahoo, Intel, Red Hat, and Pivotal. The project is the storage layer of the Berkeley Data Analytics Stack (BDAS) and is also part of the Fedora distribution.

In this chapter we first go over the basic operations of Tachyon, and then run a Spark program on top of it. For more information, please visit Tachyon's website or GitHub repository. We also host regular meetups in the Bay Area.
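Because Tachyon is Hadoop compatible, a job typically reaches its files through a file system URI rather than a new API, so in many cases only the input or output path changes. The sketch below shows what such a path might look like. It assumes the `tachyon://` scheme and a master listening on port 19998, neither of which is stated in this article, so check both against the documentation for your Tachyon version. The helper `tachyon_uri` and the path `/data/input.txt` are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tachyon): build a Tachyon file URI.
# Assumes the tachyon:// scheme and a master on port 19998;
# verify both against the documentation for your Tachyon version.
tachyon_uri() {
    host="${1:-localhost}"
    port="${2:-19998}"
    path="${3:-/}"
    # Make the path absolute so the resulting URI is well formed.
    case "$path" in
        /*) ;;
        *) path="/$path" ;;
    esac
    printf 'tachyon://%s:%s%s\n' "$host" "$port" "$path"
}

# Example: the URI a job might use in place of an hdfs:// path.
tachyon_uri localhost 19998 /data/input.txt
# prints: tachyon://localhost:19998/data/input.txt
```

Keeping the host and port as parameters makes it easy to point the same job at a local single-node setup or at a remote master.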
Prerequisites

Assumptions

- You have a laptop
- Your laptop has Java 6 or 7 installed
- Mac OS or Linux (Windows is not supported)

Launch Tachyon

Configurations

All of the system's configuration files are under the tachyon/conf folder. Find them and check how much memory is configured for each worker node.

    $ grep "TACHYON_WORKER_MEMORY_SIZE=" conf/tachyon-env.sh
    export TACHYON_WORKER_MEMORY_SIZE=1GB

You can also read through the file and try to understand the other parameters. For more information on configuration, see the Tachyon Configuration Settings webpage.

Format the storage

Note that if you are running Linux, Tachyon needs root permission to create and use a RAM disk. To start a superuser shell, run sudo su and enter your password.

Before starting Tachyon for the first time, we need to format the system. This is done with the tachyon script in the tachyon/bin folder. Type the following command:

    $ ./bin/tachyon format
    Connection to localhost...
    Formatting Tachyon Worker @ HYMac-2.local
    Removing local data under folder: /Users/haoyuan/Downloads/test/tachyon/libexec/../ramdisk/tachyonworker/
    Formatting Tachyon Master @ localhost
    Formatting JOURNAL_FOLDER: /Users/haoyuan/Downloads/test/tachyon/libexec/../journal/
    Formatting UNDERFS_DATA_FOLDER: /Users/haoyuan/Downloads/test/tachyon/libexec/../../data/tmp/tachyon/data
    Formatting UNDERFS_WORKERS_FOLDER: /Users/haoyuan/Downloads/test/tachyon/libexec/../../data/tmp/tachyon/workers

Start the system

After formatting the storage, we can start the system using the tachyon/bin/tachyon-start.sh script.

    $ ./bin/tachyon-start.sh local
    Killed 0 processes
    Killed 0 processes
    Connection to localhost...
    Killed 0 processes
    Starting master @ localhost
    Starting worker @ HYMac-2.local

Interacting with Tachyon

In this section, we go over three ways to interact with Tachyon:

- Command Line Interface
- Application Programming Interface
- Web User Interface

Command Line Interface

You can interact with Tachyon using the following command:

    $ ./bin/tachyon tfs

It returns a list of options:

    Usage: java TFsShell
           [cat <path>]
           [count <path>]
           [ls <path>]
           [lsr <path>]
           [mkdir <path>]
           [rm <path>]
           [tail <path>]
           [touch <path>]
           [mv <src> <dst>]
           [copyFromLocal <src> <remoteDst>]
           [copyToLocal <src> <localDst>]
           [fileinfo <path>]
           [location <path>]
           [report <path>]
           [request <tachyonaddress> <dependencyId>]
           [pin <path>]
           [unpin <path>]

Try copying the local file tachyon/LICENSE into the Tachyon file system as /LICENSE using the command line.

    $ ./bin/tachyon tfs copyFromLocal LICENSE /LICENSE
    Copied LICENSE to /LICENSE

You can also use the command line interface to verify the copy:

    $ ./bin/tachyon tfs ls /
    11.40 KB  02-07-2014 23:23:44:008  In Memory  /LICENSE

Now view the contents of the file:

    $ ./bin/tachyon tfs cat /LICENSE
    Apache License
    Version 2.0, January 2004
    http://www.apache.org/licenses/

    TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
    ....

Application Programming Interface

After using command line to interact w