Efficient Ways to Improve the Performance of HDFS for Small Files
Abstract
Hadoop, an open-source implementation of MapReduce for processing big data, is widely used even for short jobs that require low response times. Companies such as Facebook and Yahoo! use Hadoop to process more than 15 terabytes of new data per day. MapReduce gathers intermediate results across multiple nodes and combines them into a single result set, and the fault tolerance offered by the MapReduce platform is entirely transparent to the programmer. HDFS (Hadoop Distributed File System), one of the core components of Hadoop, is a distributed file system with a single-master, multiple-slave architecture designed to store and process very large files. It suffers a performance penalty when dealing with a large number of small files, because the metadata for huge numbers of small files places a heavy burden on the NameNode and degrades the performance of HDFS. This paper introduces HDFS, describes the small file problem, and discusses ways to deal with it.
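The NameNode burden described above can be sketched with a rough back-of-the-envelope calculation. A commonly cited estimate (an approximation, not an exact figure) is that each file, directory, and block object occupies on the order of 150 bytes of NameNode heap; the function below is a hypothetical illustration built on that assumption.

```python
# Rough sketch of why many small files burden the HDFS NameNode.
# Assumption: each metadata object (file inode or block) consumes
# roughly 150 bytes of NameNode heap -- an approximate, commonly
# cited estimate, not a guaranteed figure.

BYTES_PER_OBJECT = 150  # approximate NameNode memory per metadata object


def namenode_memory(num_files, blocks_per_file=1):
    """Estimate NameNode heap consumed by file metadata, in bytes."""
    objects = num_files * (1 + blocks_per_file)  # one inode + its blocks
    return objects * BYTES_PER_OBJECT


# ~1 TB stored as 1,000,000 files of 1 MB (each fits in one block):
small = namenode_memory(1_000_000, blocks_per_file=1)

# The same ~1 TB stored as 8,000 files of 128 MB (one block each):
large = namenode_memory(8_000, blocks_per_file=1)

print(f"1,000,000 small files: ~{small / 1e6:.0f} MB of NameNode heap")
print(f"8,000 large files:     ~{large / 1e6:.1f} MB of NameNode heap")
```

Under these assumptions the same volume of data costs roughly 300 MB of NameNode heap as small files versus about 2.4 MB as large files, which is why consolidation techniques such as Hadoop Archives (HAR) and SequenceFiles are the usual remedies.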
Keywords: Hadoop; Hadoop Distributed File System; MapReduce; small files
ISSN (Paper)2222-1727 ISSN (Online)2222-2863
This journal follows ISO 9001 management standard and licensed under a Creative Commons Attribution 3.0 License.
Copyright © www.iiste.org