Assessing the Dynamics of Data Processing: In-Memory Versus Disk-Based Methods in the Context of Big Data Analytics

Semen M. Levin

Abstract


This study evaluates the efficiency and scalability of in-memory computing (IMC) compared with traditional disk-based processing for big data analytics. Using the "New York City Taxi Trip Duration" dataset from Kaggle, we designed an experiment around three analytical tasks: aggregation, sorting, and filtering. Our objective was to quantify the performance improvement offered by IMC, implemented with Apache Spark, over conventional SQL queries executed on a disk-based system. IMC completed every task in less time than disk-based processing. The aggregation task finished in 47.3 seconds with IMC, compared with 138.7 seconds for disk-based processing, roughly a 2.9-fold speed-up; similar gaps were observed in the sorting and filtering tasks. Resource utilisation analysis, focusing on CPU and RAM consumption, showed higher demands for IMC, which points to a trade-off between speed and resource usage. The results clarify the practical implications of adopting IMC for big data analytics, particularly under the resource constraints of home computing environments. By comparing theoretical advantages with empirical measurements, this paper contributes to the discussion on optimising data processing methods for big data and offers insight into the balance between computational efficiency and resource management.
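The comparison described in the abstract can be illustrated with a minimal sketch. It is not the author's experimental setup: it assumes Python's standard library only, replaces Apache Spark with plain in-memory Python lists, replaces the disk-based SQL system with an on-disk SQLite file, and uses synthetic rows rather than the Kaggle dataset. The column names `vendor_id` and `trip_duration` and every function name are hypothetical. Timings from a toy run of this size depend heavily on operating-system caching and say nothing about the figures reported in the paper.

```python
import os
import random
import sqlite3
import tempfile
import time
from collections import defaultdict


def make_trips(n, seed=0):
    """Synthetic (vendor_id, trip_duration) rows; NOT the Kaggle data."""
    rng = random.Random(seed)
    return [(rng.randint(1, 2), rng.randint(60, 7200)) for _ in range(n)]


def timed(fn):
    """Run a zero-argument callable and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def disk_tasks(conn):
    """Aggregation, sorting, and filtering as SQL against an on-disk file."""
    return {
        "aggregation": lambda: conn.execute(
            "SELECT vendor_id, SUM(trip_duration), COUNT(*) FROM trips "
            "GROUP BY vendor_id ORDER BY vendor_id").fetchall(),
        "sorting": lambda: conn.execute(
            "SELECT trip_duration FROM trips "
            "ORDER BY trip_duration").fetchall(),
        "filtering": lambda: conn.execute(
            "SELECT COUNT(*) FROM trips "
            "WHERE trip_duration > 3600").fetchall(),
    }


def memory_tasks(trips):
    """The same three tasks on a list that is already held in RAM."""
    def aggregate():
        totals = defaultdict(lambda: [0, 0])  # vendor -> [sum, count]
        for vendor, duration in trips:
            totals[vendor][0] += duration
            totals[vendor][1] += 1
        return [(v, s, c) for v, (s, c) in sorted(totals.items())]

    return {
        "aggregation": aggregate,
        "sorting": lambda: [(d,) for d in sorted(d for _, d in trips)],
        "filtering": lambda: [(sum(1 for _, d in trips if d > 3600),)],
    }


def run_benchmark(n=20000):
    """Time each task on both paths and check that the results agree."""
    trips = make_trips(n)
    report = {}
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(os.path.join(tmp, "trips.db"))
        conn.execute(
            "CREATE TABLE trips (vendor_id INTEGER, trip_duration INTEGER)")
        conn.executemany("INSERT INTO trips VALUES (?, ?)", trips)
        conn.commit()
        in_memory = memory_tasks(trips)
        for name, disk_fn in disk_tasks(conn).items():
            disk_result, disk_s = timed(disk_fn)
            mem_result, mem_s = timed(in_memory[name])
            report[name] = {
                "disk_s": disk_s,
                "memory_s": mem_s,
                "same_result": disk_result == mem_result,
            }
        conn.close()
    return report


if __name__ == "__main__":
    for task, row in run_benchmark().items():
        print(task, row)
```

Checking `same_result` before reading the timings guards against comparing two paths that compute different answers. A fuller replication would also sample CPU and RAM usage during each task, since the abstract reports that the speed gain comes with higher resource consumption.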

Keywords: in-memory computing, big data analytics, disk-based processing, data processing efficiency, resource utilisation, Apache Spark, data analytics performance

DOI: 10.7176/CEIS/15-1-05

Publication date: April 30th 2024



ISSN (Paper): 2222-1727; ISSN (Online): 2222-2863


This journal follows the ISO 9001 management standard and is licensed under a Creative Commons Attribution 3.0 License.

Copyright © www.iiste.org