World Scientific
Programming Big Data Applications

In the age of the Internet of Things and social media platforms, huge amounts of digital data are generated by and collected from many sources, including sensors, mobile devices, wearable trackers and security cameras. These data, commonly referred to as big data, are challenging current storage, processing and analysis capabilities. New models, languages, systems and algorithms continue to be developed to effectively collect, store, analyze and learn from big data.

Programming Big Data Applications introduces and discusses models, programming frameworks and algorithms to process and analyze large amounts of data. In particular, the book provides an in-depth description of the properties and mechanisms of the main programming paradigms for big data analysis, including MapReduce, workflow, BSP, message passing, and SQL-like. Through programming examples, it also describes the most widely used frameworks for big data analysis, such as Hadoop, Spark, MPI, Hive and Storm. The different systems are discussed and compared, highlighting their main features, their adoption (both within their community of developers and among users), and their main advantages and disadvantages in implementing big data analysis applications.

Sample Chapter(s)
Preface
Chapter 1: Introduction

Contents:

  • Preface
  • About the Authors
  • Acknowledgments
  • List of Figures
  • List of Tables
  • Introduction:
    • Motivation and Goals
    • Main Topics
    • Audience and Organization
    • Online Resources
  • Big Data Concepts:
    • Big Data Principles and Features
    • Data Science Concepts
    • Big Data Storage
    • Scalable Data Analysis
    • Parallel Computing
    • Cloud Computing
    • Toward Exascale Computing
    • Parallel and Distributed Machine Learning
  • Programming Models for Big Data:
    • Parallel Programming for Big Data Applications
    • The MapReduce Model
    • The Workflow Model
    • The Message-Passing Model
    • The BSP Model
    • The SQL-Like Model
    • The PGAS Model
    • Models for Exascale Systems
  • Tools for Big Data Applications:
    • Introduction
    • MapReduce-based Programming Tools
    • Workflow-based Programming Tools
    • Message Passing-based Programming Tools
    • BSP-based Programming Tools
    • SQL-like Programming Tools
    • PGAS-based Programming Tools
  • Comparing Programming Tools:
    • Introduction
    • Comparative Analysis of the System Features
    • Comparative Analysis through Application Examples
  • Choosing the Right Framework to Tame Big Data:
    • The Input Data
    • The Application Class
    • The Infrastructure
    • Other Factors
  • Supplementary Material
  • Bibliography
  • Index

Readership: Undergraduate and graduate students in computer science, computer engineering, data science, and data engineering. PhD students and researchers in computer science and engineering, and data science.

Free Access
FRONT MATTER
  • Pages: i–xix

https://doi.org/10.1142/9781800615052_fmatter

Free Access
Chapter 1: Introduction
  • Pages: 1–5

https://doi.org/10.1142/9781800615052_0001

No Access
Chapter 2: Big Data Concepts
  • Pages: 7–59

https://doi.org/10.1142/9781800615052_0002

No Access
Chapter 3: Programming Models for Big Data
  • Pages: 61–97

https://doi.org/10.1142/9781800615052_0003

No Access
Chapter 4: Tools for Big Data Applications
  • Pages: 99–186

https://doi.org/10.1142/9781800615052_0004

No Access
Chapter 5: Comparing Programming Tools
  • Pages: 187–245

https://doi.org/10.1142/9781800615052_0005

No Access
Chapter 6: Choosing the Right Framework to Tame Big Data
  • Pages: 247–256

https://doi.org/10.1142/9781800615052_0006

No Access
Supplementary Material
  • Pages: 257–258

https://doi.org/10.1142/9781800615052_0007

Free Access
BACK MATTER
  • Pages: 259–275

https://doi.org/10.1142/9781800615052_bmatter

Domenico Talia is a professor of computer engineering at the University of Calabria and an honorary professor at Amity University. He is a Senior Associate Editor of ACM Computing Surveys, an Associate Editor of The Computer Journal, and a member of the editorial board of Future Generation Computer Systems, IEEE Transactions on Parallel and Distributed Systems, the International Journal of Web and Grid Services, the Journal of Cloud Computing, Big Data and Cognitive Computing, and the International Journal of Next-Generation Computing. His research interests include HPC, Big Data, machine learning, parallel and distributed data analysis, cloud computing, social media analysis, distributed knowledge discovery, peer-to-peer systems, and concurrent programming models. He has authored several books and more than 400 scientific papers.

Paolo Trunfio is an associate professor of computer engineering at the University of Calabria. In 2007 he was a visiting researcher at the Swedish Institute of Computer Science (SICS) in Stockholm. He currently serves as Associate Editor of the Journal of Big Data, IEEE Transactions on Cloud Computing, and ACM Computing Surveys, and is a member of the editorial board of several scientific journals including Future Generation Computer Systems, Big Data and Cognitive Computing, the International Journal of Web Information Systems, and the International Journal of Parallel, Emergent and Distributed Systems. His research interests include cloud computing, Big Data, social media analysis, parallel and distributed knowledge discovery, and peer-to-peer systems.

Fabrizio Marozzo is an assistant professor of computer engineering at the University of Calabria. He received a PhD in Systems and Computer Engineering at the same university. In 2011–2012 he visited the Barcelona Supercomputing Center for a research internship with the Grid Computer Research group in the Computer Sciences department. He sits on the editorial board of several journals, including IEEE Access; IEEE Transactions on Big Data; the Journal of Big Data; Big Data and Cognitive Computing; Algorithms; Frontiers in Big Data; Heliyon; and SN Computer Science. His research interests include big data analysis, social media analysis, high performance computing, cloud and edge computing, and machine learning.

Loris Belcastro is a computer engineering researcher at the University of Calabria, Italy. He received a PhD in Information and Communication Engineering at the University of Calabria. In 2012 he held a scholarship at the Institute of High-Performance Computing and Networking of the Italian National Research Council (ICAR-CNR). He serves as guest editor for numerous journals, including Future Generation Computer Systems; the Journal of Big Data; Sensors; Algorithms; Applied Sciences; and Frontiers in Big Data. His research interests include cloud and edge computing, big data, social media analysis, and parallel and distributed data analysis.

Riccardo Cantini is a computer engineering researcher at the University of Calabria, Italy. He received a PhD in Information and Communication Technologies at the same university. From 2021 to 2022 he was a visiting researcher at the Barcelona Supercomputing Center, working with the Workflows and Distributed Computing group in the Computer Sciences department. His research interests include social media and big data analysis, machine and deep learning, natural language processing, opinion mining, topic detection, edge computing, and high-performance data analytics.

Alessio Orsino is currently pursuing a PhD in Information and Communication Technologies at the University of Calabria, Italy. In 2023 he was a visiting researcher at the Department of Computer Science and Technology of the University of Cambridge, collaborating with the Mobile Systems Research Lab. His research interests include big data analysis, parallel and distributed computing, high-performance data analytics, cloud and edge computing, and machine learning.

Supplementary Material

Programming Big Data: Lecture Slides (PowerPoint) (64 MB)
Programming Big Data: Lecture Slides (PDF) (26 MB)

Please note that the programs provided are not a commercial product and are provided solely for illustrative purposes. The authors and the publisher are not responsible for any losses or damages resulting from the use of the programs.