Big data in multi-block data analysis: An approach to parallelizing Partial Least Squares Mode B algorithm
Publication date
2019-04-29
ISSN
2405-8440
Abstract
Partial Least Squares (PLS) Mode B is a multi-block method and a tightly coupled algorithm for estimating structural equation models (SEMs). After describing key aspects of parallel computing, we parallelize the PLS Mode B algorithm so that it can operate on large distributed data. We show the scalability and performance of the algorithm at a very fine-grained level thanks to the versatility of pbdR, an R library for parallel computing. We vary several factors under different data distribution schemes in a supercomputing environment. Shorter elapsed times are obtained with the square blocking factor 16 × 16 on a processor grid that is as square as possible, and with the non-square blocking factors 1000 × 4 and 10000 × 4 on a one-column processor grid. Depending on the configuration, distributing the data over a larger number of cores yields speedups of up to 121 over the CPU implementation. Moreover, we show that SEMs can be estimated with big data sets using current state-of-the-art algorithms for multi-block data analysis.
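The "blocking factor" and "grid of processors" in the abstract describe how a data matrix is split across cores. As a minimal sketch, assuming the ScaLAPACK-style two-dimensional block-cyclic layout used by pbdR's distributed matrices (with the first block placed on process (0, 0)), the Python below computes which process owns a given entry and how many rows and columns each process stores. The function names and the 100000 × 40 matrix size are illustrative assumptions, not taken from the article.

```python
"""Sketch of a 2D block-cyclic layout (ScaLAPACK convention).

Assumption: the first block lives on process (0, 0); all indices are 0-based.
"""


def owner(i, j, mb, nb, nprow, npcol):
    """Return the (process row, process column) holding global entry (i, j).

    mb x nb is the blocking factor; nprow x npcol is the process grid.
    Blocks are dealt out round-robin along each grid dimension.
    """
    return (i // mb) % nprow, (j // nb) % npcol


def numroc(n, nb, iproc, nprocs):
    """Count how many of n rows (or columns), cut into blocks of size nb,
    land on process iproc out of nprocs (ScaLAPACK NUMROC, source process 0)."""
    nblocks = n // nb                   # number of complete blocks
    count = (nblocks // nprocs) * nb    # full rounds every process receives
    extra = nblocks % nprocs            # processes that get one more full block
    if iproc < extra:
        count += nb
    elif iproc == extra:
        count += n % nb                 # the trailing partial block, if any
    return count


def local_shapes(m, n, mb, nb, nprow, npcol):
    """Local (rows, cols) stored by each process of an nprow x npcol grid."""
    return {
        (p, q): (numroc(m, mb, p, nprow), numroc(n, nb, q, npcol))
        for p in range(nprow)
        for q in range(npcol)
    }


if __name__ == "__main__":
    m, n = 100_000, 40  # hypothetical data size, not from the article
    # Square 16 x 16 blocks on a 2 x 2 grid: rows and columns are both split.
    print(local_shapes(m, n, 16, 16, 2, 2))
    # 10000 x 4 blocks on a one-column grid of 4 processes: each process
    # stores complete rows, i.e. all 40 columns of its share of observations.
    print(local_shapes(m, n, 10_000, 4, 4, 1))
```

With a one-column grid, every process keeps whole observations, so column-wise operations on a block need no communication across process columns; a square grid instead balances both dimensions. Which layout is faster in practice is an empirical question, which is what the article measures.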
Document Type
Article
Document version
Published version
Language
English
Subject (UDC)
004 - Computer science and technology. Computing. Data processing
Keywords
Computer science
Computational mathematics
Big data
Pages
29 p.
Publisher
Elsevier
Is part of
Heliyon
Rights
© The author(s)
Except where otherwise noted, this item's license is described as http://creativecommons.org/licenses/by-nc-nd/4.0/