Data Cube Materialization Using Map Reduce
Data cube queries are an important class of On-Line Analytical Processing (OLAP) queries in decision support systems. To meet the need for improved performance and to handle growing data sizes, parallel solutions for generating the data cube are required. We present load-balanced and communication-efficient partitioning strategies that generate a sub-cube computation for every processor. Sub-cube computations are then carried out using existing sequential, external-memory data cube algorithms. In parallel systems, balancing the load assigned to different processors and minimizing the communication overhead are the core problems in achieving high performance. In this paper we propose MR-Cube, a three-phase cube computation algorithm that employs these techniques to successfully cube billion-tuple data sets and that optionally surfaces interesting cube groups. We also detail real-world challenges in cube materialization and mining tasks on certain types of data sets. Specifically, we identify an important subset of holistic (i.e., non-algebraic) measures and introduce MR-Cube as a MapReduce-based framework for efficient cube computation and identification of interesting cube groups on holistic measures.
Kawhale Rohitkumar, Sarita Patil
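To make the notion of a holistic measure in the abstract above concrete, the following is a minimal, single-process sketch of naive cube materialization in a map/shuffle/reduce style. It is not the MR-Cube algorithm itself; the function names (`map_tuple`, `shuffle`, `reduce_group`, `naive_cube`), the `*` roll-up marker, the sample rows, and the choice of the median as the holistic measure are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from statistics import median

ALL = "*"  # hypothetical marker for a rolled-up (aggregated-away) dimension


def map_tuple(dims, value):
    """Map step: emit one (cube-group key, value) pair per subset of dimensions."""
    n = len(dims)
    for k in range(n + 1):
        for kept in combinations(range(n), k):
            key = tuple(dims[i] if i in kept else ALL for i in range(n))
            yield key, value


def shuffle(pairs):
    """Shuffle step: collect all values that share the same cube-group key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(values):
    """Reduce step: the median is holistic, so it needs every raw value of the group."""
    return median(values)


def naive_cube(rows):
    """Materialize every cube group for rows of the form (dimension tuple, measure value)."""
    pairs = (pair for dims, value in rows for pair in map_tuple(dims, value))
    return {key: reduce_group(vals) for key, vals in shuffle(pairs).items()}


# Illustrative data: (country, channel) dimensions with one numeric measure.
rows = [
    (("US", "web"), 10),
    (("IN", "web"), 20),
    (("IN", "web"), 40),
    (("IN", "app"), 100),
]
cube = naive_cube(rows)

# Combining the per-country medians does not reproduce the overall median,
# which is why the measure cannot be assembled from partial aggregates.
median_of_medians = median([cube[("US", ALL)], cube[("IN", ALL)]])
```

In this sketch every input tuple is emitted once per cube group it belongs to, and the reducer for a large group (such as the fully rolled-up one) receives all raw values. That is the kind of communication and load-balancing cost the abstract refers to when it names those two problems as central to parallel cube computation.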