A new approach for data visualization problem
الموضوعات :MohammadReza Keyvanpour 1 , Mona Soleymanpour 2
1 - Department of Computer Engineering, Alzahra University
2 - Qazvin Islamic Azad Branch
الکلمات المفتاحية: Quadric Assignment Problem (QAP), Artificial Bee Colony (ABC), Data visualization,
ملخص المقالة :
Data visualization is the process of transforming data, information, and knowledge into visual form, making use of humans’ natural visual capabilities which reveals relationships in data sets that are not evident from the raw data, by using mathematical techniques to reduce the number of dimensions in the data set while preserving the relevant inherent properties. In this paper, we formulated data visualization as a Quadric Assignment Problem (QAP), and then presented an Artificial Bee Colony (ABC) to solve the resulted discrete optimization problem. The idea behind this approach is to provide mechanisms based on ABC to overcome trapped in local minima and improving the resulted solutions. To demonstrate the application of ABC on discrete optimization in data visualization, we used a database of electricity load and compared the results to other popular methods such as SOM, MDS and Sammon's map. The results show that QAP-ABC has high performance with compared others.
1. L. Xu, Y. Xu, T. W. S. Chow, "PolSOM: A new method for multidimensional data visualization", Pattern Recognition, Volume 43, Issue 4, pp. 1668–1675, 2010.
2. Y. Xu, L. Xu, T. W. S. Chow, "PPoSOM: A new variant of PolSOM by using probabilistic assignment for multidimensional data visualization", Neurocomputing, Volume 74, Issue 11, pp. 2018–2027, 2011.
3. F. S. Tsai, "Dimensionality reduction techniques for blog visualization", Expert Systems with Applications, Volume 38, Issue, pp. 2766–2773, 2011.
4. R. Abbiw-Jackson, B. Golden, S. Raghavan, and E. Wasil, "A divide-and-conquer local search heuristic for data visualization", Computers and operations research, Volume 33, Issue 11, pp. 3070–3087, 2006.
5. M. H. Ghaseminezhad, A. Karami, "A novel self-organizing map (SOM) neural network for discrete groups of data clustering", Applied Soft Computing, Volume 11, Issue 4, pp. 3771–3778, 2011.
6. P. Klement, V. Snášel, "Using SOM in the performance monitoring of the emergency call-taking system", Simulation Modelling Practice and Theory, Volume 19, Issue 1, pp. 98–109, 2011.
7. J. W. Sammon, "A nonlinear mapping for data structure analysis", IEEE Transactions on computers, Volume 18, Issue 5, pp. 401–409, 1969.
8. J. Sun, C. Fyfe, M. Crowe, "Extending Sammon mapping with Bregman divergences", Information Sciences, Volume 187, pp. 72–92, 2012.
9. G. Gan, C. Ma, J. Wu, Data Clustering: Theory, Algorithms, and Applications, Society for Industrial and Applied Mathematics, 2007.
10. P. A. De Mazière, M. M. Van Hulle, "A clustering study of a 7000 EU document inventory using MDS and SOM", Expert Systems with Applications, Volume 38, Issue 7, pp. 8835–8849, 2011.
11. M. Ghanbari, "Visualization Overview", Thirty-Ninth Southeastern Symposium on System Theory, pp. 115–119, 2007.
12. S.-H. Bae, J. Qiu, G. Fox, "Adaptive interpolation of multidimensional scaling", Procedia Computer Science, Volume 9, pp. 393–402, 2012.
13. A. M. Lopes, J. A. Tenreiro Machado, C. M. A. Pinto, and A. M. S. F. Galhano, "Fractional dynamics and MDS visualization of earthquake phenomena", Computers & Mathematics with Applications, Volume 66, Issue 5, pp. 647–658, 2013.
14. T. Kohonen, "The self-organizing map", Neurocomputing, Volume 21, Number 1-3, pp. 1–6, 1998.
15. H. T. Jadhav, R. Roy, "Gbest guided artificial bee colony algorithm for environmental/economic dispatch considering wind power", Expert Systems with Applications, Volume 40, Issue 16, pp. 6385–6399, 2013.
16. A. P. Engelbrecht, Computational Intelligence: An Introduction, Wiley, 2007.
17. C. B. Kalayci, S. M. Gupta, "Artificial bee colony algorithm for solving sequence-dependent disassembly line balancing problem", Expert Systems with Applications, Volume 40, Issue 18, pp. 7231–7241, 2013.
18. H.-C. Tsai, "Integrating the artificial bee colony and bees algorithm to face constrained optimization problems", Information Sciences, Volume 258, pp. 80–93, 2013.
19. J.-Y. Park, S.-Y. Han, "Application of artificial bee colony algorithm to topology optimization for dynamic stiffness problems", Computers & Mathematics with Applications, Volume 66, Issue 10,, pp. 1879–1891, 2013.
20. M. Rani, H. Garg, S. P. Sharma, "Cost minimization of butter-oil processing plant using artificial bee colony technique", Mathematics and Computers in Simulation, Volume 97, pp. 94–107, 2014.
21. V. Golmah, J. Parvizian, "Visualization and the understanding of multidimensional data using Genetic Algorithms: Case study of load patterns of electricity customers", International Journal of Database Theory & Application, Volume 3, Issue 4, pp. 41–56, 2010.
22. J. Vesanto, J. Himberg, E. Alhoniemi, J. Parhankangas, S. Team, and L. Oy, "Som toolbox for matlab 5", Technical Report A57, Helsinki University of Technology, Neural Networks Research Centre, 2000.
23. A. Alizadegan, B. Asady, M. Ahmadpour, "Two modified versions of artificial bee colony algorithm", Applied Mathematics and Computation, Volume 225, pp. 601–609, 2013
A new approach for data visualization problem
Data visualization is the process of transforming data, information, and knowledge into visual form, making use of humans’ natural visual capabilities which reveals relationships in data sets that are not evident from the raw data, by using mathematical techniques to reduce the number of dimensions in the data set while preserving the relevant inherent properties. In this paper, we formulated data visualization as a Quadric Assignment Problem (QAP), and then presented an Artificial Bee Colony (ABC) to solve the resulted discrete optimization problem. The idea behind this approach is to provide mechanisms based on ABC to overcome trapped in local minima and improving the resulted solutions. To demonstrate the application of ABC on discrete optimization in data visualization, we used a database of electricity load and compared the results to other popular methods such as SOM, MDS and Sammon's map. The results show that QAP-ABC has high performance with compared others.
Keywords: Data visualization, Quadric Assignment Problem (QAP), Artificial Bee Colony (ABC)
1. Introduction
Data visualization is a graphical presentation of multidimensional data making use of humans’ natural visual capabilities. More specifically, data visualization reveals relationships in data sets that are not evident from the raw data, by using mathematical techniques to reduce the number of dimensions in the data set while preserving the relevant inherent properties. Compared with the data appear in their original high dimensional space, we may be more capable of finding the possible relationship among the data points in the low dimensional space, i.e., 2-D. therefore, it has been widely applied to solve many problems, e.g. signal compression, pattern recognition, image processing, etc [1].
Dimensionality reduction or projection techniques can transform large data of multiple dimensions into a smaller, more manageable set with special methods. Principal component analysis (PCA) and MultiDimensional Scaling (MDS) are two popular methods for data reduction and visualization. PCA is one of the most widely used methods because it is effective and robust for performing linear projection. Linear projection means the projection of data is conducted by multiplying each component of the original vector with a scalar. Thus, PCA is not the most suitable approach when one is dealing with highly nonlinear data. MDS is another classical projection but its final visualization map is difficult to perceive, when one is handling high dimensional and highly unsymmetrical data set. Sammon’s mapping is the earliest approach in nonlinear projecting data into low dimensional space for visualizing multivariate data. Sammon’s mapping tries to minimize the distances between input data in the original high dimensional space and the output data in the projected space. It is capable of preserving the topological structure of the original data and their corresponding inter point distances, but Sammon’s mapping is computationally demanding especially when one is handling huge numbers of data points. In addition, it requires re-computation when new data points are added [2, 3].
Discrete optimization techniques provide a possible alternative for the data visualization problem. Discretizing the data visualization problem may result in a large-scale quadratic assignment problem that is very difficult to solve to optimality [4]. The complexities of these problems are high and it is a NP Complete. One approach to overcome high large-scale quadratic assignment and its complexity is solving it by heuristics or intelligent optimization. As more and more real-world optimization problems become increasingly complex, algorithms with more capable optimizations are also increasing in demand. We formulate the data visualization problem as a Quadric Assignment Problem (QAP) and then use the Artificial Bee Colony (ABC) to solve it. The idea behind this approach is to provide mechanisms based on ABC to overcome trapped in local minima and improving the resulted solutions. To evaluate our method we use a real data set and compare our proposed hybrid approach which is named QAP-ABC with SOM, MDS and Sammon's Map as popular methods for data visualization.
The rest of the paper is organized as follows. Section 2 reviews relevant works in data visualization. Section 3 introduces modeling the problem as a Quadratic Assignment Problem and discusses Artificial Bee Colony (ABC) to solve this problem in detail. Section 4 presents a case study to evaluate the technique with experimental results, and then compares it with Self-Organizing Maps (SOM), Sammon's Map (SM) and MultiDimensional Scaling (MDS) in section 5. Section 6 concludes the paper.
2. Background
The most common data visualization methods allocate a representation for each data point in a lower-dimensional space and try to optimize these representations so that the distances between them are as similar as possible to the original distances of the corresponding data items. The methods differ in that how the different distances are weighted and how the representations are optimized. Linear mapping, like principle component analysis, is effective but cannot truly reflect the data structure. Non-linear mapping, like Sammon’s Mapping (SM), MultiDimensional Scaling (MDS) and Self-Organizing Map (SOM) [5, 6], requires more computation but is better at preserving the data structure.
2.1. Sammon’s Mapping (SM)
Sammon Jr. [7] introduced a method for nonlinear mapping of multidimensional data into a two- or three-dimensional space. This nonlinear mapping is a MultiDimensional Scaling (MDS) projection which preserves approximately the inherent structure of the data and thus is widely used in pattern recognition. It tries to match the pairwise distances in the lower-dimensional representations with their original distances [3].
Denote the data vector (N is the number of data points) in a d-dimensional input space and the corresponding vector in a l-dimensional out- put space (l=2 or 3). Define the distance between and in the input space as and the distance between and in the output space as . Distances are measured by Euclidian metric. Error E is defined as follows:
(1) |
(2) |
(3) |
| (4) |
| (5) |
| (6) |
Before introducing the ABC algorithm, it is necessary to give used notation in this algorithm. Let, : A uniform random index from interval and it has to different from . : The number of food sources. : The uniform random index that is belong to : The dimension of the problem. : The fitness value of the solution i. 1: Initialize the population of solution
Fig. 6. The best solution resulted from ABC representation
4. Experimental Results This section evaluates the performance of the QAP-ABC, SOM, MDS and Sammon's Map to data visualization. Experiments were performed using an Intel Core 2 Duo, 2.1 GHz processor with 2 GB of RAM.
4.1. Data Set In order to compare the proposed technique (QAP-ABC) and the well-known methods such as Self-Organizing Maps algorithm (SOM), Sammon' Map and MDS, a real-life case is considered here. Used data set in this research includes 24-hour electrical load of 366 days starting from 20 March 2008 in Esfahan. Therefore, we have 366 instances with 24 attributes, which any attribute is as an hour (a label number for all the daily load curves).
4.2. Parameters Setting The dimension of the output map in this research is assumed to be two (), because most visible media (for example: paper, monitor panel and etc.) are 2-dimensional. The output map is a 60×60 grid square. Dimension of the output map is calculated by a heuristic function in SOM toolbox in MATLAB [22]. ABC settings: Except common parameter (maximum cycle number), the basic ABC used in this study employs two control parameters D and SN. The maximum cycle number is supposed 2000 and other ABC parameters is resulted from [18] as: In original ABC algorithm, 50% of the colony consists of employed and the rest 50% consists of onlooker bees. In other words, the ratio of employed and onlooker bees are the same, 1:1 [23]. Therefore, we use ratio 1:1 in this study. To reach the optimal parameters we run ABC with different parameters D and SN. The ABC program was run 30 times and average of observations register for any parameters. Table 1 and Fig. 7 show the results of original (ratio 1:1) ABC algorithms with different D and SN. Table 1. Fitness values for QAP-ABC with SN=25 and various number of dimensions
Fig. 7. The sensitivity of ABC to number of dimensions (D) in QAP-ABC
The result of QAP-ABC with SN=25 and various number of dimensions (D) is shown in Fig. 7 and Table. 1. As it is shown in Fig. 7 and Table 1 more number of dimensions improves the fitness. The decreasing of fitness by increasing the number of dimensions continues until D reach to 20. Therefore we propose D=20 for ABC. To determine the SN, we run ABC program with D=20 and different SN (see Table 2 and Fig. 8).
Table 2. Fitness values for QAP-ABC with D=20 and various number of onlooker bees
Fig. 8. The sensitivity of ABC to number of onlooker bees (SN) in QAP-ABC
As it is shown in Table 2 and Fig. 8, increasing of the number of onlooker bees (SN) improve the fitness of resulted solutions. The fitness is unchanged for number of onlooker bees greater than 125. Thus we propose SN=125 in our study. SOM settings: The SOMs were trained using the batch training algorithm in two phases: (1) rough training phase which lasted 1000 iterations with an initial neighborhood radius equal to 5, a final neighborhood radius equal to 2, While a learning rate starts at 0.5 and end at 0.1, and (2) fine training phase which lasted 500 iteration cycles, While a learning rate starts at 0.1 and end at 0.02, we do set the neighborhood partitions started at 2 and end with 0.
5. Quality of Data Visualization To illustrate the effectiveness of the proposed method (QAP-ABC), the precise results of used data visualization methods for several data sets that are randomly selected from original data set are shown from Error! Reference source not found.. Number size of data sets is 50-100 % (increment 10) of total records number. All of methods run for 1500 seconds. As it is seen in Table 3, Sammon's Map couldn’t reach to solution in limited iteration (2000 iterations) for large data bases (above 70%) and defect in convergence. Moreover, it is similar results for MDS. The MDS is defected for above 60% to reach solution in regarded iteration.
Table 3. The data visualization precise of various approaches
Fig. 9. To compare the results of SOM and QAP-ABC
As shown in Error! Reference source not found., Sammon's Map and MDS although have a high precise but those couldn't be success for large data set and don’t convergent and give "Iteration limit exceeded. Minimization of criterion did not converge" error after 2000 iterations. Because, used nonlinear optimization techniques for MDS and Sammon’s Map are usually inefficient for large data sets. In addition, since nonlinear optimization techniques need high computational time, MDS and Sammon’s Map are slow. Fitness value, which calculates the coefficient error of preserving of the approximately inherent structure of the data by dimension reduction, is 0.0327 and 0.0563 for QAP-ABC and SOM, respectively (Fig. 9). It can be clearly observed that the QAP-ABC has been better than SOM and other used methods and 41% improvement in the fitness values over SOM. In ABC, The onlooker ants select scout bees based on elite selection. The elite selection preserves the best solutions without any change in next step. The transforming of the best solutions to next step and the ignorance of the worst solutions cause to search increase for bees in the high fitness region of the state space, increases thus improving exploitation. Moreover, number of soldier bees is determined based on the fitness of scout bees. Thus the best scout bees have more soldier bees for search. It result concentrate to better promising region of search space (exploitation) and decrease the computational time and increase the convergence speed. This is because the QAP-ABC increases the exploitation from information of meet points by using of positive feedback. It should also be noted that QAP-ABC has higher computational time compared with SOM.
6. Conclusion In this work we suggested a new approach based on Artificial Bee Colony and Quadric Assignment Problem named QAP-ABC for data visualization problem. Proposed method with other popular data visualization method, SOM, MDS and Sammon's Map, are executed on a real data set with same circumstances. The results were compared and it showed that QAP-ABC has high performance with compared others. References 1. L. Xu, Y. Xu, T. W. S. Chow, "PolSOM: A new method for multidimensional data visualization", Pattern Recognition, Volume 43, Issue 4, pp. 1668–1675, 2010. 2. Y. Xu, L. Xu, T. W. S. Chow, "PPoSOM: A new variant of PolSOM by using probabilistic assignment for multidimensional data visualization", Neurocomputing, Volume 74, Issue 11, pp. 2018–2027, 2011. 3. F. S. Tsai, "Dimensionality reduction techniques for blog visualization", Expert Systems with Applications, Volume 38, Issue, pp. 2766–2773, 2011. 4. R. Abbiw-Jackson, B. Golden, S. Raghavan, and E. Wasil, "A divide-and-conquer local search heuristic for data visualization", Computers and operations research, Volume 33, Issue 11, pp. 3070–3087, 2006. 5. M. H. Ghaseminezhad, A. Karami, "A novel self-organizing map (SOM) neural network for discrete groups of data clustering", Applied Soft Computing, Volume 11, Issue 4, pp. 3771–3778, 2011. 6. P. Klement, V. Snášel, "Using SOM in the performance monitoring of the emergency call-taking system", Simulation Modelling Practice and Theory, Volume 19, Issue 1, pp. 98–109, 2011. 7. J. W. Sammon, "A nonlinear mapping for data structure analysis", IEEE Transactions on computers, Volume 18, Issue 5, pp. 401–409, 1969. 8. J. Sun, C. Fyfe, M. Crowe, "Extending Sammon mapping with Bregman divergences", Information Sciences, Volume 187, pp. 72–92, 2012. 9. G. Gan, C. Ma, J. Wu, Data Clustering: Theory, Algorithms, and Applications, Society for Industrial and Applied Mathematics, 2007. 10. P. A. De Mazière, M. M. Van Hulle, "A clustering study of a 7000 EU document inventory using MDS and SOM", Expert Systems with Applications, Volume 38, Issue 7, pp. 8835–8849, 2011. 11. M. Ghanbari, "Visualization Overview", Thirty-Ninth Southeastern Symposium on System Theory, pp. 115–119, 2007. 12. S.-H. Bae, J. Qiu, G. Fox, "Adaptive interpolation of multidimensional scaling", Procedia Computer Science, Volume 9, pp. 393–402, 2012. 13. A. M. Lopes, J. A. Tenreiro Machado, C. M. A. Pinto, and A. M. S. F. Galhano, "Fractional dynamics and MDS visualization of earthquake phenomena", Computers & Mathematics with Applications, Volume 66, Issue 5, pp. 647–658, 2013. 14. T. Kohonen, "The self-organizing map", Neurocomputing, Volume 21, Number 1-3, pp. 1–6, 1998. 15. H. T. Jadhav, R. Roy, "Gbest guided artificial bee colony algorithm for environmental/economic dispatch considering wind power", Expert Systems with Applications, Volume 40, Issue 16, pp. 6385–6399, 2013. 16. A. P. Engelbrecht, Computational Intelligence: An Introduction, Wiley, 2007. 17. C. B. Kalayci, S. M. Gupta, "Artificial bee colony algorithm for solving sequence-dependent disassembly line balancing problem", Expert Systems with Applications, Volume 40, Issue 18, pp. 7231–7241, 2013. 18. H.-C. Tsai, "Integrating the artificial bee colony and bees algorithm to face constrained optimization problems", Information Sciences, Volume 258, pp. 80–93, 2013. 19. J.-Y. Park, S.-Y. Han, "Application of artificial bee colony algorithm to topology optimization for dynamic stiffness problems", Computers & Mathematics with Applications, Volume 66, Issue 10,, pp. 1879–1891, 2013. 20. M. Rani, H. Garg, S. P. Sharma, "Cost minimization of butter-oil processing plant using artificial bee colony technique", Mathematics and Computers in Simulation, Volume 97, pp. 94–107, 2014. 21. V. Golmah, J. Parvizian, "Visualization and the understanding of multidimensional data using Genetic Algorithms: Case study of load patterns of electricity customers", International Journal of Database Theory & Application, Volume 3, Issue 4, pp. 41–56, 2010. 22. J. Vesanto, J. Himberg, E. Alhoniemi, J. Parhankangas, S. Team, and L. Oy, "Som toolbox for matlab 5", Technical Report A57, Helsinki University of Technology, Neural Networks Research Centre, 2000. 23. A. Alizadegan, B. Asady, M. Ahmadpour, "Two modified versions of artificial bee colony algorithm", Applied Mathematics and Computation, Volume 225, pp. 601–609, 2013. |