A Novel Model for Smart Breast Cancer Detection in Thermogram Images

Iman Abaspur Kazerouni*, Hossein Ghayoumi Zadeh, Javad Haddadnia

Accuracy in feature extraction is an important factor in image classification and retrieval. In this paper, a breast tissue density classification and image retrieval model is introduced for breast cancer detection based on thermographic images. The new method of thermographic image analysis for automated detection of high tumor risk areas is based on the two-directional two-dimensional principal component analysis technique for feature extraction and a support vector machine for thermographic image retrieval. It was tested on 400 images. The sensitivity and specificity of the model are 100% and 98%, respectively.


Introduction
Breast density has been shown to be related to the risk of developing breast cancer, since dense breast tissue can hide lesions, causing cancer to be detected at later stages (Wolfe, 1976). Among women, breast cancer is the second most common cancer and the leading cause of cancer death. It has become a major health issue in the world over the past decades, and its incidence has increased in recent years, mostly due to increased awareness of the importance of screening and population ageing (Boquete et al., 2012). Currently, mammography is the dominant method for detecting breast cancer. However, it is still far from perfect. The high sensitivity of screening mammography is compromised by its low specificity for benign lesions, which often appear mammographically similar to malignant lesions (Wei et al., 2009).
Although breast cancer incidence has increased over the past decade, breast cancer mortality has declined among women of all ages (Bray et al., 2004; Sickles, 1997). This favorable trend in mortality may relate to improvements in breast cancer treatment (Buseman et al., 2003) and the widespread adoption of early detection technology. Early detection is crucial for the effective treatment of breast cancer. Current mammogram screening may turn up many tiny abnormalities that are either not cancerous or are slow-growing cancers that would never progress to the point of killing a woman and might never even become known to her (Thomas et al., 2013). Breast cancer is normally treated with surgery, radiation therapy and chemotherapy. Radiation damages the DNA strands inside cancer cells, which inhibits their further growth (Wafa et al., 2014). Radiation can also damage healthy tissues, but the effect is greater on cancerous cells, because they grow very rapidly and cannot easily repair the damage (Wafa et al., 2014).
The high sensitivity for cancer detection allows it to be used in evaluating several aspects of breast cancer diagnosis and treatment (Sawsan, 2014).
Three of these respective technologies include digital infrared thermal imaging (DITI), electrical impedance scanning (EIS) and elastography. While these devices all use non-invasive imaging methods, which neither emit ionizing radiation nor compress the breast, they do operate under differing physiological principles (Thomas et al., 2013). DITI aims to detect localized skin temperature increases, which are thought to occur as a result of increased vascularisation, vasodilatation and recruitment of inflammatory cells to the site of a developing tumor (Wei et al., 2009). Localized differences in skin temperature are captured by infrared cameras, which produce a heat map of the breast called a thermogram.
In this paper, a novel model for breast tissue classification and breast image retrieval is proposed, in which principal component analysis (PCA), two-dimensional principal component analysis (2DPCA) and two-directional two-dimensional principal component analysis ((2D)²PCA) are used for feature extraction and dimension reduction of the thermographic images. After this step, retrieval is performed with a support vector machine (SVM), a method that can be applied to a variety of problems. This paper is organized as follows: Section 2 presents a compact description of the (2D)²PCA technique for feature extraction. Section 3 describes the support vector machine (SVM) with an RBF kernel for image retrieval. Sections 4 and 5 present the results of the proposed model and a summary, respectively.

Feature Extraction
Images can be numerically represented by a feature vector, preferably in a low-dimensional space in which the most relevant visual aspects are emphasized. The texture attribute can be used to describe the different patterns of parenchymal tissue within one category. However, the high dimensionality of a feature vector that represents texture attributes limits its computational efficiency, so it is desirable to choose a technique that combines texture representation with dimensionality reduction, making the retrieval algorithm more effective and computationally tractable. Principal component analysis (PCA) is a useful technique for feature extraction in various applications such as image processing and pattern recognition (Zhang and Zhou, 2005). In the PCA technique, a two-dimensional image matrix is first transformed into a one-dimensional row or column vector. When the image matrix is very large, it is difficult to evaluate the covariance matrix accurately because of its large size and the relatively small number of training samples. Two-dimensional principal component analysis (2DPCA) has been proposed to solve this problem. In this technique, the eigenvectors are calculated directly from the image matrices, without the matrix-to-vector transformation. In the 2DPCA model, the size of the image covariance matrix equals the image width, so the covariance matrix is much smaller than in the PCA model. Unlike standard PCA, which is based on one-dimensional vectors, the 2DPCA technique is based on two-dimensional matrices. Experimental results in many reports have shown that the accuracy of 2DPCA is often better than that of PCA (Zhang and Zhou, 2005). If 2DPCA is applied first to the rows of the image matrices and then to the columns, principal component extraction becomes more accurate. This technique is called two-directional two-dimensional principal component analysis ((2D)²PCA).
In this model, 2DPCA is first applied in the row direction of the images, and then an alternative 2DPCA is applied in the column direction.
In the (2D)²PCA technique, size reduction is performed on the image rows and columns simultaneously. Assume A is an image matrix with m rows and n columns and $X \in \mathbb{R}^{n \times d}$ is a projection matrix with orthonormal columns, where $n \geq d$; then $Y = AX$ is an m by d matrix.
Now let there be M training samples, each an m by n matrix, denoted by $A_k$ ($k = 1, 2, \ldots, M$), and let $\bar{A}$ and $C$ denote the average matrix and the covariance matrix, respectively. Then $\bar{A}$ can be written as:

$$\bar{A} = \frac{1}{M} \sum_{k=1}^{M} A_k \qquad (1)$$

and C can be obtained by:

$$C = \frac{1}{M} \sum_{k=1}^{M} \left(A_k - \bar{A}\right)^T \left(A_k - \bar{A}\right) \qquad (2)$$

An optimal projection matrix can now be obtained by computing the eigenvectors of the covariance matrix and selecting d of them, namely the eigenvectors that correspond to the d largest eigenvalues when the eigenvalues are sorted from high to low. The optimal projection matrix X has the selected eigenvectors as its columns. Note that C is an $n \times n$ matrix, so its eigenvalues and eigenvectors can be computed quickly.
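To make the size argument concrete, the following minimal sketch compares the side length of the covariance matrix for vectorized PCA and for 2DPCA. The 240 by 320 image size and the function name `covariance_sizes` are assumptions chosen only for illustration; the image resolution used in the study is not stated here.

```python
def covariance_sizes(m, n):
    """Side length of the square covariance matrix for each technique.

    m, n: number of rows and columns of one image (hypothetical values).
    """
    return {
        "PCA": m * n,   # image flattened to a vector of length m*n
        "2DPCA": n,     # image covariance matrix is n by n
    }


# Hypothetical 240 x 320 thermogram, used only to illustrate the scale.
for name, side in covariance_sizes(240, 320).items():
    print(f"{name}: {side} x {side} covariance matrix")
# PCA: 76800 x 76800 covariance matrix
# 2DPCA: 320 x 320 covariance matrix
```

Under these assumed dimensions, the eigenproblem shrinks from 76800 by 76800 to 320 by 320, which is why the eigen-decomposition in 2DPCA is fast.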
Further size reduction can be achieved by applying another 2DPCA to the image matrix. The covariance matrix C can be rewritten as:

$$C = \frac{1}{M} \sum_{k=1}^{M} \sum_{i=1}^{m} \left(A_k^{(i)} - \bar{A}^{(i)}\right)^T \left(A_k^{(i)} - \bar{A}^{(i)}\right) \qquad (3)$$

where $A_k^{(i)}$ and $\bar{A}^{(i)}$ denote the i-th row vectors of $A_k$ and $\bar{A}$, respectively. Equation (3) is a 2DPCA operator in the row direction of the image, and it shows that, when the training images have zero mean, the covariance matrix can be obtained from the outer products of the row vectors of the images. Another 2DPCA can now be applied in the column direction of the image, and Equation (3) can be rewritten as:

$$C = \frac{1}{M} \sum_{k=1}^{M} \sum_{j=1}^{n} \left(A_k^{(j)} - \bar{A}^{(j)}\right) \left(A_k^{(j)} - \bar{A}^{(j)}\right)^T \qquad (4)$$

where $A_k^{(j)}$ and $\bar{A}^{(j)}$ denote the j-th column vectors of $A_k$ and $\bar{A}$, respectively. In this step, the q eigenvectors corresponding to the q largest eigenvalues of the matrix C in Equation (4) are obtained and placed as the columns of the matrix $Z \in \mathbb{R}^{m \times q}$. Projecting the random matrix A onto Z yields a q by n matrix $Y = Z^T A$, and projecting A onto both Z and X yields a q by d matrix $Y = Z^T A X$.
The q by d matrix $Y = Z^T A X$ is used as the feature matrix in the proposed model.
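The row and column projections described above can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' MATLAB implementation: the function names, the random stand-in images, and the choice d = q = 8 are all assumptions.

```python
import numpy as np


def top_eigenvectors(C, count):
    """Eigenvectors of a symmetric matrix C for its `count` largest eigenvalues."""
    values, vectors = np.linalg.eigh(C)          # eigenvalues in ascending order
    order = np.argsort(values)[::-1][:count]     # indices of the largest ones
    return vectors[:, order]


def fit_2d2pca(images, d, q):
    """images: array of shape (M, m, n). Returns X (n x d) and Z (m x q)."""
    A = np.asarray(images, dtype=float)
    centered = A - A.mean(axis=0)                # subtract the average image
    M = A.shape[0]
    # Row direction (n x n): (1/M) * sum_k (A_k - mean)^T (A_k - mean)
    C_row = np.einsum("kij,kil->jl", centered, centered) / M
    # Column direction (m x m): (1/M) * sum_k (A_k - mean) (A_k - mean)^T
    C_col = np.einsum("kij,klj->il", centered, centered) / M
    return top_eigenvectors(C_row, d), top_eigenvectors(C_col, q)


def extract_features(image, X, Z):
    """Project one m x n image to a q x d feature matrix Z^T A X."""
    return Z.T @ np.asarray(image, dtype=float) @ X


# Random stand-in data (assumption): 20 images of size 12 x 16.
rng = np.random.default_rng(0)
images = rng.random((20, 12, 16))
X, Z = fit_2d2pca(images, d=8, q=8)
features = extract_features(images[0], X, Z)
print(features.shape)   # (8, 8)
```

Each 12 by 16 image is reduced to an 8 by 8 feature matrix, and only two small eigenproblems (16 by 16 and 12 by 12) have to be solved.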

Image Retrieval
After (2D)²PCA feature extraction, the obtained data must be classified. The classifier used in the proposed model is the support vector machine (SVM), a popular machine learning method for classification, regression and other learning tasks. For linearly separable samples, an SVM selects two parallel lines that separate the data with the highest possible accuracy. When such lines cannot separate the data, the SVM applies a nonlinear transformation that maps the data vector x to a higher-dimensional space. An SVM can be used without any kernel, but in the proposed model an SVM with a kernel function is used to reduce the over-fitting error.

Given training vectors $x_i$ with class labels $y_i \in \{1, -1\}$, the SVM solves the standard soft-margin optimization problem:

$$\min_{w, b, \eta} \; \frac{1}{2} w^T w + \Lambda \sum_{i} \eta_i$$

$$\text{subject to} \quad y_i \left(w^T \varphi(x_i) + b\right) \geq 1 - \eta_i, \quad \eta_i \geq 0$$

where the function $\varphi$ maps the training vectors $x_i$ to a higher-dimensional space and $\Lambda > 0$ is the penalty parameter of the error term. The kernel function is defined as $K(x_i, x_j) = \varphi(x_i)^T \varphi(x_j)$. In the proposed model, the radial basis function (RBF) kernel is used for classification:

$$K(x_i, x_j) = \exp\left(-\lambda \lVert x_i - x_j \rVert^2\right), \quad \lambda > 0$$

where $\lambda$ is the kernel parameter. The parameters $\Lambda$ and $\lambda$ have significant effects on the accuracy of the classification process, and selecting them well can reduce the error percentage.

Asian Pacific Journal of Cancer Prevention, Vol 15, 2014. DOI: http://dx.doi.org/10.7314/APJCP.2014.15.24.10573

v-fold cross-validation has been used for parameter selection. In this approach, the data are divided into v subsets of equal size. Sequentially, each subset is tested using the classifier trained on the remaining v-1 subsets. Thus, each instance of the whole training set is predicted once, so the cross-validation accuracy is the percentage of data that are correctly classified. v-fold cross-validation also helps to prevent over-fitting. For image retrieval, various parameter values were tested and the results were analyzed to select the best parameters. The best values in the proposed model are $2^{3.54}$ and $2^{-2.1}$ for $\Lambda$ and $\lambda$, respectively, with v = 3.
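The RBF kernel and the v-fold splitting described above can be illustrated with a small standard-library sketch. It does not train an SVM and does not use the LIBSVM API. The function names `rbf_kernel` and `v_fold_splits` are hypothetical, and contiguous, unshuffled folds are an assumption made for simplicity.

```python
import math


def rbf_kernel(x, y, lam):
    """K(x, y) = exp(-lam * ||x - y||^2) for two equal-length vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-lam * squared_distance)


def v_fold_splits(n_samples, v):
    """Yield (train_indices, test_indices) for v folds of nearly equal size."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // v + (1 if i < n_samples % v else 0) for i in range(v)]
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


lam = 2 ** -2.1                          # kernel parameter reported in the text
print(rbf_kernel([1.0, 2.0], [1.0, 2.0], lam))   # identical vectors give 1.0

folds = list(v_fold_splits(400, 3))      # 400 images, v = 3 as in the text
print([len(test) for _, test in folds])  # [134, 133, 133]
```

In a parameter search, each candidate pair of penalty and kernel parameters would be scored by the accuracy averaged over these folds, and the best-scoring pair would be kept.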

Results
The proposed model has been applied to 400 thermographic images that were captured and collected by the Hakim Sabzevari Medical Imaging Group in the Vasei Hospital. The proposed model has been implemented using MATLAB (Matrix Laboratory) and the LIBSVM library (Chang and Lin, 2001).
Feature extraction has been performed using the PCA, 2DPCA and (2D)²PCA techniques. Several feature matrices have been selected and compared to previous literature. Table 1 shows the average precision obtained with the first d eigenvalues for the proposed model and for PCA and 2DPCA. The average precision of (2D)²PCA is higher than that of the other techniques in almost all cases. Using the (2D)²PCA technique, the proposed model gives good results while using only a few principal components. The first eight eigenvalues of the covariance matrix were used to build the feature matrix, because Table 1 shows that the highest average precision, 99.32%, was obtained with the first eight principal components of (2D)²PCA. Table 2 compares the proposed model with results reported in the literature for other techniques. It can be inferred that the proposed model has a higher correct classification percentage. Figure 1 shows a query image of a breast with cancer and the images retrieved by the proposed model. All the retrieved images are similar to the query image, and all of them show cancerous breasts.

Discussion
In this paper, a breast tissue density classification and image retrieval model has been studied. We present a model for data reduction based on the two-directional two-dimensional principal component analysis ((2D)²PCA) technique; the extracted features are used in a support vector machine (SVM) with the radial basis function (RBF) kernel for classification and thermographic image retrieval. Three-fold cross-validation has been used for parameter selection in the SVM to avoid over-fitting in the data classification. Various parameter values were tested and the results were analyzed to select the best kernel parameters. The sensitivity and specificity of the model are 100% and 98%, respectively.
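For readers less familiar with the two reported measures, the short sketch below shows how sensitivity and specificity are computed from confusion-matrix counts. The counts are hypothetical and chosen only so that the arithmetic reproduces 100% and 98%; they are not the study's actual class distribution, which is not given here.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of cancerous cases detected
    specificity = tn / (tn + fp)   # fraction of healthy cases correctly cleared
    return sensitivity, specificity


# Hypothetical counts (assumption), not data from the study.
sens, spec = sensitivity_specificity(tp=50, fn=0, tn=49, fp=1)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
# sensitivity = 100%, specificity = 98%
```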

Figure 1. (A) An Example Query Image from Cancerous Breast Category and (B) the Retrieved Images Based on the Proposed Model