Use of an Artificial Neural Network to Predict Risk Factors of Nosocomial Infection in Lung Cancer Patients

Various statistical methods have been used to analyze and predict the risk factors of nosocomial infection in lung cancer patients, but the results are inconsistent. A total of 609 patients with lung cancer were enrolled, and factors were compared using Student's t-test, the Mann-Whitney test or the Chi-square test. Variables that were significantly related to the presence of nosocomial infection were selected as candidates for input into the final ANN model. The area under the receiver operating characteristic (ROC) curve (AUC) was used to evaluate the performance of the artificial neural network (ANN) model and the logistic regression (LR) model. The prevalence of nosocomial infection among lung cancer patients in the entire study population was 20.1% (165/609). Nosocomial infections were most often identified in sputum specimens (85.5%), followed by blood (6.73%), urine (6.0%) and pleural effusions (1.82%). Long-term hospitalization (≥22 days, P<0.001), poor clinical stage (stage IIIb and IV, P=0.002), older age (≥61 years, P=0.023) and use of hormones were linked to nosocomial infection, and the ANN model consisted of these four factors. The artificial neural network model with variables consisting of age, clinical stage, time of hospitalization and use of hormones should be useful for predicting nosocomial infection in lung cancer cases.


Introduction
Lung cancer has the highest mortality of all tumors, and its incidence is gradually increasing (Jemal et al., 2005; Malcolm et al., 2009; Elsayed et al., 2011). Although surgical resection, chemotherapy and radiation therapy are continuously improving, patients with lung cancer remain extremely vulnerable to relapse and death (Gridelli et al., 2003). The cure rate of lung cancer is very low, and the average 5-year survival of patients with lung cancer is below 15% (Ogawa et al., 2008; Rachet et al., 2008; Chen et al., 2009; Stewart et al., 2010). The nosocomial infection rate of patients with lung cancer tends to be high (Kamboj et al., 2009). Nosocomial infection not only interferes with treatment and rehabilitation, prolongs hospitalization and increases health care costs, but also significantly worsens prognosis and can even be life-threatening (Bereket et al., 2012). Analyzing the characteristics and risk factors of nosocomial infections in lung cancer patients will help us adopt effective prevention and control measures, improving patient outcomes and prolonging survival.
As one of the clinical prediction rules (Simon et al., 2012), an artificial neural network (ANN) is composed of a series of interconnecting parallel nonlinear processing elements (nodes) with limited numbers of inputs and outputs (Hong et al., 2011). A systematic review suggested that ANN is potentially more successful than conventional statistical techniques at predicting clinical outcomes when the relationship between the variables that determine the prognosis is complex, multidimensional and non-linear (Bartosch-Harlid et al., 2008). The aim of this study was to develop an ANN to predict nosocomial infection in lung cancer.

Subjects
The 609 patients with lung cancer came from the First Affiliated Hospital of Wenzhou Medical University, China, from January April 2005 to January 2014. All cases were confirmed by histopathology. The patients comprised 443 men and 166 women, aged 32 to 88 years. The demographic characteristics of the 609 lung cancer patients are shown in Table 1. The criterion for the histopathologic diagnosis of lung cancer was the World Health Organization (WHO)/International Association for the Study of Lung Cancer (IASLC) histological classification of lung cancer. The following information was collected for each patient on admission: age, gender, clinical stage, histological classification, invasive procedures, mechanical ventilation, surgery, radiotherapy, chemotherapy, hemoglobin, serum albumin, white blood cell count, use of antibiotics, use of hormones, non-neoplastic lung disease, concurrent diabetes or renal insufficiency, smoking (smoking index = number of cigarettes smoked per day × years of smoking) and time of hospitalization. This study was approved by the Institutional Ethics Review Board of the First Affiliated Hospital of Wenzhou Medical University, and all patients provided written informed consent.

Culture and identification of bacteria and fungi
Specimens included 300 sputum samples, 121 blood samples, 121 urine samples and 67 pleural effusion samples. Bacteria were identified using the BioMerieux automatic identification system (France, Meraux Corporation), and susceptibility testing was performed by the Kirby-Bauer (K-B) method. Fungi were cultured and identified on Kemaijia chromogenic medium (France, Meraux Corporation).

Diagnostic criteria of nosocomial infection
Nosocomial infection was diagnosed, and its incidence calculated, with reference to the nosocomial infection diagnostic criteria issued by the Chinese Ministry of Health in 2001. Infections confirmed by clinical manifestations, laboratory tests and/or identification of bacteria or fungi that occurred 48 h or more after admission were classified as nosocomial infections; infections present on admission that were directly related to the previous hospitalization were also counted as nosocomial infections.

Statistical analysis
Continuous values were expressed as means ± SD or medians and compared using Student's t-test or the Mann-Whitney non-parametric test. Categorical values were described by counts and proportions and compared using the χ² test. Variables that were significantly related to the presence of nosocomial infection were selected as candidates for input into the final ANN model. Sensitivity analysis (also known as independent variable importance analysis) was performed to determine the optimum variables for construction of the final ANN model (Hong et al., 2011). An exploratory three-layer multilayer perceptron (MLP) ANN model with a back propagation algorithm was constructed for sensitivity analysis. The data were randomly divided into a training sample (487 cases, 80%) and a test sample (122 cases, 20%) in the exploratory ANN model. Sigmoid transfer functions were used in the hidden and output layers. Gradient descent was used to estimate the synaptic weights. The initial learning rate was 0.4, and the momentum was 0.9. According to the results of the univariate and sensitivity analyses, a final three-layer feed-forward ANN model with a back propagation algorithm was constructed for all 609 patients. The ANN model was trained with a maximum of 500 iterations and 10 tours. The overfit penalty was assigned as 0.001, and the convergence criterion was 0.00001 (Hong et al., 2011). Fivefold cross-validation was used (Sall et al., 2007). The output of the ANN model was transformed to range from 0 to 1. Nosocomial infection was predicted if the output was greater than or equal to 0.5 (Hong et al., 2011). The sensitivity, specificity, negative predictive value, positive predictive value and diagnostic accuracy of the ANN model are reported herein. The clinical or patient-relevant utility of a diagnostic test was evaluated using a Fagan plot, which allows the reader to estimate the post-test probability of the target condition in an individual patient based on a selected pretest probability (Whiting et al., 2008).
Forward conditional stepwise logistic regression analysis was performed to develop a logistic regression (LR) function for comparison. The conditional probabilities for stepwise entry and removal of a factor were 0.05 and 0.06, respectively. The area under the receiver operating characteristic (ROC) curve (AUC) was used to evaluate the performance of the ANN model and the LR model. Differences were considered statistically significant if the two-tailed P value was less than 0.05. SPSS 18.0 (SPSS Inc., Chicago, IL, USA) was used for ANN analysis.
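The model structure described above (a three-layer feed-forward network with sigmoid transfer functions, a 0.5 decision threshold and an 80%/20% random split) can be sketched as follows. This is a minimal illustration in plain Python, not the SPSS or JMP implementation used in the study. The function names, the two hidden nodes, the binary input coding and all weights are hypothetical placeholders chosen for the example, and no training is performed.

```python
import math
import random

def sigmoid(z):
    """Logistic (sigmoid) transfer function, mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a three-layer perceptron: input -> hidden -> output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)

def predict_infection(output, threshold=0.5):
    """Predict nosocomial infection when the output is >= the threshold."""
    return output >= threshold

def split_train_test(n_cases, train_fraction=0.8, seed=0):
    """Randomly split case indices into a training and a test sample."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    n_train = round(n_cases * train_fraction)
    return indices[:n_train], indices[n_train:]

# Hypothetical weights for 4 inputs and 2 hidden nodes (illustration only;
# these are NOT the fitted weights of the study's model).
W_HIDDEN = [[0.8, 0.5, 0.3, 0.2], [-0.4, 0.6, 0.1, 0.7]]
B_HIDDEN = [-0.5, -0.3]
W_OUT = [1.2, 0.9]
B_OUT = -1.0

# Assumed coding: long stay, advanced stage, older age, hormone use (1 = yes).
patient = [1, 1, 0, 1]
score = mlp_forward(patient, W_HIDDEN, B_HIDDEN, W_OUT, B_OUT)
train_idx, test_idx = split_train_test(609)
```

With 609 cases and a training fraction of 0.8, the split yields 487 training and 122 test cases, matching the sample sizes stated above.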

The distribution of nosocomial infection in patients with lung cancer
The prevalence of nosocomial infection among lung cancer patients in the entire study population was 20.09% (165/609). Nosocomial infections were most often identified in sputum specimens (85.45%), followed by blood (6.73%), urine (6.00%) and pleural effusion (1.82%).

The pathogen distribution of nosocomial infections in patients with lung cancer
A total of 198 pathogens were isolated from the 165 cases of nosocomial infection in patients with lung cancer: 41 Gram-negative bacteria (20.71%), 71 Gram-positive bacteria (35.86%) and 86 fungi (43.43%).

Univariate and multivariate analysis
Eighteen variables considered relevant to the presence of nosocomial infections were tested using univariate and multivariate analyses. Multivariate analysis by logistic regression identified the following three independent variables as predictive of nosocomial infections in lung cancer: age (p=0.028), clinical stage (p=0.006) and time of hospitalization (p<0.001). A logistic regression function (LR model) was developed to predict nosocomial infections in lung cancer as follows: -0.34 + 0.34 × age (years) - 1.01 × clinical stage + 1.23 × time of hospitalization (days).
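As a rough illustration of how such a function is applied, the sketch below evaluates the reported expression and converts it to a probability with the logistic function. Two points are assumptions rather than statements from the study: that the expression is the linear predictor (log-odds) of the model, and that the predictors can be passed as 0/1 indicators, which is used here only for the example because the variable coding is not specified. The function names are hypothetical.

```python
import math

# Coefficients as reported in the text for the LR function.
INTERCEPT = -0.34
COEF_AGE = 0.34
COEF_STAGE = -1.01
COEF_STAY = 1.23

def lr_score(age, stage, stay):
    """Evaluate the reported LR expression for already-coded predictor values."""
    return INTERCEPT + COEF_AGE * age + COEF_STAGE * stage + COEF_STAY * stay

def lr_probability(age, stage, stay):
    """Assumption: the score is a log-odds, so the logistic function gives a probability."""
    return 1.0 / (1.0 + math.exp(-lr_score(age, stage, stay)))

# Illustrative call with hypothetical 0/1 coding for each predictor.
baseline = lr_probability(0, 0, 0)
all_present = lr_probability(1, 1, 1)
```

Under this reading, the same 0.5 cut-off used for the ANN output could be applied to the resulting probability, although the source does not state the threshold used for the LR model.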

ANN analysis
As shown in Figure 1, time of hospitalization, clinical stage, age and use of hormones were the most important predictors of nosocomial infections by sensitivity analysis (the exploratory ANN model constructed for the sensitivity analysis is not shown). The final three-layer 5-5-1 feed-forward back propagation ANN model with variables consisting of time of hospitalization, clinical stage, age and use of hormones was developed and trained in 609 patients (Figure 2). The sensitivity, specificity, positive likelihood ratio and negative likelihood ratio of the ANN were 56.0%, 85.0%, 3.73 and 0.52, respectively. The ROC curves for the ANN model and LR model for predicting nosocomial Table 3.
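The reported likelihood ratios follow directly from the sensitivity and specificity, and they are the quantities a Fagan plot uses to move from a pretest to a post-test probability. The sketch below reproduces that arithmetic. The function names are hypothetical, and the pretest probability of 0.20 is an illustrative value chosen for the example, not a figure taken from the study's Fagan plot.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as fractions."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg

def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)

# Sensitivity and specificity of the ANN as reported in the text.
lr_pos, lr_neg = likelihood_ratios(0.56, 0.85)

# Illustrative pretest probability (an assumption made for this example).
PRETEST = 0.20
p_after_positive = post_test_probability(PRETEST, lr_pos)
p_after_negative = post_test_probability(PRETEST, lr_neg)
```

Rounded to two decimals, `lr_pos` and `lr_neg` agree with the reported 3.73 and 0.52. At the illustrative pretest probability, a positive ANN prediction raises the probability of infection to roughly 48%, and a negative prediction lowers it to roughly 11%.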

Discussion
The results of this study demonstrate that time of hospitalization, clinical stage, age and use of hormones were the most important predictors of nosocomial infections. Based on ROC analysis, the diagnostic performance of the ANN model was superior to that of the LR model.
Of the standard ANNs, the MLP is perhaps the most popular network architecture currently in use (Saftoiu et al., 2012). An MLP model consists of an input layer, a hidden layer and an output layer. All of the artificial neurons are arranged in a layered feed-forward topology. Our ANN model was developed using the SPSS neural networks program and JMP software, which can both run the MLP model (Sall et al., 2007; Hong et al., 2011). ANNs are nonlinear statistical data modeling tools. They can take into account outliers and nonlinear interactions among variables and can reveal previously unrecognized and/or weak relationships between given input variables and an outcome (Sall et al., 2007). Therefore, ANNs often include parameters that may not reach significance using conventional statistics, as evidenced by the fact that use of hormones, which was included in our ANN model, was not significant in logistic regression analysis.
We further analyzed time of hospitalization, clinical stage and age, and found that patients with long hospitalization (≥22 days, p<0.001), poor clinical stage (stage IIIb and IV, p=0.002) and older age (≥61 years, p=0.023) were more prone to nosocomial infection. Jiang Y et al. (2004) found that the pulmonary fungal infection rate was 6.35% (78/1229). The major fungus was Candida albicans (68.18%). The main risk factors were age ≥50 years (p<0.005), primary site (lung cancer, p<0.001), cancer stage (stage IV, p<0.005), pulmonary radiotherapy (p<0.001), chemotherapy (p<0.001) and long-term hospitalization (>2 weeks, p<0.005). Our results were largely consistent with those of Jiang Y et al. Prolonged hospitalization increases the chance of opportunistic infection. Older patients are prone to infection because of decreased immunity. In addition, patients with stage IIIb and IV lung cancer have distant or lymph node metastasis, and their immune function is poor. In clinical practice, hormones are often used as "antipyretics", which can lead to further progression of disease or to concurrent bacterial and fungal infection on top of a viral infection.
In conclusion, an artificial neural network model with variables consisting of time of hospitalization, clinical stage, age and use of hormones may be useful for predicting nosocomial infection in patients with lung cancer.