Prediction of Length of ICU Stay Using Data-mining Techniques: an Example of Old Critically Ill Postoperative Gastric Cancer Patients

Introduction
Gastric cancer is one of the most common cancers and a leading cause of cancer death in China and worldwide (Kamangar et al., 2006; Yang, 2006). With an aging population in China and advances in clinical medicine, the number of operations on old patients is increasing correspondingly. This situation poses growing challenges to critical care medicine and geriatrics, because complications and high-risk factors may be more prevalent among old patients.
Prediction of the length of stay in the intensive care unit (ICU) has gained increasing attention in the field of critical care medicine. Severity scores, such as the Acute Physiology and Chronic Health Evaluation (APACHE) II, have been widely used to predict the length of ICU stay (Abbott et al., 1991; Suistomaa et al., 2002; Dossett et al., 2009). However, they may not be adequate predictors of length of stay: APACHE II is based on data collected during the first 24 h of ICU treatment, the most severely ill patients may die after a short stay, and those who need only post-surgical observation are also transferred to general wards early.
Baseline information on the length of ICU stay can help in making appropriate decisions about the number of ICU beds. Because length of stay is a key determinant of the costs of intensive care, it can also lead to better communication with patients and relatives. Thus, a framework to predict the length of ICU stay is needed and would be useful in supporting clinical judgment in decision making for individual patients.
In January 2009, critical care medicine was accredited as an independent subspecialty of clinical medicine by the Ministry of Health of the People's Republic of China. Critical care research in mainland China is still in its infancy (Du et al., 2010). The present study was designed to describe the length of ICU stay of old critically ill gastric cancer patients after surgery, based on the experience of a single institution, and to present a framework for incorporating data-mining techniques into the prediction of length of stay in the ICU.

Study unit and data collection
The setting for this study was a 28-bed multidisciplinary adult intensive care unit in a 2250-bed medical university hospital in Shenyang, Liaoning province, northeastern China. The hospital has around 2 million clinic visits and around 80 thousand hospital admissions annually. The intensive care unit accommodates approximately 800 admissions of critically ill patients per year. Almost all patients need full-time monitoring and mechanical ventilation. The unit is part of the National Key Discipline of General Surgery. It also manages two provincial centers: the Liaoning Provincial Center of Medical Quality Control and Improvement and the Liaoning Provincial Center of Disaster and Critical Care Medicine.
In this unit, relevant patient data have been gathered and stored in an ICU patient information system, with progressive enrichment, since the beginning of 2010. A full-time data entry clerk is in charge of daily data collection, data entry and database maintenance. Thus, a retrospective design was adopted to collect consecutive data on gastric cancer patients 60 years of age or older after surgery, as recorded in the patient information system from January 1st, 2010 to March 31st, 2011.
Characteristics of the patients and the length of their ICU stay were gathered for analysis. Length of ICU stay was measured as the interval between ICU admission and ICU discharge, on a trisection scale of calendar days (morning, evening and night).
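The trisection scale can be illustrated with a minimal sketch. The counting convention below (the difference in thirds of a day between admission and discharge) and the function name `icu_stay_days` are assumptions for illustration; the study does not state how the boundary parts were counted, and the dates are made up.

```python
from datetime import date

# Assumed order of the three parts of a calendar day (labels taken from the text).
PARTS = {"morning": 0, "evening": 1, "night": 2}

def icu_stay_days(admit_day, admit_part, discharge_day, discharge_part):
    """Length of ICU stay in days, resolved to thirds of a calendar day."""
    thirds = (discharge_day - admit_day).days * 3
    thirds += PARTS[discharge_part] - PARTS[admit_part]
    if thirds < 0:
        raise ValueError("discharge precedes admission")
    return thirds / 3

# Hypothetical patient: admitted on the morning of day 1, discharged on the evening of day 4.
los = icu_stay_days(date(2010, 1, 1), "morning", date(2010, 1, 4), "evening")
print(round(los, 1))  # 3.3
```

Under this convention every stay is a multiple of one third of a day, which is consistent with reported values such as 1.3, 3.3 and 22.7 days.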

Data analysis

Cox regression
Univariate Cox regression was performed first to examine the relationship between each potential candidate factor and the length of ICU stay, which was treated as the time variable for survival analysis. The impact of multiple covariates was then assessed in a multivariate model. These analyses were conducted using the Statistical Product and Service Solutions software (SPSS 12.0 for Windows, SPSS Inc., Chicago, IL, USA).
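For reference, the standard Cox proportional hazards model can be written as below. The study does not state which event was modeled; assuming the event is ICU discharge, with length of ICU stay as the time variable, a negative coefficient lowers the hazard of discharge and therefore corresponds to a longer stay, which matches the footnotes to Tables 2 and 3.

```latex
h(t \mid x) = h_0(t)\,\exp\!\left(\beta_1 x_1 + \dots + \beta_p x_p\right),
\qquad
\mathrm{HR}_j = \exp(\beta_j)
```

Here $h_0(t)$ is the unspecified baseline hazard, $x_1, \dots, x_p$ are the covariates, and $\mathrm{HR}_j$ is the hazard ratio for a one-unit increase in $x_j$ with the other covariates held fixed.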

Construction of regression tree
The regression tree was constructed to predict the length of ICU stay and to explore the important indicators. The main purpose of the regression tree was to produce a tree-structured prediction rule. The construction of classification and regression tree models has been well described in previous studies (Guan et al., 2008; Lapinsky et al., 2008; Berney et al., 2011). The major steps were as follows. (1) Binary partitioning. Each split asked whether X∈A, where X was a variable and A was a constant; the answer was yes or no, and the prediction space was partitioned into two parts accordingly. (2) Goodness-of-split criteria for choosing the best split. (3) Tree pruning. Minimal cost-complexity pruning was carried out in this study. This method relies on a complexity parameter, denoted α, which was gradually increased during the pruning process. (4) Cross validation. The tree was computed from the learning sample, and its predictive accuracy was examined by applying it to predict the class membership in the test sample. R 2.3.1 (http://www.R-project.org) was used to build the classification and regression trees.
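Steps (1) and (2) can be sketched for a single numeric variable as follows. This is an illustrative sketch, not the authors' R code: it assumes the common least-squares goodness-of-split criterion (the source does not name the criterion used), and the function names and data are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(x, y):
    """Return (threshold, total_sse) for the binary split x <= threshold
    that minimizes the within-node sum of squares, or None if x has
    fewer than two distinct values."""
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2  # midpoint between adjacent distinct values
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best

# Made-up data: a severity score and the length of ICU stay in days.
score = [5, 6, 8, 9, 15, 18, 20]
stay = [2.0, 2.3, 2.7, 3.0, 8.0, 9.3, 10.0]
threshold, total = best_split(score, stay)
print(threshold)  # 12.0
```

A full tree applies this search to every candidate variable at each node, recurses on the two resulting subsets, and is then pruned and cross-validated as in steps (3) and (4).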

Ethical review
The present study was approved by the research institutional review board of The First Affiliated Hospital of China Medical University, in accordance with the standards of the Declaration of Helsinki. All data collected were treated confidentially, and the names and identity codes of patients were removed before data analysis.

Results
From January 1st, 2010 to March 31st, 2011, there were 979 consecutive admissions to the study unit. More than two thirds of the patients were admitted to the ICU from the same hospital, mostly from the operating room. Ninety-six old critically ill gastric cancer patients after surgery met the inclusion criteria, with a median age of 78 (60-87) years. Among them, 7 patients were admitted from the emergency room because of acute peritonitis or acute intestinal obstruction caused by primary gastric cancer. The median APACHE II score was 8 (3-24), and 78.1% of patients were male. The median length of ICU stay was 3.3 days, with a range from 1.3 to 22.7 days (Table 1).
Univariate Cox analysis indicated that APACHE II score, SOFA score, shock and the need for nutrition support were related to the length of ICU stay in the enrolled patients. Multivariate Cox analysis found that shock and the need for nutrition support were statistically significant risk factors for prolonged ICU stay.
The multivariate regression tree model is shown in Figure 2. The importance of a variable can be judged from its location in the tree (the closer to the root, the more important) and the number of times it appears in the model (the more frequent, the more important). In the present study, the root of the regression tree was divided into two branches according to whether the patient had the comorbidity of two or more kinds of shock (e.g., cardiogenic, septic, hypovolemic, anaphylactic or neurogenic shock). Altogether, eight variables entered the regression model, including age, APACHE II score, SOFA

Discussion
As the number of postoperative patients who require intensive care is increasing rapidly in China, there is growing concern about the length of ICU stay. The present study presents a framework for estimating the length of ICU stay for old critically ill gastric cancer patients after surgery. To the best of our knowledge, it is the first study in the study area to focus on data-mining related to the length of ICU stay.
The ICU stay accounts for a large proportion of the financial burden of the whole hospital stay (Pittoni & Scatto, 2009; Beenen et al., 2011). A gap exists between the optimistic expectations of patients' families and the professional judgment and treatment choices of ICU clinicians. This study provides information about the distribution of the length of ICU stay and its determinants, paying special attention to elderly patients suffering from multiple co-morbidities. The median age in the sample was 78 years, which indicates that, with population aging, technological improvements and various co-morbidities, more and more old patients need intensive care. A detailed description of the characteristics of ICU patients can help hospital managers maximize the use of ICU resources and allocate them in an optimal way (Weissman, 1997; Stricker et al., 2003; Terra, 2007).
Univariate survival analysis indicated that APACHE II score, SOFA score, shock and the need for nutrition support were important indicators of the length of ICU stay, but it could not provide thresholds for the APACHE II or SOFA scores. There is also no uniform definition of a prolonged ICU stay, so each ICU needs to develop a suitable model to predict the length of ICU stay from the available information. The regression tree adopted in our study suggested that, in the selected subgroup of ICU patients, ICU clinicians could identify some patients with a potentially prolonged ICU stay using several simple variables. It is necessary to broaden collaborative efforts from the patient care arena into the realms of education, research, and administration (Miccolo & Spanier, 1993). Carr has also indicated that collaborative partnerships in critical care can enhance an organization's mission to deliver quality patient-centered care, with the main focus on (1) minimization of inpatient transitions, (2) reduction of cost by decreasing the length of stay, (3) promotion of patient and family satisfaction through efforts of advocacy, and (4) enhanced discharge planning (Carr, 2009).
The authors recognize that there are some limitations inherent in this retrospective analysis. First, generalizability is limited because our results were derived from the experience of one ICU in a single hospital and are therefore not representative of the region or country as a whole. We do not recommend that different hospitals use the same model to predict the length of ICU stay; rather, we want to alert ICU clinicians to keep these candidate risk factors in mind and to show them a potential data-mining framework that might optimize resource use. Second, ICU stay was measured in calendar days (in the three parts of morning, afternoon, night), whereas several studies have shown that exact ICU stay in minutes is more accurate than measurement in calendar days (Marik & Hedman, 2000; Render et al., 2005). Personal digital assistants (PDAs) (Faulk & Savitz, 2009; Zalon et al., 2010) are being introduced in the studied hospital, which should allow doctors and nurses to obtain accurate real-time information about the length of ICU stay for future research. The third and most important limitation is the complexity of the available and unknown risk factors for the length of ICU stay (Rapoport et al., 2003). With the rapid improvement of treatment and nursing, the interactions among varying and emerging factors should be explored in more detail, and models should be recalibrated in parallel to better inform the selection of candidate factors.
In conclusion, the comorbidity of two or more kinds of shock was the most important factor for the length of ICU stay in the studied sample. Because ICU patient characteristics differ between wards and hospitals, intensivists should consider data-mining techniques as a tool for predicting the length of ICU stay.

Figure
Figure 1. Schematic Illustration of the Proposed Study on the Length of ICU Stay for Old Critically Ill Gastric Cancer Patients after Surgery (Shenyang, China)

Table 2. Univariate Cox Analysis of the Length of ICU Stay of Old Critically Ill Gastric Cancer Patients after Surgery
* The negative coefficient indicates the variable is a risk factor for prolonged length of ICU stay

Table 3. Multivariate Cox Analysis of the Length of ICU Stay of Old Critically Ill Gastric Cancer Patients after Surgery
* The negative coefficient indicates the variable is a risk factor for prolonged length of ICU stay