Assessing Misdiagnosis of Relapse in Patients with Gastric Cancer in Iran Cancer Institute Based on a Hidden Markov Multi-state Model

Ali Zare, Mahmood Mahmoodi*, Kazem Mohammad, Hojjat Zeraati, Mostafa Hosseini, Kourosh Holakouie Naieni

Introduction
Disease progression in chronic diseases such as gastric cancer is assessed by examining transitions from intermediate to more advanced disease states and to death. An accurate assessment of disease progression requires a proper understanding of the natural disease process, which is often hidden and unobservable. To achieve this, the status and severity of the disease must be known. In many diseases, including gastric cancer, however, the true disease status of patients cannot be measured directly, so a wide range of diagnostic tools, such as biomarkers and diagnostic tests, must be used (Kodera et al., 2003; Whiting et al., 2006). Accurate diagnosis in many diseases requires costly and invasive procedures, so non-invasive diagnostic methods with high diagnostic ability are of great value (Fauci, 2008).
In gastric cancer studies, patients' survival time is recorded from the date of surgery. After surgery, the patient enters the study and is at risk of death. In these studies, death is regarded as the end point and relapse as an intermediate event. Prompt diagnosis of relapse is critical for the survival of these patients, because when relapse is diagnosed late, metastasis to other organs, especially the lungs and lymph nodes, worsens the disease and leads to death (Zare et al., 2013a; 2013b; 2014). However, the exact time of relapse in patients with gastric cancer who have undergone surgery often cannot be determined, for several reasons. The first reason relates to the periodic follow-up of patients and the quality with which their information is recorded: because the patient's status is observed only at intermittent follow-up visits rather than continuously, the exact time of relapse is often unknown, and it cannot be established when exactly the relapse occurred. A patient may be observed during only part of the follow-up period, because a doctor may pay less attention to patients in better condition, or a patient may not visit the doctor when feeling better (Gruger et al., 1991; Titman, 2008; Jackson, 2011).
Some studies are also historical and are based on information obtained from hospital records. At first glance, hospital records may seem a good resource for checking patients' status, but their quality and accuracy are not always acceptable. This weakness can have various causes, including patients' late visits to the doctor, disorganization in the hospital, poor quality of medical services, inaccurate recording of patients' information, and deficiencies in the follow-up system.
To illustrate this situation, consider a patient's status in a hypothetical three-state model with an absorbing state of death (Figure 1). The vertical dashed lines mark the follow-up times, at 1.5, 3.5, 5 and 9 months. Unfortunately, between 5 and 9 months the patient moves from state 1 to state 3, and this change is never observed. For such a patient, it is assumed that the transition to the absorbing state of death took place from state 1, whereas in reality it took place from state 3.
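How periodic follow-up hides a transition can be sketched in a few lines. The true path below is hypothetical (state 1 until month 6, state 3 until death at month 8; these times are assumptions for illustration, not values read from Figure 1), the visits are the 1.5, 3.5, 5 and 9 months described above, and death is assumed to be recorded at its exact time, so the 9-month visit never takes place.

```python
from bisect import bisect_right

DEATH = "D"  # absorbing state

# Hypothetical true path: (entry time in months, state). Illustrative only.
true_path = [(0.0, 1), (6.0, 3), (8.0, DEATH)]
visit_times = [1.5, 3.5, 5.0, 9.0]  # scheduled follow-up visits (months)


def state_at(path, t):
    """Return the true state occupied at time t."""
    entry_times = [time for time, _ in path]
    return path[bisect_right(entry_times, t) - 1][1]


def observe(path, visits):
    """Panel data: states seen at visits before death, plus the exact death time."""
    death_time = next((time for time, s in path if s == DEATH), None)
    seen = [(t, state_at(path, t)) for t in visits
            if death_time is None or t < death_time]
    if death_time is not None:
        seen.append((death_time, DEATH))
    return seen


observed = observe(true_path, visit_times)
apparent_transition = (observed[-2][1], observed[-1][1])
true_transition = (true_path[-2][1], true_path[-1][1])

print(observed)             # [(1.5, 1), (3.5, 1), (5.0, 1), (8.0, 'D')]
print(apparent_transition)  # (1, 'D')
print(true_transition)      # (3, 'D')
```

The recorded history suggests death directly from state 1, although the patient actually died from state 3.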
The second factor that prevents proper determination of the relapse state is the natural variation of biomarkers over time (short-term fluctuations) and the error in their measurement. A biomarker is a physical characteristic or measurement of the patient that is used to determine the patient's disease status; blood pressure, pulse rate, and the levels of specific blood cells are all examples. To monitor relapse in patients with gastric cancer who have undergone surgery, tumor markers such as β-HCG, CA-125, CEA, CA19.9 and CA72.4 are used (Kodera et al., 2003; Whiting et al., 2006; Fan and Xiong, 2010; Li et al., 2010). These tests measure the blood levels of these markers, and levels above the normal range can indicate relapse of gastric cancer. Variation in biomarkers and errors in the measurement process generally obscure the disease process under study (Jackson, 2002; Inoue et al., 2008; Jackson, 2011). Figure 2 illustrates this: the solid lines represent the actual disease process, which is often hidden, and the broken lines represent our observations of that process based on biomarkers, which show short-term fluctuations over long periods. A mismatch between the two curves leads to errors in diagnosing the disease state, that is, to classification error.
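The effect of short-term fluctuation on classification can be made concrete under simple assumptions of our own: a marker reading is normally distributed around a true mean level that differs between relapse-free and relapsed patients, and relapse is declared when the reading exceeds a cutoff. The means, cutoff and standard deviations below are hypothetical, not values from this study; the point is only that larger fluctuations push more readings to the wrong side of the cutoff.

```python
from math import erf, sqrt


def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ Normal(mu, sigma**2)."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


def misclassification(mu_no_relapse, mu_relapse, sigma, cutoff):
    """Return (false_positive, false_negative) for a 'reading > cutoff' rule."""
    false_positive = 1.0 - normal_cdf(cutoff, mu_no_relapse, sigma)  # relapse-free, reads high
    false_negative = normal_cdf(cutoff, mu_relapse, sigma)           # relapsed, reads low
    return false_positive, false_negative


# Hypothetical marker levels (arbitrary units), cutoff, and fluctuation sizes.
for sigma in (5.0, 10.0, 20.0):
    fp, fn = misclassification(mu_no_relapse=20.0, mu_relapse=45.0,
                               sigma=sigma, cutoff=37.0)
    print(f"sigma={sigma:>4}: false positive={fp:.3f}, false negative={fn:.3f}")
```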
The third factor leading to incorrect diagnosis of disease states is the sensitivity of tests and diagnostic tools. For every disease, a set of diagnostic tools is used to monitor patients' status and to determine their disease states during the study, and each tool has its own sensitivity in detecting the presence or absence of disease or in identifying the exact disease stage. Prompt diagnosis of relapse plays a key role in the survival of patients with gastric cancer who have undergone surgery. To improve their chances of survival, these patients need to be followed for years and checked for relapse and disease severity. The tools used for this purpose include clinical signs and physical examination, CT scan, laparoscopy, radiography, endoscopic ultrasound, endoscopy, and biopsy (of the gastric mucosa), as well as specific tests based on biomarkers, including β-HCG, CA-125, CEA, CA19.9 and CA72.4 (Marrelli et al., 2001; Takahashi et al., 2003; Layke and Lopez, 2004; Whiting et al., 2006; Fauci, 2008; Li et al., 2010). A sensitivity of 77.1% has been reported for the CA19.9 test in its best case, and a sensitivity of 62% for the combination of CEA, CA19.9 and CA72.4 (Jafary, 2006; Qiu et al., 2008; Dilege et al., 2009; Li et al., 2010; Nicolini et al., 2010). Other studies have reported a sensitivity of 35.2% for CA72.4 and of 40-60% for pairwise combinations of CEA, CA19.9 and CA72.4 in detecting relapse (Zheng et al., 2001; Li et al., 2010). For the combination of CA-125, CEA and CA19.9, a sensitivity of 69.1% has been reported (Ni et al., 2005; Ucar et al., 2008; Li et al., 2010). Requesting multiple tumor markers can increase the sensitivity for detecting relapse of gastric cancer, but even combinations reach a sensitivity of only 60-80% at best (Ogata et al., 2008; Qiu et al., 2008; Dilege et al., 2009; Kobayashi et al., 2009; Fan and Xiong, 2010). Except for endoscopy with biopsy, all the diagnostic tests mentioned above show a sensitivity of  (Sleisenger and Fordtran, 1993; Derakhshan et al., 2004; Sadighi et al., 2005; Whiting et al., 2006; Haj-sheykholeslami et al., 2008; Li et al., 2010). With endoscopy and biopsy, a sensitivity of 90-95% is expected (Sleisenger and Fordtran, 1993; Haj-sheykholeslami et al., 2008; Peery et al., 2012). It should be stressed that the main issue is not whether relapse is detected but whether it is detected promptly. Because of the three factors above, relapse is likely to be detected late, so whenever a patient's state is assessed for relapse, there is always a possibility of error.
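The link between sensitivity and late diagnosis can be quantified under a simplifying assumption that is ours, not the study's: if each follow-up test detects an existing relapse independently with the same sensitivity, the number of visits needed to detect it follows a geometric distribution. The sensitivities plugged in below are values quoted above; treating repeated tests on the same patient as independent is optimistic, since marker results are correlated within a patient.

```python
def prob_still_undetected(sensitivity, visits):
    """P(an existing relapse is missed at each of `visits` independent tests)."""
    return (1.0 - sensitivity) ** visits


def expected_visits_to_detection(sensitivity):
    """Mean of a geometric distribution: visits until the first positive test."""
    return 1.0 / sensitivity


# Sensitivities quoted in the text (assumed constant across visits).
for label, s in [("CEA+CA19.9+CA72.4", 0.62), ("CA19.9", 0.771),
                 ("endoscopy+biopsy", 0.90)]:
    print(f"{label:>18}: missed after 2 visits = {prob_still_undetected(s, 2):.3f}, "
          f"mean visits to detection = {expected_visits_to_detection(s):.2f}")
```

With visits several months apart, every additional expected visit adds months of diagnostic delay.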
Studying cancers with high mortality, such as gastric cancer, requires models that allow a closer analysis of the disease process and therefore provide researchers with more detailed and accurate information. Various models have been designed for this purpose, and one of the most common is the Markov multi-state model. This model is used to study patients' status and the factors affecting the events that occur in patients during the study. In these models, patients pass through different states over their lifetime until death, and both the time taken to reach each state and the factors acting in each state play a significant role in survival (Zare et al., 2013b). Markov multi-state models assume that the observed states are error-free and match the patient's actual state at the time of observation (Putter et al., 2007; Zare et al., 2013b; Zare et al., 2014). As mentioned above, however, the actual state is hidden and unobservable, so the natural disease process is observed as a sequence of states that are subject to error (Jackson and Sharples, 2001; Jackson, 2002; Jackson et al., 2003). Errors in determining the patient's state can arise from inaccurate recording of patients' information, deficiencies in the follow-up system, variation and fluctuation in biomarkers, and the limited sensitivity of diagnostic tools and tests. Ignoring this error in the model biases the estimates and leads to incorrect inferences (Jackson and Sharples, 2001; 2002; 2003; Titman, 2008). Statistical models based only on patients' observed states are therefore not appropriate here; the only reasonable choice is a model that describes the hidden disease process and also allows for error in diagnosing disease states. To this end, we used a class of stochastic processes called hidden Markov multi-state models, which generalize the standard multi-state models.
These models can estimate the transition rates between states and the probabilities of misclassifying patients' states simultaneously. They can also estimate the effect of covariates on both the transition rates and the misclassification probabilities. In this study, therefore, a hidden Markov model was used to analyze the classification error between the states alive without a relapse (state 1) and alive with a relapse (state 2), which is interpreted as misdiagnosis of relapse, the factors affecting the transition rates between the states of a multi-state model, and the effect of these factors on the probabilities of misclassification between the two states.
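In a continuous-time Markov multi-state model, the transition intensities $q_{rs}$ form a generator matrix $Q$ (non-negative off-diagonal entries, rows summing to zero), and the probability of being in state $s$ at time $t$ given state $r$ at time 0 is entry $(r, s)$ of $P(t) = \exp(tQ)$. The pure-Python sketch below computes this matrix exponential by scaling and squaring a truncated Taylor series for the three-state illness-death structure used in this paper. The rates are the point estimates given in the Results section; treating them as per-month rates is our assumption, since the time unit is not stated.

```python
from math import exp


def mat_mult(A, B):
    """Product of two square matrices stored as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm(A, terms=20):
    """exp(A) by scaling and squaring a truncated Taylor series."""
    n = len(A)
    norm = max(sum(abs(x) for x in row) for row in A)
    squarings = 0
    while norm > 0.5:  # scale A so the series converges quickly
        norm /= 2.0
        squarings += 1
    scaled = [[x / 2 ** squarings for x in row] for row in A]
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mult(term, scaled)]
        result = [[a + b for a, b in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    for _ in range(squarings):  # undo scaling: exp(A) = exp(A / 2^m)^(2^m)
        result = mat_mult(result, result)
    return result


# Illness-death generator: 1 = alive without relapse, 2 = alive with relapse, 3 = dead.
q12, q13, q23 = 0.02, 0.01, 0.22  # Results point estimates; per-month unit assumed
Q = [[-(q12 + q13), q12, q13],
     [0.0, -q23, q23],
     [0.0, 0.0, 0.0]]

t = 12.0
P_year = expm([[t * q for q in row] for row in Q])  # P(t) = exp(tQ)
for row in P_year:
    print([round(p, 4) for p in row])

# Closed-form check for staying relapse-free: P_11(t) = exp(-(q12 + q13) t).
print(round(exp(-(q12 + q13) * t), 4))
```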

Patients
In this study, 330 patients with gastric cancer who met the following criteria were studied: i) they had been hospitalized and had undergone surgery between 1995 and 1999 in the surgical wards of the Cancer Institute of Iran; ii) they had records in the hospital archives, and their files contained addresses and telephone numbers for subsequent follow-up. Survival time was measured from surgery; patients who were still alive at the end of the study period, or whose data were unavailable after a certain time, were treated as right censored. To model the transition rates and misdiagnosis of relapse, we studied demographic variables such as age (at surgery), sex and smoking history; clinical variables including tumor location (Cardia, Anterior, other), pathology type (Adenocarcinoma, other), disease stage (I, II, III, IV), location of metastasis (lymph nodes, liver, other), and the type and extent of gastrectomy (T.G, S.G, D.G, PT.G, PX.G); and post-surgical medical variables including the number of renewed treatments (chemotherapy, radiotherapy, surgery, or a combination of them).

Statistical analysis
In most diseases, the actual disease state is hidden and unobservable, and the natural disease process is studied through a sequence of observed states that are subject to error. Inaccurate recording of patients' information, deficiencies in the follow-up system, variation and fluctuation in biomarkers, and the limited sensitivity of diagnostic tools and tests all introduce errors into the diagnostic process and into the determination of patients' states during the study. Markov multi-state models based only on patients' observed states are therefore not appropriate here; a suitable model must describe the hidden disease process and also allow for error in diagnosing disease states. One model designed for this purpose is the hidden Markov multi-state model. In this model, the process states are not observed directly; instead, the observed states are generated conditionally on the hidden states through probability distributions, which are interpreted as error probabilities. The observed and hidden states of patients at sampling times $t_1, \ldots, t_n$ are denoted $O_1, \ldots, O_n$ and $S_1, \ldots, S_n$, respectively (Figure 3), and each patient's hidden states are inferred from the observed, error-prone states. If the observed sequence were exactly the actual hidden sequence, there would be no need to account for error, and standard multi-state models with the Markov assumption would be suitable for the analysis. For the reasons given earlier, however, what we observe cannot be expected to be error-free: a patient's state may be misdiagnosed at a given time. The error must therefore be taken into account, which requires a hidden Markov process for modeling.
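The conditional generation of observed states from hidden states can be sketched as a simulation. We assume a misclassification (emission) matrix whose row $r$ gives the probabilities of each observed state when the hidden state is $r$, with death recorded without error; the hidden path and the error probabilities are hypothetical.

```python
import random

# Row r: P(observed state s | hidden state r); states 1, 2, 3 are rows/columns 0, 1, 2.
E = [[0.95, 0.05, 0.0],   # hypothetical: relapse-free, occasionally recorded as relapsed
     [0.30, 0.70, 0.0],   # hypothetical: relapse often missed
     [0.0, 0.0, 1.0]]     # death observed without error


def emit(hidden_states, emission, rng):
    """Draw one observed state per visit, conditionally on the hidden state."""
    observed = []
    for s in hidden_states:
        weights = emission[s - 1]
        observed.append(rng.choices([1, 2, 3], weights=weights)[0])
    return observed


hidden = [1, 1, 2, 2, 2, 3]  # hypothetical true states S_1..S_6 at visits t_1..t_6
rng = random.Random(2024)    # fixed seed for reproducibility
print(emit(hidden, E, rng))
```

Running the simulation with different seeds shows how the same hidden path can yield observed histories in which the relapse is recorded late or not at all.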
The main task in a hidden Markov model is to identify the sequence of actual states $S_1, \ldots, S_n$ for each patient, which requires understanding how the method works. Fitting a hidden Markov model involves two basic steps. The first step answers the question: given the sequence of observations $O_1, \ldots, O_n$, which of the candidate hidden Markov models generates this sequence with the highest probability? The second step finds, under the model selected in the first step, the sequence of actual states that generates $O_1, \ldots, O_n$ with the highest probability; in other words, the most probable sequence of actual (hidden) states $S_1, \ldots, S_n$ for the observed states $O_1, \ldots, O_n$. For this purpose, the Baum-Welch algorithm is used to identify the best hidden Markov model and the most probable sequence of hidden states from the sequence of observed states $O_1, \ldots, O_n$ (Jackson, 2011).
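The two steps can be sketched for a discrete-time simplification in which visits are equally spaced, so a single transition matrix applies between consecutive visits (our assumption; with irregular visits each interval would use its own $P(t) = \exp(tQ)$). The forward recursion computes the likelihood of the observed sequence, the quantity compared across candidate models in the first step, and the Viterbi recursion, the standard textbook method for the second step, recovers the most probable hidden-state sequence. These are generic recursions, not the internals of any particular package, and all numbers are hypothetical.

```python
def forward_likelihood(obs, init, P, E):
    """P(O_1..O_n) by the forward recursion; states and observations are 0-based."""
    n = len(init)
    alpha = [init[s] * E[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * P[r][s] for r in range(n)) * E[s][o]
                 for s in range(n)]
    return sum(alpha)


def viterbi(obs, init, P, E):
    """Most probable hidden-state sequence given the observations (0-based)."""
    n = len(init)
    delta = [init[s] * E[s][obs[0]] for s in range(n)]
    backpointers = []
    for o in obs[1:]:
        step, new_delta = [], []
        for s in range(n):
            best_r = max(range(n), key=lambda r: delta[r] * P[r][s])
            step.append(best_r)
            new_delta.append(delta[best_r] * P[best_r][s] * E[s][o])
        backpointers.append(step)
        delta = new_delta
    path = [max(range(n), key=lambda s: delta[s])]
    for step in reversed(backpointers):
        path.append(step[path[-1]])
    return path[::-1]


# Hypothetical parameters; 0 = no relapse, 1 = relapse, 2 = dead.
init = [1.0, 0.0, 0.0]               # everyone starts relapse-free after surgery
P = [[0.89, 0.10, 0.01],
     [0.00, 0.70, 0.30],
     [0.00, 0.00, 1.00]]
E = [[0.95, 0.05, 0.0],
     [0.40, 0.60, 0.0],
     [0.0, 0.0, 1.0]]
obs = [0, 0, 0, 0, 2]                 # relapse never observed, then death

print(forward_likelihood(obs, init, P, E))
print(viterbi(obs, init, P, E))       # [0, 0, 0, 1, 2]
```

Here the most probable hidden path places an unobserved relapse before death, because death from the relapse state is far more likely than death without relapse under these hypothetical parameters.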
Accordingly, in this study a hidden Markov multi-state model was considered for patients during the study, with three states, alive without a relapse (state 1), alive with a relapse (state 2) and death (state 3), and three transitions: death hazard without a relapse (state 1 → state 3), relapse hazard (state 1 → state 2) and death hazard with a relapse (state 2 → state 3). The model also included error probabilities for the two states alive without a relapse (state 1) and alive with a relapse (state 2), which are interpreted as misdiagnosis of relapse. The model is shown schematically in Figure 4. Under this model, if the disease state determined by diagnostic tools and systems matches the actual (hidden) state, patients' states are determined without error; in other words, no error has occurred in determining relapse in patients with gastric cancer. As discussed above, however, errors in determining patients' states are possible, and their probability must be taken into account. If the observed state does not match the actual (hidden) state, a misdiagnosis has occurred. This error can take the form of misclassifying patients in the state of being alive without a relapse ($e_{21}$) or with a relapse ($e_{12}$).
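The model in Figure 4 can be written down as two matrices: a generator $Q$ holding the three transition intensities and a misclassification matrix $E$ holding the two error probabilities, with death assumed to be recorded without error. The sketch follows the convention of the R package msm, in which row $r$ of $E$ is the true state and column $s$ the observed state, so $e_{21}$ is the probability that a patient who has truly relapsed is recorded as relapse-free. The values are the point estimates reported in this paper; the helper names are hypothetical.

```python
def generator(q12, q13, q23):
    """Intensity matrix for states 1 (no relapse), 2 (relapse), 3 (death, absorbing)."""
    return [[-(q12 + q13), q12, q13],
            [0.0, -q23, q23],
            [0.0, 0.0, 0.0]]


def misclassification_matrix(e12, e21):
    """Row = true state, column = observed state; death is observed without error."""
    return [[1.0 - e12, e12, 0.0],
            [e21, 1.0 - e21, 0.0],
            [0.0, 0.0, 1.0]]


def is_valid(Q, E, tol=1e-12):
    """Generator rows sum to 0 with non-negative off-diagonals; E rows are distributions."""
    q_ok = all(abs(sum(row)) < tol and all(q >= 0 for s, q in enumerate(row) if s != r)
               for r, row in enumerate(Q))
    e_ok = all(abs(sum(row) - 1.0) < tol and all(0.0 <= p <= 1.0 for p in row)
               for row in E)
    return q_ok and e_ok


Q = generator(q12=0.02, q13=0.01, q23=0.22)
E = misclassification_matrix(e12=0.02, e21=0.22)
print(is_valid(Q, E))
```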
In this study, the Baum-Welch algorithm and the msm 1.0.1 software package were used to estimate the classification error between the states alive without a relapse (state 1) and alive with a relapse (state 2), as well as the transition rates for death without a relapse (state 1 → state 3), relapse (state 1 → state 2) and death with a relapse (state 2 → state 3) (Jackson, 2011). To model the simultaneous effect of demographic, clinical, medical and post-surgical variables on the transition rates between states and on the probability of misdiagnosis of relapse, a Cox proportional hazards model and a logistic regression model were used, respectively. A significance level of 5% was used, and data analysis was performed in R 2.15.1.
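The two covariate models can be sketched as follows: a proportional-hazards (log-linear) form $q_{rs}(z) = q_{rs,0}\exp(\beta^\top z)$ for each transition intensity, and a logistic form $\operatorname{logit} e_{rs}(z) = \gamma_0 + \gamma^\top z$ for each misclassification probability. The coefficients, baseline values and covariate coding below are hypothetical, and the function names are ours.

```python
from math import exp, log


def intensity(baseline, betas, z):
    """Proportional hazards on the intensity scale: q(z) = q0 * exp(beta' z)."""
    return baseline * exp(sum(b * x for b, x in zip(betas, z)))


def misclassification_prob(gamma0, gammas, z):
    """Logistic model: e(z) = 1 / (1 + exp(-(gamma0 + gamma' z)))."""
    eta = gamma0 + sum(g * x for g, x in zip(gammas, z))
    return 1.0 / (1.0 + exp(-eta))


# Hypothetical covariates: z = [age in decades above 60, distant metastasis (0/1)].
beta_relapse = [0.40, 0.90]   # hypothetical log hazard ratios for relapse
gamma_error = [0.30, 0.0]     # hypothetical log odds ratios for misdiagnosis
gamma0 = log(0.20 / 0.80)     # hypothetical baseline error probability of 0.20

for z in ([0.0, 0.0], [1.0, 0.0], [1.0, 1.0]):
    q = intensity(0.02, beta_relapse, z)
    e = misclassification_prob(gamma0, gamma_error, z)
    print(f"z={z}: relapse intensity={q:.4f}, misdiagnosis probability={e:.3f}")
```

Each exponentiated $\beta$ is a hazard ratio and each exponentiated $\gamma$ an odds ratio, which is how the effects in Table 1 are reported.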

Results
The mean (± SD) and median age of patients at diagnosis were 65.61 ± 11 and 68 years, respectively (range: 32 to 96 years). The mean age at diagnosis was 65.7 ± 11.22 years for men and 65.41 ± 10.56 years for women. By the end of the study, 239 patients (72.4%) had died and the rest were censored. The mean and median survival were 24.86 ± 23.73 and 16.33 months, respectively, and the one-, three- and five-year survival rates were 66%, 31% and 21.6%, respectively. Of the patients, 228 (69.1%) were male and 100 (30.3%) had a history of smoking. Relapse occurred in 43 patients (13.03%). The tumor involved the Cardia in 43.9% of patients and the Anterior location in 19.1%. Pathology was Adenocarcinoma in 85.2% of patients; the remaining patients had other pathologies (squamous cell carcinoma, small cell carcinoma, carcinoid tumor, carcinoma, malignant lymphoma, stromal tumor, spindle tumor). Metastasis was present in 192 patients (58.2%), of whom 66.67% had lymph node metastasis only. Of all patients, 52.42% had undergone total gastrectomy, 27.27% subtotal gastrectomy, 3.03% distal gastrectomy, 8.79% partial gastrectomy and 8.48% proximal gastrectomy. By disease stage, 6.67% of patients were in stage I, 18.18% in stage II, 16.36% in stage III and 58.79% in stage IV. In all, 20.3% of patients had received no renewed treatment, whereas 26.06% had received three renewed treatments.
Based on the hidden Markov model, the estimated transition rates were 0.01 (95% CI: 0.00-0.02) for death without a relapse (state 1 → state 3), 0.02 (95% CI: 0.01-0.04) for relapse (state 1 → state 2) and 0.22 (95% CI: 0.16-0.33) for death with a relapse (state 2 → state 3). The analysis of the effect of different variables on these transition rates (Table 1) showed that age at diagnosis and distant metastasis affected the occurrence of relapse: distant metastasis increased the hazard of relapse 2.56-fold (95% CI: 1.13-5.79), and each additional year of age increased it by a factor of 1.04 (95% CI: 1.02-1.07). Male sex, smoking history, receiving renewed treatments, lymph node metastasis, liver metastasis, Adenocarcinoma pathology, Cardia involvement, PX.G surgery and disease stage (II, III, IV) were associated with a higher hazard of relapse, but not statistically significantly. For the transition from state 1 to state 3 (death without a relapse), only the number of renewed treatments and the type and extent of gastrectomy were significant. An increase in the number of renewed treatments reduced the hazard of death by a factor of 0.23 (95% CI: 0.12-0.45). Among types of surgery, only S.G increased the death hazard, 3.36-fold (95% CI: 1.45-7.77), while the other surgeries (D.G, PT.G, PX.G) reduced it. Although many factors were not statistically significant for this transition, female sex, older age at diagnosis, Adenocarcinoma pathology, lymph node metastasis, and distant and liver metastases were associated with a higher death hazard in patients without a relapse. For the transition from state 2 to state 3 (death with a relapse), again only the number of renewed treatments and the type and extent of surgery were significant. With an increase in the number of renewed treatments, the death hazard decreased by a factor of 0.61 (95% CI: 0.39-0.97). Among surgeries, S.G, PT.G and PX.G were associated with lower death hazards, with hazard ratios of 0.37 (95% CI: 0.17-0.82), 0.91 (95% CI: 0.23-3.58) and 0.29 (95% CI: 0.11-0.76), respectively, whereas D.G surgery was associated with a higher death hazard. In this transition, female sex, Adenocarcinoma pathology, advanced disease stage, lymph node, liver and distant metastases, and Cardia involvement were associated with a higher death hazard in patients with a relapse, although none of these factors was statistically significant in the present study.
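Hazard ratios reported per one-unit change rescale multiplicatively: under the log-linear model, a k-year age difference multiplies the relapse hazard by $HR^k$, and confidence limits computed on the log scale scale the same way. The sketch below applies this to the age effect reported above (HR 1.04, 95% CI 1.02-1.07 per year) for a 10-year difference, a choice made only for illustration; because the reported limits are rounded, the rescaled interval is approximate.

```python
def rescale_hazard_ratio(hr, lower, upper, k):
    """Hazard ratio and CI for a k-unit covariate change under a log-linear model."""
    return hr ** k, lower ** k, upper ** k


hr10, lo10, hi10 = rescale_hazard_ratio(1.04, 1.02, 1.07, k=10)
print(f"10-year HR = {hr10:.2f} (95% CI {lo10:.2f}-{hi10:.2f})")  # 10-year HR = 1.48 (95% CI 1.22-1.97)

# A hazard ratio below 1 is a proportional reduction: HR 0.23 means a 77% lower hazard.
reduction = 1.0 - 0.23
print(f"{reduction:.0%} lower death hazard")  # 77% lower death hazard
```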
The hidden Markov model also showed that the classification errors for patients in the alive state without a relapse ($e_{21}$) and with a relapse ($e_{12}$) were 0.22 (95% CI: 0.04-0.63) and 0.02 (95% CI: 0.00-0.09), respectively. In other words, 22% of patients who were

Discussion
In most studies of chronic diseases such as gastric cancer, the actual disease state is hidden and unobservable for various reasons, including inaccurate recording of patients' information, deficiencies in the follow-up system, variation and fluctuation in biomarkers, and the limited sensitivity of diagnostic tools and tests. The natural disease process is therefore seen only through a sequence of observed states that are subject to error. In such circumstances, models such as Markov multi-state models, which are based on the observed states and do not account for classification error, produce biased estimates and lead to incorrect inferences (Jackson, 2002; Titman, 2008; Titman and Sharples, 2010). The only reasonable choice is therefore a model that describes the hidden disease process and also allows for error in diagnosing disease states. One model designed to overcome these problems is the hidden Markov multi-state model. Its complexity has led to its being rarely used in medical and gastric cancer research, but it has advantages over the Markov multi-state model, which is regarded as the standard. The first advantage is that it accounts for classification error between patients' states during the study: it not only estimates this error but also makes it possible to study the factors that affect it. The second advantage is that hidden Markov multi-state models provide a better fit to the data, giving researchers a better understanding of the hidden disease process and helping them identify more accurately the factors that affect the events occurring in patients during the study.
Accordingly, this study used a hidden Markov multi-state model with three states, alive without a relapse (state 1), alive with a relapse (state 2) and death (state 3), and three transitions: death without a relapse (state 1 → state 3), relapse (state 1 → state 2) and death with a relapse (state 2 → state 3). The probability of error between 'alive without a relapse' (state 1) and 'alive with a relapse' (state 2) was also modeled. The hidden Markov model showed that the classification errors for patients in the alive state without a relapse ($e_{21}$) and with a relapse ($e_{12}$) were 22% and 2%, respectively. In other words, 22% of patients who were considered to be in the without-a-relapse state were in fact in the with-a-relapse state, and 2% of patients who were considered to be in the with-a-relapse state were in fact in the without-a-relapse state. Given the high classification error in the 'alive without a relapse' state ($e_{21}$) among patients with gastric cancer who have undergone surgery, it is not surprising that only 43 patients (13.03%) in this study had a relapse. Unfortunately, other studies have offered no scientific or clinical interpretation of the low number of patients with a relapse (Zeraati et al., 2005a; 2005c; 2006; Zare et al., 2013b). In studies based on Markov multi-state models, failure to account for classification error between the 'alive without a relapse' (state 1) and 'alive with a relapse' (state 2) states has biased the estimates of the death hazard without a relapse (state 1 → state 3) and the death hazard with a relapse (state 2 → state 3): because relapses go undiagnosed, the former is overestimated and the latter underestimated (Zeraati et al., 2005b; Zare et al., 2013b).
In addition, the hidden Markov multi-state model showed that only age at diagnosis and the number of renewed treatments affected the probability of classification error, that is, misdiagnosis of relapse. For correct and prompt diagnosis in patients with gastric cancer who have undergone surgery, are older at diagnosis, and need more renewed treatments (that is, have more advanced disease), doctors and clinical experts should therefore use diagnostic tools with higher sensitivity, recording and follow-up systems with more regular and shorter intervals, and more tumor markers in patients' blood tests. The assessment of factors affecting death without a relapse (state 1 → state 3), relapse (state 1 → state 2) and death with a relapse (state 2 → state 3) under the hidden Markov multi-state model also showed that age at diagnosis and distant metastasis affected the occurrence of relapse, while the number of renewed treatments and the type and extent of surgery had significant effects on the death hazards without and with a relapse. These findings are consistent with most studies in this field (Zeraati et al., 2005b; Zare et al., 2013b; 2013a).
Despite the complexity of the hidden Markov multi-state model and the fact that it cannot be fitted in standard statistical software, its rich mathematical structure makes it possible to estimate classification error between patients' states while retaining all the features of the Markov multi-state model. The model also identifies the factors affecting the probability of this error, which helps researchers understand the mechanism of classification error. To diagnose patients who are subject to this error more accurately, diagnostic tools with higher sensitivity, recording and follow-up systems with more regular and shorter intervals, and more tumor markers in patients' blood tests should be used.

Figure 1. Determining Disease States Based on Recording System and Periodic Follow-up of Patients in a Hypothetical 3-State Disease Model with an Absorbing State of Death

Figure 3. A Hidden Markov Model in Continuous Time. Observed States are Generated Conditionally on Hidden States

Figure 4. Hidden Markov Multi-state Model with Three Transitions, Death Hazard without a Relapse (state 1 → state 3), Relapse Hazard (state 1 → state 2) and Death Hazard with a Relapse (state 2 → state 3), and Misclassification of Patients in Their State of Being Alive without a Relapse ($e_{21}$) or with a Relapse ($e_{12}$) during the Study

Table 1. The Effect of Different Variables on Relapse Hazard, Death Hazard without a Relapse and Death Hazard with a Relapse Based on the Hidden Markov Model (Hazard Ratio with 95% Confidence Interval)a
*Squamous cell carcinoma (SCC), small-cell carcinoma, carcinoid tumor, spindle cell tumor, sarcoma, malignant lymphoma
**Diaphragm, spleen, pancreas, lungs, bone
a The first category is the reference group