Assessing Markov and Time Homogeneity Assumptions in Multi-state Models: Application in Patients with Gastric Cancer Undergoing Surgery in the Iran Cancer Institute

Introduction
Gastric cancer is one of the most common causes of cancer death worldwide. Every year, more than 930 thousand new cases are reported throughout the world and more than 700 thousand people die from this type of cancer (Parkin et al., 2005). According to the latest statistics of the Iran Cancer Research Center, gastric cancer is the most common cancer among Iranian men and the third most common cancer among Iranian women after breast cancer (Mohagheghi, 2004; Mohagheghi et al., 2009; Mousavi et al., 2009; Razavi et al., 2009). One of the most important goals of prompt treatment for patients with gastric cancer is to increase the survival rate, especially the 5-year survival rate. Unfortunately, more than 80% of patients with gastric cancer are diagnosed at a stage when common treatments such as gastrectomy, chemotherapy, or radiation therapy are not effective in increasing survival (Sadighi et al., 2005; Samadi et al., 2007; Sadighi et al., 2008; Association, 2011). For this reason, the 5-year survival rate after surgery is low in patients with gastric cancer. Improving these patients' survival after surgery requires models that allow a closer examination of the behavior of variables, so that they better describe the natural process of the disease and provide researchers with more accurate data.
One group of statistical models designed for this purpose is the multi-state model. In this model, patients pass through different states (other than the death event) between the beginning of the study and death. The time to reach each state and the factors affecting its occurrence play a fundamental role in patients' survival. Considering these states (often called intermediate events) has opened a novel approach in survival studies, because the natural process of the disease can be treated as a stochastic process in which patients occupy various states throughout the study. Generally, a multi-state model describes a process in which individuals move among a discrete set of states in continuous time.
Standard survival models are the simplest multi-state models. In these models, the patient is "Alive" at the beginning of the study and his/her state may then change to "Death". This is the only transition considered for patients during the study, and it is illustrated in Figure 1. Models more complex than the two-state model allow for more transitions during the study. These models are used when the initial state of the patient, i.e. "being alive", is itself divided into two or more other states. The number of divisions depends on the type of disease. In the process of gastric cancer, the survival time of the patient is recorded from the time the patient undergoes surgery. After surgery the patient enters the study and is subject to the death hazard. In these studies, death and relapse are considered as the end point of the study and the intermediate event, respectively. This model is shown schematically in Figure 2.
There are three transitions for patients during the study in the above model: i) death hazard without a relapse (state 1 → state 3); ii) relapse hazard (state 1 → state 2); iii) death hazard with a relapse (state 2 → state 3).
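The three transitions listed above can be made concrete by expanding each patient's history into one record per transition for which the patient was at risk. The following is a minimal sketch, not the authors' data layout: the names `TransitionRow` and `expand_patient`, and the argument names, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TransitionRow:
    patient_id: int
    start_state: int  # 1 = alive without relapse, 2 = relapse
    end_state: int    # 2 = relapse, 3 = death
    t_start: float    # entry time into start_state (time since surgery)
    t_stop: float     # time of leaving start_state, or of censoring
    event: int        # 1 if this transition was observed, 0 if censored


def expand_patient(patient_id: int,
                   relapse_time: Optional[float],
                   last_time: float,
                   died: bool) -> List[TransitionRow]:
    """Return one row per transition the patient was at risk for.

    relapse_time is None when no relapse was observed; last_time is the
    time of death (died=True) or of right censoring (died=False).
    """
    relapsed = relapse_time is not None
    # The sojourn in state 1 ends at relapse, otherwise at death or censoring.
    exit_state1 = relapse_time if relapsed else last_time
    rows = [
        TransitionRow(patient_id, 1, 2, 0.0, exit_state1, int(relapsed)),
        TransitionRow(patient_id, 1, 3, 0.0, exit_state1,
                      int(died and not relapsed)),
    ]
    if relapsed:
        # Only patients who relapsed are at risk for the 2 -> 3 transition.
        rows.append(TransitionRow(patient_id, 2, 3, relapse_time, last_time,
                                  int(died)))
    return rows
```

With this layout, the sojourn time in a state is simply `t_stop - t_start` for the corresponding row, which is the quantity whose distribution is examined later when the Markov assumption is assessed.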
To define a multi-state model, it is necessary to determine the transition rates among states. The transition rate from state r to state s is defined as follows:

$$q_{rs}(t, H(t)) = \lim_{\delta t \to 0} \frac{P\big(X(t+\delta t)=s \mid X(t)=r,\, H(t)\big)}{\delta t}$$
In this equation H(t) denotes the history of the process up to time t. Most researchers adopt assumptions such as the Markov or time homogeneity assumption in order to model transition rates. These assumptions can make the multi-state model simpler, but if they do not hold, they can lead to incorrect inferences. They are discussed in some depth below. The Markov assumption is commonly used to model transition rates in multi-state models. Under the Markov assumption, the future of the process depends only on its current state, and the history of the process up to time t is not needed. According to this assumption, the transition rate from state r to state s can be written as:

$$q_{rs}(t) = \lim_{\delta t \to 0} \frac{P\big(X(t+\delta t)=s \mid X(t)=r\big)}{\delta t}$$
Thus, the transition rate can change over time, but it does not depend on the past history of the process. One of the most important pieces of medical data that can be regarded as the history of the process is the sojourn time of the process in a particular state. Therefore, under the Markov assumption, the transition rate from state r to state s does not depend on the sojourn time of the process in state r. Now, suppose that state r corresponds to relapse and state s to the absorbing state of death. It cannot be expected that the death hazard in patients who died shortly after relapse is the same as in patients who remained in the relapse state for a longer time before dying. For this reason, the Markov assumption, although fundamental, is not automatically reasonable in medical and cancer studies. The special feature of the Markov model is that the likelihood function for a sequence of discrete observations is easily obtained and often has a closed form. But whether this assumption holds needs to be checked with a suitable and exhaustive test. It should be mentioned here that the methods proposed for evaluating this assumption are few and limited. Most of the presented methods first estimate the exact times of transitions among the states; then, based on these estimated times, a test for the Markov assumption is designed. Assessing the Markov assumption in these methods is based on the effect of the sojourn time of the process in former states on the transition rate to latter states. In these methods, of course, the precision and accuracy of the results depend on the precision of the transition times among states, because the states of a multi-state model are often observed at arbitrary times (Kay, 1986; Pérez-Ocón et al., 2000). In these models the exact transition times are therefore not observed, which in turn can affect the results. Other methods have also been presented to assess the Markov assumption, but they are limited to particular types of multi-state models such as the progressive model, or have been designed just for right-censored data, or even require regular time intervals between observations (Peto and Peto, 1972; Foulkes and De Gruttola, 2003; Healy and Degruttola, 2007).
Another assumption which is usually made in multi-state models is the time homogeneity assumption. Under this assumption, the transition rate from state r to state s does not depend on time. In other words, the transition rate from state r to state s is constant over time. Various methods have been developed to assess this assumption (Faddy, 1976). Such methods typically compare two processes. One is homogeneous in time, with a transition rate from state r to state s that does not depend on time. The other is time dependent, with a time-varying transition rate. The two are compared using statistical tests such as the score test and the likelihood ratio test to assess the time homogeneity assumption. One of the limitations of these methods is the way transition rates among states are modeled: some authors have modeled transition rates with an additive model and others have used multiplicative models. In both forms score tests can be used to test the time homogeneity assumption (De Stavola, 1988; Gentleman et al., 1994).
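When both the Markov and the time homogeneity assumptions hold, the three-state model has transition probabilities in closed form, which is the simplification these assumptions buy. The sketch below assumes constant rates `q12`, `q13` and `q23` for the three transitions; the function name and the rate values used when calling it are hypothetical, not estimates from this study.

```python
import math
from typing import List


def transition_probabilities(q12: float, q13: float, q23: float,
                             t: float) -> List[List[float]]:
    """Transition probability matrix P(t) of a time-homogeneous Markov
    three-state model (1 = no relapse, 2 = relapse, 3 = death).

    Entry [r-1][s-1] is the probability of being in state s at time t
    given state r at time 0.
    """
    a = q12 + q13                      # total exit rate from state 1
    p11 = math.exp(-a * t)             # still in state 1
    p22 = math.exp(-q23 * t)           # still in state 2
    if math.isclose(a, q23):
        # Limiting form when the exit rates of states 1 and 2 coincide.
        p12 = q12 * t * math.exp(-a * t)
    else:
        p12 = q12 * (p22 - p11) / (a - q23)
    return [
        [p11, p12, 1.0 - p11 - p12],
        [0.0, p22, 1.0 - p22],
        [0.0, 0.0, 1.0],               # death is absorbing
    ]
```

Because the rates are constant, the probabilities depend only on the elapsed time t and not on when the interval starts; with time-varying rates this closed form no longer applies.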
Among other widely used methods to assess this assumption, the most common is the piecewise constant model (Faddy, 1976; Saint-Pierre et al., 2003; Titman, 2008; Titman and Sharples, 2010a). In this method, the transition rates among states are modeled as piecewise constant. This method is general and exhaustive for assessing time homogeneity; however, the number of points and intervals in which the transition rates are constant must be chosen, and the result of this method is highly dependent on the selected points. Some algorithms, of course, have been designed to select these points based on the maximum likelihood function (Chen et al., 1999; Mathieu et al., 2005; Ocaña-Riola, 2005). But as the number of observations per person increases, the application of these methods is limited (Chen et al., 1999). This is why most researchers tend to select these points based on: i) clinical indications (Sharples et al., 2001); ii) inspection of the empirical hazard function (Pérez-Ocón et al., 2001); or iii) cut points chosen so that the number of observations in each interval is equal (Kay, 1986).
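To illustrate what a piecewise constant transition rate means in practice, the following minimal sketch computes the cumulative rate and the probability of remaining in a state when the exit rate is constant within intervals. The function names, cut points and rates are hypothetical; a time-homogeneous process is the special case in which all interval rates are equal.

```python
import math
from typing import Sequence


def cumulative_rate(t: float, cuts: Sequence[float],
                    rates: Sequence[float]) -> float:
    """Integral from 0 to t of a piecewise constant rate.

    cuts are the sorted, positive change points; rates holds one value per
    interval [0, c1), [c1, c2), ..., [c_last, inf), so len(rates) must be
    len(cuts) + 1.
    """
    if len(rates) != len(cuts) + 1:
        raise ValueError("need exactly one rate per interval")
    edges = [0.0, *cuts]
    total = 0.0
    for i, rate in enumerate(rates):
        left = edges[i]
        right = edges[i + 1] if i + 1 < len(edges) else math.inf
        if t <= left:
            break
        # Add the part of this interval that lies before t.
        total += rate * (min(t, right) - left)
    return total


def prob_remaining(t: float, cuts: Sequence[float],
                   rates: Sequence[float]) -> float:
    """Probability of not having left the state by time t."""
    return math.exp(-cumulative_rate(t, cuts, rates))
```

The dependence on the chosen cut points described above is visible here: moving an entry of `cuts` changes every value of `cumulative_rate` beyond it, even if the rates are unchanged.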
Although the Markov and time homogeneity assumptions make the multi-state model simpler, they will lead to incorrect inferences if they do not hold. Methods which have been presented to assess the assumptions of a multi-state model, on the other hand, have many limitations in practice. These limitations are either related to the method presented for assessing the assumptions, or the methods are limited to a special type of data (right censoring, for instance) or a special type of multi-state model (e.g. the progressive model). Therefore, this study aims to present exhaustive methods to assess these assumptions based on Cox-Snell residuals, the Akaike information criterion, and Schoenfeld residuals, which are limited neither to a special type of multi-state model nor to a special censoring mechanism and are applicable in most common statistical software.

Materials and Methods
In this study, 330 patients with gastric cancer who met the following criteria were studied: i) they had been hospitalized and had undergone surgery from 1995 to 1999 in the surgical wards of the Cancer Institute of Iran; ii) they had records in the archives of the hospital and were available for subsequent follow-up. The survival time of patients was measured from surgery, and those patients who were still alive at the end of the study period or who were lost to follow-up during the period were considered right censored. To investigate the disease process and to assess the common assumptions of multi-state models, a model with three states was considered: alive without a relapse (state 1), relapse (state 2) and death (state 3). Moreover, to assess the Markov and time homogeneity assumptions in relapse (state 1 → state 2), death hazard without a relapse (state 1 → state 3) and death hazard with a relapse (state 2 → state 3), the following variables were used: demographic variables such as age (at the time of surgery), sex, and smoking history; clinical data of the disease including tumor location (Cardia, Anterior, other), type of pathology (Adenocarcinoma, other), disease stage (I, II, III, IV) (American Joint Committee on Cancer, 2002), location of metastases (lymph nodes, liver, other), and the type and extent of gastrectomy (T.G, S.G, D.G, PT.G, PX.G); and post-surgical and treatment variables including the number of renewed treatments (chemotherapy, radiotherapy, surgery or a combination of them).
In a multi-state model in which changes of state are independent of the past and the sojourn times in different states are independent, the only factor that determines whether the process is Markov is the distribution of the sojourn time in each state. In other words, the sojourn time distribution has an integral role in whether the model is Markov. For example, if the transition rate from state r to state s is to be Markovian, the sojourn time of the process in state r must have an Exponential distribution. Therefore, one of the methods used to test the Markov assumption is the assessment of the sojourn time distribution. The sojourn time of the process in a state is usually measured from the time of entry into that state to the time of entry into the next state. A wide range of statistical distributions can be considered for this duration, including: Exponential, Weibull, Log-normal, Log-logistic, Gompertz, and Generalized gamma.
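The link between an Exponential sojourn time and the Markov property can be seen from the memoryless property. The short derivation below is a standard result rather than part of the original analysis; the symbols $T$ (sojourn time), $\lambda$ (rate), $u$ (time already spent in the state) and $\gamma$ (Weibull shape) are notation introduced here for illustration.

```latex
% Exponential sojourn time: survival function and hazard
S(t) = P(T > t) = e^{-\lambda t}, \qquad
h(t) = \frac{f(t)}{S(t)} = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda .

% Memoryless property: time already spent in the state carries no information
P(T > u + t \mid T > u) = \frac{e^{-\lambda (u+t)}}{e^{-\lambda u}}
                        = e^{-\lambda t} = P(T > t).

% Contrast: a Weibull sojourn time has a hazard that depends on time in state
h(t) = \lambda \gamma t^{\gamma - 1}, \qquad
\text{constant only when } \gamma = 1 \text{ (the Exponential case).}
```

Under any of the other listed distributions the exit rate changes with the time already spent in the state, so the sojourn time becomes part of the relevant history and the transition is no longer Markov.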
But the main issue is the choice of a suitable distribution for the sojourn time, because if this duration has an Exponential distribution, it suggests that the Markov assumption holds in the transition at hand. So Cox-Snell residuals and the Akaike information criterion were used to assess a suitable distribution for the sojourn time in the states of a multi-state model and, accordingly, the Markov assumption. Cox-Snell residuals are a graphical method for assessing the fit of candidate distributions; the less the residuals deviate from the reference line, the better the fit (Weissfeld and Schneider, 1990; Escobar and Meeker Jr, and Prentice, 2011). Judgments based on graphical methods, however, can be visually misleading. For a better judgment, thus, the Akaike information criterion can be used along with Cox-Snell residuals. The Akaike information criterion is used to compare candidate distributions, and the smaller it is, the better the fit (Akaike, 1998; Collett, 2003; Klein and Moeschberger, 2003; Klein and Zhang, 2005). The AIC for the distributions used in this study was calculated according to the following equation:

$$\mathrm{AIC} = -2\log(\text{likelihood}) + 2(p+k)$$

Here p is the number of parameters in the model and k is a constant which depends on the type of model. For example, k=1 for the Exponential distribution and k=2 for the Weibull distribution (Klein and Moeschberger, 2003). The smaller the AIC, the better the fit of the distribution will be. Besides, a further criterion has been used to measure the relative goodness of fit of the Exponential distribution instead of the optimal distribution (Akaike, 1980; 1998; Anderson, 2008), in which $\mathrm{AIC}_{\min}$ is the value of the Akaike information criterion for the best distribution and $\mathrm{AIC}_{\text{exponential}}$ is its value for the Exponential distribution. Based on this criterion, if the relative goodness of fit of the Exponential distribution is high, the Markov assumption can be accepted, while a low rate indicates that the Markov assumption is not held.
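The comparison described above can be sketched for two of the candidate distributions, Exponential and Weibull, fitted by maximum likelihood to right-censored sojourn times. Several points are assumptions made for this sketch: no covariates (so p = 0), hypothetical data, and a relative goodness-of-fit measure taken as the relative likelihood exp((AIC_min - AIC_exponential)/2), a form commonly associated with the cited AIC literature; the formula is not written out in the text, so this choice may differ from the one actually used. All function names are illustrative.

```python
import math
from typing import List, Sequence


def exponential_loglik(times: Sequence[float], events: Sequence[int]) -> float:
    """Maximized log-likelihood of a censored Exponential sample."""
    d = sum(events)                       # number of observed transitions
    rate = d / sum(times)                 # closed-form ML estimate
    return d * math.log(rate) - rate * sum(times)


def cox_snell_exponential(times: Sequence[float],
                          events: Sequence[int]) -> List[float]:
    """Cox-Snell residuals: fitted cumulative hazard at each observed time."""
    rate = sum(events) / sum(times)
    return [rate * t for t in times]


def weibull_profile_loglik(shape: float, times: Sequence[float],
                           events: Sequence[int]) -> float:
    """Log-likelihood for hazard rate*shape*t**(shape-1), with the rate
    parameter replaced by its ML estimate for the given shape."""
    d = sum(events)
    rate = d / sum(t ** shape for t in times)
    log_t = sum(math.log(t) for t, e in zip(times, events) if e)
    return d * math.log(rate) + d * math.log(shape) + (shape - 1) * log_t - d


def weibull_loglik(times: Sequence[float], events: Sequence[int],
                   lo: float = 0.05, hi: float = 20.0,
                   tol: float = 1e-8) -> float:
    """Maximize the (concave) profile log-likelihood over the shape
    parameter by golden-section search on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if (weibull_profile_loglik(c, times, events)
                > weibull_profile_loglik(d, times, events)):
            b = d                         # maximum lies in [a, d]
        else:
            a = c                         # maximum lies in [c, b]
    return weibull_profile_loglik((a + b) / 2.0, times, events)


def aic(loglik: float, p: int, k: int) -> float:
    """AIC = -2 log(likelihood) + 2(p + k), as in the equation above."""
    return -2.0 * loglik + 2.0 * (p + k)


# Hypothetical sojourn times (months) and event indicators (0 = censored).
times = [3.0, 5.5, 7.0, 8.2, 10.1, 12.4, 15.0, 18.3, 22.7, 30.0]
events = [1, 1, 1, 0, 1, 1, 0, 1, 1, 0]

aic_exponential = aic(exponential_loglik(times, events), p=0, k=1)
aic_weibull = aic(weibull_loglik(times, events), p=0, k=2)
aic_min = min(aic_exponential, aic_weibull)
# Assumed relative goodness of fit of the Exponential distribution.
relative_fit = math.exp((aic_min - aic_exponential) / 2.0)
```

A value of `relative_fit` close to 1 would mean the Exponential distribution fits nearly as well as the best candidate, while a value close to 0 would speak against it. For a well-fitting model the Cox-Snell residuals should behave like a censored sample from a unit Exponential distribution, which is what the residual plots check.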
As mentioned earlier, if the time homogeneity assumption is made in a multi-state model, the transition rates among states do not depend on time and remain constant over time. In other words, the transition rate from state r to state s is constant over time. There is a close relationship between the modeling of transition rates and the time homogeneity assumption in multi-state models, because the transition rate is a function of the covariate effects, and the constancy of the transition rate over time is equivalent to the constancy of the covariate effects over time. The most common statistical model for the transition rates of a multi-state model is the semi-parametric Cox model, in which the transition rate is a function of the covariate effects in a Cox regression model. Based on the semi-parametric Cox model, the transition rate is modeled as follows:

$$q_{rs}(Z(t)) = q_{rs}^{(0)} \exp\big(\beta_{rs}^{T} Z(t)\big)$$

In this model $q_{rs}^{(0)}$ is the baseline transition rate, $\beta_{rs}$ is the vector of regression coefficients and $Z(t)$ is the vector of covariates in the transition from state r to state s. Therefore, assessing the time homogeneity assumption in each transition is equivalent to checking the proportional hazards assumption in the model for that transition. Ergo, proportional hazards assessment in each transition can be considered as a method to evaluate the time homogeneity assumption. Different methods have been designed to assess this assumption; they are mostly graphical and are mainly based on score and Schoenfeld residuals (Klein and Moeschberger, 2003; Klein and Zhang, 2005). However, methods which are based on a statistical test, and which therefore provide an objective basis for judging whether the assumption holds, are always preferred. One of the methods designed to achieve this aim is a statistical test based on Schoenfeld residuals (Grambsch and Therneau, 1994). This method can be performed with any censoring mechanism and is not limited to a particular type of multi-state model (Klein and Zhang, 2005; Jackson, 2011). In this study, thus, we used this method, which can be run in most common statistical software, to assess the time homogeneity assumption of the process in each transition rate. All analyses in this study were carried out with STATA11 and for this test.
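To make the idea behind the Schoenfeld residuals test concrete, the following is a minimal sketch for a single transition with one covariate. It is not the STATA routine used in the study and rests on several assumptions: one covariate only, no tied event times, the identity function of time as the time transform, and hypothetical data. It fits the Cox model by Newton-Raphson, computes the Schoenfeld residual at each event time (the covariate of the subject who made the transition minus the risk-weighted mean covariate of those still at risk), and tests whether the residuals trend with time, in the spirit of Grambsch and Therneau (1994).

```python
import math
from typing import List, Sequence, Tuple


def risk_set_mean_var(beta: float, times: Sequence[float],
                      z: Sequence[float], t: float) -> Tuple[float, float]:
    """Weighted mean and variance of z among subjects still at risk at t,
    with weights exp(beta * z)."""
    at_risk = [(math.exp(beta * zj), zj) for tj, zj in zip(times, z) if tj >= t]
    s0 = sum(w for w, _ in at_risk)
    mean = sum(w * zj for w, zj in at_risk) / s0
    var = sum(w * zj * zj for w, zj in at_risk) / s0 - mean * mean
    return mean, var


def fit_cox(times: Sequence[float], events: Sequence[int],
            z: Sequence[float], iterations: int = 25) -> float:
    """Newton-Raphson on the Cox partial likelihood (one covariate)."""
    beta = 0.0
    for _ in range(iterations):
        score = info = 0.0
        for t, e, zi in zip(times, events, z):
            if e:
                mean, var = risk_set_mean_var(beta, times, z, t)
                score += zi - mean
                info += var
        beta += score / info
    return beta


def schoenfeld_test(times: Sequence[float], events: Sequence[int],
                    z: Sequence[float]) -> Tuple[float, List[float], float, float]:
    """Return (beta, Schoenfeld residuals, chi-square statistic, p-value)."""
    beta = fit_cox(times, events, z)
    event_times, residuals, info = [], [], 0.0
    for t, e, zi in sorted(zip(times, events, z)):
        if e:
            mean, var = risk_set_mean_var(beta, times, z, t)
            event_times.append(t)
            residuals.append(zi - mean)   # Schoenfeld residual at this event
            info += var                   # observed information for beta
    d = len(residuals)
    g_bar = sum(event_times) / d
    g = [t - g_bar for t in event_times]  # centred time transform g(t) = t
    numerator = sum(gi * ri for gi, ri in zip(g, residuals)) ** 2
    chi2 = d * numerator / (info * sum(gi * gi for gi in g))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square, 1 df
    return beta, residuals, chi2, p_value


# Hypothetical data for one transition: times, event indicators, and a
# binary covariate (for example, whether a renewed treatment was given).
times = [2, 3, 5, 6, 8, 9, 11, 12, 14, 15, 17, 20]
events = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0]
z = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0]
beta_hat, resid, chi2, p_value = schoenfeld_test(times, events, z)
```

A small p-value would suggest that the covariate effect changes over time, that is, that the proportional hazards (and hence time homogeneity) assumption is questionable for that transition. Established implementations handle ties, several covariates and other time transforms, and should be preferred for real analyses.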

Results
The analysis of Cox-Snell residuals for the Markov assumption in the three transition rates of relapse (state 1 → state 2), death hazard without a relapse (state 1 → state 3) and death hazard with a relapse (state 2 → state 3) showed that in the case of relapse (state 1 → state 2), the Log-logistic distribution is the most appropriate for the sojourn time in state 1 (Figure 3). The analysis of these residuals for death hazard without a relapse (state 1 → state 3) also revealed that, among the statistical distributions, Gompertz is the most appropriate for the sojourn time in state 1 (Figure 3). Moreover, these analyses indicated that for death hazard with a relapse (state 2 → state 3) the Log-normal distribution is appropriate for the sojourn time in state 2 (Figure 3). The Akaike information criterion supported these results. Based on AIC, the Log-logistic model is the appropriate distribution for the sojourn time of state 1 in the transition state 1 → state 2, the Gompertz model is the appropriate distribution for the sojourn time of state 1 in the transition state 1 → state 3, and the Log-normal model is the appropriate distribution for the sojourn time of state 2 in the transition state 2 → state 3 (Table 1). Furthermore, the comparison of the Akaike information criterion of the Exponential distribution with that of the most appropriate distribution for the sojourn time in each transition suggests that the relative rates of goodness of fit of the Exponential distribution for the sojourn time (Markov assumption) in the transitions of relapse (state 1 → state 2), death hazard without a relapse (state 1 → state 3) and death hazard with a relapse (state 2 → state 3) were 1.5, 12.2, and 24.7 percent respectively. The analysis of the time homogeneity assumption based on the Schoenfeld residuals test in all transitions of the multi-state model, and for all of the variables used in modeling, also showed that this assumption is held for relapse (state 1 → state 2) and for death with a relapse (state 2 → state 3). But for death without a relapse (state 1 → state 3) the time homogeneity assumption is not held. In addition, further analyses of the covariates in each transition show that in the relapse (state 1 → state 2) and death with a relapse (state 2 → state 3) transitions, this assumption is held for the general model as well as for each of the variables used in modeling. But for the death without a relapse transition (state 1 → state 3), this assumption is not held for the general model, nor for the variable 'number of renewed treatments' (Table 2).

Discussion
The multi-state model is an appropriate model for studying cancers, like gastric cancer, which have a high rate of mortality. These models allow a closer analysis of the behavior of variables and therefore provide researchers with more detailed and accurate data. But reaching this objective requires assumptions such as the Markov and time homogeneity assumptions to hold. These assumptions can make the multi-state model simpler, but if they do not hold, the model fits the data poorly, hence incorrect inferences. There are, of course, a number of methods designed to assess the assumptions of a model, but they have many limitations in practice. These limitations are either related to the method presented to assess the assumptions, or the methods are limited to a special type of data (right censoring, for instance) or a special type of multi-state model, e.g. the progressive model (Faddy, 1976; De Stavola, 1988; Gentleman et al., 1994; Chen et al., 1999; Pérez-Ocón et al., 2000; Pérez-Ocón et al., 2001; Foulkes and De Gruttola, 2003; Healy and Degruttola, 2007). The present study, thus, attempted to present exhaustive methods for assessing these assumptions based on Cox-Snell residuals, the Akaike information criterion, and Schoenfeld residuals, which are not limited to a special type of multi-state model or censoring mechanism and are applicable in most statistical software. To assess these assumptions, a multi-state model with three states was considered: alive without a relapse (state 1), with a relapse (state 2) and death (state 3). The assessment of the Markov assumption based on Cox-Snell residuals (Figure 3) and the Akaike information criterion (Table 1) showed that the Log-logistic distribution for the sojourn time in state 1, the Gompertz distribution for the sojourn time in state 1, and the Log-normal distribution for the sojourn time in state 2 were the most appropriate distributions for the transition rates of relapse (state 1 → state 2), death hazard without a relapse (state 1 → state 3) and death hazard with a relapse (state 2 → state 3), respectively. It may therefore seem reasonable to conclude that the Exponential distribution is not appropriate for the sojourn time and that the Markov assumption is consequently not held. However, the relative rates of goodness of fit when the Exponential distribution is taken into account for the sojourn time (Markov assumption) in the transition rates of relapse (state 1 → state 2), death hazard without a relapse (state 1 → state 3) and death hazard with a relapse (state 2 → state 3) are 1.5, 12.2, and 24.7 percent respectively. Since the relative rate of goodness of fit for relapse (state 1 → state 2), when considering the Exponential distribution for state 1, is only 1.5 percent, it can be concluded that the Markov assumption fails to hold only for this transition. As the relative rates of goodness of fit of the Exponential distribution for the sojourn time (Markov assumption) for the death hazard without a relapse (state 1 → state 3) and death hazard with a relapse (state 2 → state 3) transitions were 12.2 and 24.7 percent respectively, the Markov assumption can be accepted in these transitions. There is, of course, a dearth of studies conducted in this area, and a threshold for the relative goodness of fit cannot be simply prescribed; in this study we considered a criterion of 5%. In other words, if the relative goodness of fit when the Exponential distribution is taken into account for the sojourn time exceeds 5%, the Markov assumption can be accepted for the transition in question.
The analysis of the time homogeneity assumption based on the results of the Schoenfeld residuals test (Table 2) also revealed that this assumption, with regard to the general test and for each of the variables of the model, is held only for relapse (state 1 → state 2) and death hazard with a relapse (state 2 → state 3). The only variable that prevents this assumption from being held in the death hazard without a relapse (state 1 → state 3) transition is the 'number of renewed treatments'.
In the absence of the basic assumptions such as Markov and time homogeneity in multi-state models, many alternatives have been proposed. In cases where the Markov assumption is not held, non-Markov and semi-Markov models have been proposed (Chen and Tien, 2004; Ruiz-Castro and Pérez-Ocón, 2004; Kang and Lagakos, 2007; Meira-Machado et al., 2009; Foucher et al., 2010; Titman and Sharples, 2010b; Titman, 2012). In cases where the time homogeneity assumption is not held, the proposed alternatives are to model transition rates as time dependent, based on parametric models, or to use piecewise constant models (Omar et al., 1995; Pérez-Ocón et al., 2001; Hsieh et al., 2002; Mathieu et al., 2005; Ocaña-Riola, 2005; Meira-Machado et al., 2009; Titman, 2011). But, generally, using any of these generalizations requires assessing the Markov and time homogeneity assumptions first, because if these assumptions are held, it is inappropriate to use more complicated models. On the other hand, applying a multi-state model under the Markov and time homogeneity assumptions where they do not hold leads to a poor fit of the model to the data and hence to incorrect inferences.

Figure 1. Standard Models of Survival (Alive → Death)

Figure 3. The Analysis of Cox-Snell Residuals for Markov Assumption in Modeling Transition Rates

Table 2. The Analysis of Time Homogeneity Assumption Based on Schoenfeld Residuals Test in Transition Rates