Estimation of Esophageal Cancer Incidence in Tehran by Log-linear Method using Population-based Cancer Registry Data

Alireza Mosavi-Jarrahi 1, Toraj Ahmadi-Jouibari 2, Farid Najafi 3, Yadollah Mehrabi 1, Abbas Aghaei 2*

Introduction
Knowledge of cancer incidence is essential for planning, monitoring, and promoting regional and national cancer control programs (Kamo et al., 2007). Estimating the total number of incident cases relative to the number of registered cases, in other words the completeness of cancer registration, is one of the main components of quality control in a registry. An incomplete registry distorts the calculation of incidence and survival (Aghaei et al., 2012). This matters enough that an important part of the quality control of cancer registration reported by the International Agency for Research on Cancer (IARC) is based on the estimated number of cases in the community rather than on the number of registered cases (Parkin, 1994; Schouten et al., 1994; Ghojazadeh et al., 2013). The death-certificate-only proportion and the mortality/incidence ratio are among the indicators normally used to estimate the hidden population in registry systems (Shanmugaratnam, 1989), but they are indirect indicators that are strongly associated with the fatality rate of the disease. When two or more incomplete lists of cancer cases are available, the capture-recapture (CR) method can be used to estimate the number of cases in the community, and it is considered a direct indicator for estimating cancer incidence (Robles et al., 1988; Hilsenbeck et al., 1992; Laporte et al., 1992; Tull and McCarty, 1992; McCarty et al., 1993; Suwanrungruang et al., 2011). Among the techniques described in IARC reports, the capture-recapture method is ranked alongside the best methods. During the last few decades, capture-recapture methods have been increasingly used in epidemiology and in health care promotion. They are also used in various health fields to estimate hidden populations and the incidence and prevalence of diseases and special events.
Such information is very important for policymakers and health workers (Hook and Regal, 1995; Aghaei et al., 2013). Capture-recapture methods are efficient at reducing the cost of disease registration and at reducing bias in incidence estimates and in comparisons between population subgroups.
In addition, modeling the effect of intervening variables gives better estimates of the true population size and therefore resolves many problems in estimating population size (Tilling, 2001). The capture-recapture method rests on assumptions such as independence of sources, an equal chance for every case of being found in each source, and independence of cases. These assumptions often do not hold for the available sources, so log-linear models are proposed for such situations (Laporte, 1994; Héraud-Bousquet et al., 2012).
The Tehran population-based cancer registry center, the Tehran Metropolitan Area Cancer Registry (TMACR), began its activity in 1998. The main goal of the center is to measure cancer incidence among residents of the Tehran metropolis. The TMACR data collection area includes the Tehran metropolis plus Islamshahr and Shemiranat townships (Etemadi et al., 2008; Aghaei et al., 2013).
Central areas of Iran, including Tehran, have the highest incidence of esophageal cancer in the country after the North and Northeast regions (Ramdar, 2010; Sadjadi et al., 2010). According to the first Tehran population-based cancer registry report, covering 1998-2001, esophageal cancer ranked sixth among malignancies in men and fifth in women living in the area covered by the center.
This study is among the first in the country to use the log-linear method, and it aims to estimate the incidence of esophageal cancer in the TMACR coverage area.

Materials and Methods
New cases of esophageal cancer reported to TMACR during 2002-2006 by three sources (death certificates, pathology reports, and medical records) were included in the study. Because Tehran is a referral center, cases whose address and contact number placed them outside the area covered by the registry were excluded. Duplicates were identified and removed using Excel. Name, surname, and father's name were used to identify cases common to more than one source and, where necessary, date of birth and ID number were also used.
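The matching step described above can be sketched in Python. This is only an illustration under stated assumptions, not the procedure actually carried out in Excel: the field names (`name`, `surname`, `father_name`), the normalisation rules, and all records are hypothetical, and the fallback to date of birth and ID number is not implemented.

```python
import unicodedata


def match_key(record):
    """Build a comparison key from name, surname, and father's name."""
    parts = (record["name"], record["surname"], record["father_name"])
    # Normalise Unicode form, surrounding whitespace, and case so that
    # trivially different spellings compare equal.
    return tuple(
        unicodedata.normalize("NFKC", p).strip().casefold() for p in parts
    )


def capture_histories(sources):
    """Map each distinct person key to the set of source names listing them."""
    histories = {}
    for source_name, records in sources.items():
        for record in records:
            histories.setdefault(match_key(record), set()).add(source_name)
    return histories


# Hypothetical records, for illustration only.
sources = {
    "pathology": [
        {"name": "Ali", "surname": "Rezaei", "father_name": "Hassan"},
        {"name": "Sara", "surname": "Karimi", "father_name": "Reza"},
    ],
    "medical_records": [
        {"name": " ali", "surname": "REZAEI", "father_name": "Hassan"},
    ],
    "death_certificates": [
        {"name": "Sara", "surname": "Karimi", "father_name": "Reza"},
        # Duplicate entry within one source; it collapses onto the same key.
        {"name": "Sara", "surname": "Karimi", "father_name": "Reza"},
    ],
}

histories = capture_histories(sources)
```

Each key in `histories` is one distinct case, and the attached set records which sources captured it. Counting the cases that share each set gives the cell counts used by the log-linear models.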
All three sources were entered into the log-linear models. Given the possible relationships between sources, the three-source method yields eight models: one model in which all three sources are independent; three models in which one pair of sources is interdependent and both are independent of the third source; three models in which two pairs of sources are interdependent and the remaining pair is independent; and an eighth model in which all three pairwise dependencies are present. To select the best-fitting model we used the Akaike information criterion (AIC), which is widely used in scientific research and analysis (Hook and Regal, 1995; Hook and Regal, 1997).
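The count of eight models can be checked by listing every subset of the three possible pairwise dependencies. The Python sketch below is illustrative only: the one-letter source labels are assumptions, and the residual degrees of freedom assume seven observed cells (every capture history except "in no source") with one parameter per main effect and per pairwise term.

```python
from itertools import combinations

# Assumed labels: D = death certificates, P = pathology reports,
# M = medical records.
SOURCES = ("D", "P", "M")
PAIRS = list(combinations(SOURCES, 2))  # the three possible pairwise terms


def enumerate_models():
    """Return every subset of pairwise interactions, from none to all three."""
    models = []
    for k in range(len(PAIRS) + 1):
        for interactions in combinations(PAIRS, k):
            models.append(interactions)
    return models


def degrees_of_freedom(interactions):
    """Residual df: 7 observed cells minus intercept, 3 main effects, and pairs."""
    return 7 - (1 + len(SOURCES) + len(interactions))


models = enumerate_models()
for interactions in models:
    label = " + ".join("*".join(pair) for pair in interactions) or "independence"
    print(f"{label:<20} df = {degrees_of_freedom(interactions)}")
```

The listing contains one independence model, three models with one dependent pair, three with two dependent pairs, and one with all three pairs.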
In the log-linear method this statistic is calculated as $\mathrm{AIC} = -2G^{2} + 2(\mathrm{df})$, where $G^{2}$ is the likelihood-ratio statistic for the goodness of fit of the model and df is the model's degrees of freedom. The model with the highest value of the Akaike statistic and the highest p value for the $G^{2}$ statistic is chosen as the best model. SPSS version 16 was used to select the model and estimate the incidence.
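As a worked illustration of these quantities, the sketch below fits one of the eight models in closed form: the model with two pairwise dependencies that share a common source B, so that sources A and C are independent given B. It is not the SPSS procedure used in the study. The function names and all counts are hypothetical, and the AIC is computed exactly as the formula is written above.

```python
import math


def fit_conditional_independence(n):
    """Fit the model in which sources A and C are independent given source B.

    n maps capture histories (a, b, c), with 1 = listed and 0 = not listed,
    to observed counts; the cell (0, 0, 0) is unobserved.
    """
    # Layer B = 1 is a complete 2x2 table: fit independence of A and C.
    layer = {(a, c): n[(a, 1, c)] for a in (0, 1) for c in (0, 1)}
    total = sum(layer.values())
    g2 = 0.0
    for (a, c), observed in layer.items():
        row = layer[(a, 0)] + layer[(a, 1)]
        col = layer[(0, c)] + layer[(1, c)]
        expected = row * col / total
        if observed > 0:
            g2 += 2.0 * observed * math.log(observed / expected)
    # Layer B = 0 has three observed cells, which this model reproduces
    # exactly; the missing cell follows from the same independence relation.
    missing = n[(1, 0, 0)] * n[(0, 0, 1)] / n[(1, 0, 1)]
    df = 1  # 7 observed cells minus 6 parameters
    return g2, df, missing


def aic_as_stated(g2, df):
    """AIC exactly as written in the text: -2 * G^2 + 2 * df."""
    return -2.0 * g2 + 2.0 * df


# Hypothetical counts, for illustration only.
counts = {
    (1, 1, 1): 45, (1, 1, 0): 115, (0, 1, 1): 55, (0, 1, 0): 185,
    (1, 0, 1): 10, (1, 0, 0): 50, (0, 0, 1): 30,
}
g2, df, missing = fit_conditional_independence(counts)
estimated_total = sum(counts.values()) + missing
aic = aic_as_stated(g2, df)
```

The other models, including full independence, generally need an iterative fit, for example a Poisson regression on the seven observed cells.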
Incidence is the number of new cases of an event (here, esophageal cancer) occurring in a given period divided by the population at risk of that event in the same period. In this study all incidences are reported per 100,000 population.
Incidence was then estimated from the selected model for subgroups defined by calendar year, gender, and age. The denominator for each calendar year was the population of that year; for the gender and age subgroups it was the population over the five years of the study.
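The incidence calculation can be written out as a short Python sketch. All figures below are hypothetical and are not the study's data. Reading "the population of five years of study" as the sum of the five yearly populations is an assumption made for this illustration.

```python
def incidence_per_100k(new_cases, population_at_risk):
    """New cases divided by population at risk, scaled to 100,000."""
    if population_at_risk <= 0:
        raise ValueError("population_at_risk must be positive")
    return new_cases * 100_000 / population_at_risk


# Hypothetical figures, for illustration only.
yearly_population = {2002: 7_800_000, 2003: 7_900_000, 2004: 8_000_000,
                     2005: 8_100_000, 2006: 8_200_000}
yearly_cases = {2002: 300, 2003: 320, 2004: 400, 2005: 350, 2006: 360}

# Calendar-year incidence: that year's cases over that year's population.
by_year = {year: incidence_per_100k(yearly_cases[year], yearly_population[year])
           for year in yearly_cases}

# Subgroup incidence: five-year cases over the summed five-year population
# (an assumed reading of the denominator described in the text).
men_cases, men_population_5y = 1000, 20_000_000
men_rate = incidence_per_100k(men_cases, men_population_5y)
```

With the summed denominator, the subgroup rate is an average annual rate and is therefore comparable with the calendar-year values.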

Results
After common cases between sources were identified and duplicates removed, 1458 new cases of esophageal cancer had been reported to TMACR during 2002-2006. The mean age of all cases was 70.43 years: 70.54 (±11.47) years for men and 70.33 (±12.4) years for women. The male-to-female ratio was 1.36:1.
The model selected as the best was the one in which pathology reports and medical records were interdependent, death certificates and medical records were interdependent, and pathology reports and death certificates were independent; it had the highest value of

Discussion
Two important and related assumptions underlie the use of capture-recapture methods in epidemiological studies. First, if two sources are used they should be independent of each other and, more generally, in a model with K sources there should be no dependence among the K sources. Second, every case should have an equal chance of being captured by each source. If either assumption fails, the estimate can underestimate or overestimate the actual population size. These assumptions are very unlikely to hold fully in an epidemiological study. For example, death certificates and hospital medical records are two sources commonly used in a cancer registry system. Cases registered in one of these sources are very likely to be recorded in the other, which indicates dependence between the sources and violates the first assumption. In addition, patients with severe disease are much more likely to be hospitalized and registered as cancer cases, and they are also more likely to die of the disease, which conflicts with the second assumption. These problems reduce the accuracy and reliability of capture-recapture estimates. Ways to address them include using more than two sources, using log-linear models to model the dependencies between sources (Hook and Regal, 1995), and stratifying by disease behavior and patient characteristics to equalize the chance of registration in each source (McClish and Penberthy, 2004).
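The effect of violating the independence assumption can be seen in the simplest two-source case. The Python sketch below uses the classical two-source (Lincoln-Petersen) estimator, which is not the three-source log-linear method of this study, and all numbers are hypothetical.

```python
def lincoln_petersen(n1, n2, both):
    """Two-source estimate of the total number of cases: n1 * n2 / overlap."""
    if both <= 0:
        raise ValueError("the sources must share at least one case")
    return n1 * n2 / both


# Hypothetical population of 1000 true cases; each source lists 500 of them.
true_total = 1000
n1 = n2 = 500

# Independent sources: the expected overlap is 500 * 500 / 1000 = 250 cases.
independent_estimate = lincoln_petersen(n1, n2, both=250)  # 1000.0

# Positively dependent sources: the overlap exceeds what independence implies.
dependent_estimate = lincoln_petersen(n1, n2, both=400)  # 625.0
```

When a case listed in one source is more likely to be listed in the other, the overlap is inflated and the total is underestimated (625 rather than 1000 in this example). Negative dependence has the opposite effect.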
The mean age of incident esophageal cancer cases was 70.43 years (70.54 for men and 70.33 for women). Age at diagnosis differed little between men and women but was slightly higher than the mean age reported by studies in other parts of the world (Bashash et al., 2008). The male-to-female incidence ratio was greater than one, consistent with international and regional reports (Boyle and Levin, 2008; Babaei et al., 2009; Mohagheghi et al., 2009; Sadjadi et al., 2010; Amani et al., 2013; Wang et al., 2013).
The log-linear model fitting gave the following result: medical records and pathology reports are interdependent, medical records and death certificates are interdependent, and pathology reports and death certificates are independent. Although we selected the best model using the AIC and the p value of the $G^{2}$ statistic, these relationships are also plausible in practice, because pathology laboratories and hospitals can hardly be considered independent. In addition, a case in an acute condition who was hospitalized is much more likely to appear in the death certificate list than a case in a better condition. According to the 2008 World Health Organization (WHO) report, the highest incidence was seen among Black Americans and in the countries of South Africa and East and Central Asia. In addition, according to the GLOBOCAN 2008 estimates of cancer incidence and mortality, the highest estimated incidence was recorded for the countries of South Africa, East Asia, and East Europe, and the lowest for the countries of West, Central, and North Africa and of Central America (Boyle and Levin, 2008). In this study the registered incidence was 4.5 per 100,000 population, which is lower than the WHO estimate for Iran (6.8 per 100,000 population) and lower than the values expected from studies conducted in the country. Sadjadi et al. (2010) reported that the northern regions of Iran have a high incidence of esophageal cancer: Golestan province in the northeast and Ardabil province in the northwest have incidences of about 40 and 15 per 100,000 population, respectively.
Moving toward the southern parts of the country, esophageal cancer incidence declines: Semnan province in the center and Kerman province in the south have incidences of about 11 and 3 per 100,000 population, respectively. Because the Tehran metropolis lies in the center of the country and, owing to its geographical, political, and economic position, contains a mixture of populations from across the country, it is expected to have an intermediate incidence rate, similar to that of the central provinces. While the incidence reported in this study and in earlier TMACR reports is lower than the values expected for the center of the country, the incidence estimated by the log-linear method is similar to that of the central provinces (Mohagheghi et al., 2009; Sadjadi et al., 2010).
Incidence was broadly similar across the calendar years studied, with an increasing but irregular trend. The values estimated by the log-linear method are higher than the values reported to the registry, indicating low completeness of cancer registration in TMACR. The highest and lowest incidences were observed in 2004 and 2002, respectively. Because death certificates are one of the three sources used in this study and the 5-year survival rate of esophageal cancer patients is about 10% (Bashash et al., 2008; Boyle and Levin, 2008), some cases identified by death certificates in the first years of the study may have been registered by the other sources in the years before the study. Conversely, cases reported by pathology reports and medical records in the last years of the study are less likely to appear in death certificates. The middle years of the study are therefore expected to give more complete data and consequently a more precise estimate.
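Registration completeness, as used in this comparison, is the registered figure expressed as a share of the estimated one. A minimal Python sketch follows; the function name and the example values are hypothetical and are not results of this study.

```python
def completeness_percent(registered, estimated):
    """Registered cases (or incidence) as a percentage of the estimated total."""
    if estimated <= 0:
        raise ValueError("estimated must be positive")
    return 100.0 * registered / estimated


# Hypothetical values, for illustration only: 450 registered cases
# against a capture-recapture estimate of 900 cases.
example = completeness_percent(registered=450, estimated=900)
```

Because both figures share the same population denominator, the ratio is the same whether it is computed from case counts or from incidence rates.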
The estimated incidence of esophageal cancer differs markedly between men and women. Because the estimated incidence in women is closer to the reported incidence, registration appears to be more complete for women than for men. According to the GLOBOCAN 2008 estimates, the incidence is 6.3 and 7.4 per 100,000 population for Iranian women and men, respectively. The estimate of this study for women is very close to that figure.
The reported incidence in the age subgroups is similar to that in previous TMACR reports (Mohagheghi et al., 2009). However, the estimated incidence for each age group in the present report is more than twice the reported value. In populations with very high incidence, the reported incidence in older people is about 3-4 times the value estimated in this study (He et al., 2008; Matsuda et al., 2008). As expected, the estimated incidence for the age subgroups is lower than the values reported for the northern areas of the country, which have a high incidence of esophageal cancer (Semnani et al., 2006; Babaei et al., 2009).
Given the argument above about the precision of the estimate for the middle years, we believe that the estimate obtained for calendar year 2004 can be considered a good estimate of esophageal cancer incidence in the area covered by TMACR.