Coverage, Density and Completeness of Sources Used in Tehran Metropolitan Area Cancer Registry: According to the Data of Esophageal Cancer, 2003-2007

Introduction
Recent developments in health-related technology have increased the number of data registries for health conditions such as cancers, cardiovascular diseases and other non-communicable and communicable diseases. Most of these registries are based on data already available in the health system. Such data were mostly not collected or managed for the specific goals of an ongoing research project (Hearst and Hulley, 1988). They are often collected for the systematic management of a health condition or for evaluating the health activities related to that condition, which may require a different set of data (Sorensen, 1982; Sorensen et al., 1983). Using available data saves time and cost. In addition, the large sample sizes typical of such data increase the generalizability of results, and because the information is not obtained by questioning patients directly, the probability of information bias is reduced. However, because the data collection process is not under the direct supervision of the researcher, validity and accuracy may suffer: the investigator cannot control the collection method or its quality, and evaluating validity is sometimes impossible (Hearst and Hulley, 1988; Roose et al., 1990). Completeness and accuracy are the properties most often considered when such data are used (Goldberg et al., 1980; Stone, 1986). The quality and validity of a registry system that uses two or more data sources are directly related to the quality and validity of those sources.
Population-based cancer registries are valuable sources for public health care and research because they provide important information about cancer. The extent to which these data can be relied upon depends on their quality and validity (Elbasmi et al., 2010); however, registries rarely report such information (Bullard et al., 2000), since doing so requires greater access to, and evaluation of, the collected data. In addition, investigating the validity and completeness of such data requires consulting other sources. For a cancer registry, activities such as reabstracting, measuring the proportions of death certificate notification (DCN) and death certificate only (DCO) cases, and comparing the registered data with data available in other data banks are needed (Chen et al., 1994).
In Iran, the first efforts to organize cancer reporting began in 1956, when the Cancer Society was founded at Tehran University (Habibi, 1985). In 1970, the observation of a high incidence of esophageal cancer along the shores of the Caspian Sea persuaded investigators to establish the first population-based cancer registry center (Kmet, 1972). To date, population-based cancer registration has been implemented only in Golestan, Ardabil, Tehran, Semnan and Kerman provinces, none of which has published a report on registry quality control (Mohagheghi and Mosavi-Jarrahi, 2010; Zendedel et al., 2010).
This study aimed to evaluate the sources used in the Tehran Metropolitan Area Cancer Registry (TMACR) with regard to coverage, distribution density and completeness, and to provide a new method for quality control of data registration in population-based cancer registries.

Materials and Methods
TMACR uses three main sources for data registration: pathology reports, hospital medical records and death certificates (Mohagheghi et al., 2009). In the present study, we evaluated all esophageal cancer data that were actively or passively reported by these three sources during 2003-2007. To track deaths occurring in 2008 among incident cases of previous years, the death data for that year were also evaluated. All data were evaluated with regard to coverage, density and completeness, criteria introduced by Felix Naumann and Johann Christoph Freytag (Naumann et al., 2000; Naumann et al., 2004).

Coverage
Coverage of a source (e.g. pathology reports) is defined as the proportion of a community population (here, all patients with esophageal cancer) identified by the source: $c(S) = |S| / |P|$.
Here $|S|$ is the number of cases reported by the source under consideration, $|P|$ is the size of the community population (all patients with esophageal cancer reported to the cancer registry) and $c(S)$ is the coverage of the source. If it is assumed that all existing cases are identified and each is registered by at least one source, then $P$ is obtained as the union of the cases registered by the sources, with cases common to several sources counted once: $P = S_1 \cup S_2 \cup \dots \cup S_n$.
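The coverage calculation can be sketched as follows. This is a minimal illustration, assuming each source has already been reduced to a set of linked, de-duplicated case identifiers; the identifiers and the function name `coverage` are hypothetical and are not taken from TMACR.

```python
def coverage(source_ids, all_sources):
    """Return |S| / |P|, where P is the union of the case IDs of every source."""
    # Cases reported by several sources are counted once in the union.
    population = set().union(*all_sources)
    if not population:
        raise ValueError("no cases reported by any source")
    return len(set(source_ids)) / len(population)


# Hypothetical linked case identifiers (illustrative only, not study data).
pathology = {"p1", "p2", "p3"}
hospital = {"p2", "p4"}
death = {"p3", "p4", "p5"}
sources = [pathology, hospital, death]

for name, ids in [("pathology", pathology), ("hospital", hospital), ("death", death)]:
    print(name, coverage(ids, sources))
```

Because the denominator is the union of all sources, the coverages of overlapping sources can sum to more than 1.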

Density
Density is the proportion of complete (non-empty) fields in the data provided by a source. In general, the data provided by a source may contain many empty fields; for example, among the main fields needed to register a case of esophageal cancer, the address or date of birth might be missing. Density is calculated in two parts: distribution density and source density.
a) Distribution density: the density of each field considered in a source, $d_S(a) = |\{t \in S \mid t_a \neq \bot\}| / |S|$. Here $t$ is a record in source $S$, $a$ is a field required for registration (e.g. date of diagnosis of cancer), $t_a$ is the value of that field in record $t$ and $\bot$ denotes an empty (unregistered) value, so the numerator counts the records in which the field is complete. The importance of each field differs between cancer registries. For instance, date of diagnosis is one of the most important fields in a cancer registry for determining incidence and prevalence over different time intervals. In TMACR, the fields of name, surname, father's name, date of diagnosis, sex, address and birth date are classified as the main fields.
b) Source density: the density of a source is the mean of the distribution densities of its fields, $d(S) = \frac{1}{|A|} \sum_{a \in A} d_S(a)$, where $A$ is the set of fields considered in source $S$.
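The two density measures can be sketched as below. This is an illustrative sketch only: it assumes each record is a dictionary in which an empty field is `None` or an empty string, and the records, field names and function names are hypothetical.

```python
def distribution_density(records, field):
    """Fraction of records in a source whose value for `field` is non-empty."""
    if not records:
        raise ValueError("source has no records")
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def source_density(records, fields):
    """Mean of the distribution densities over the fields considered."""
    return sum(distribution_density(records, f) for f in fields) / len(fields)


# Hypothetical records from one source (illustrative only, not study data).
records = [
    {"sex": "M", "address": "Tehran"},
    {"sex": "F", "address": ""},
    {"sex": "M", "address": None},
    {"sex": "F", "address": "Karaj"},
]

print(distribution_density(records, "sex"))
print(distribution_density(records, "address"))
print(source_density(records, ["sex", "address"]))
```

Note that this only checks whether a field is filled, not whether its content is correct.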

Completeness
Using the criteria of coverage and density, a further criterion, completeness, can be defined for each source in a registry system: $C(S) = c(S) \cdot d(S)$.
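Substituting the definitions of coverage and source density shows what this product measures. The short derivation below writes $c(S)$ for coverage, $d(S)$ for source density, $P$ for the population of cases, $A$ for the set of fields considered and $\bot$ for an empty field value; this notation is a reading of the definitions given in the text.

```latex
\begin{aligned}
C(S) &= c(S)\, d(S)
      = \frac{|S|}{|P|} \cdot \frac{1}{|A|} \sum_{a \in A}
        \frac{|\{t \in S \mid t_a \neq \bot\}|}{|S|} \\
     &= \frac{\sum_{a \in A} |\{t \in S \mid t_a \neq \bot\}|}{|P|\,|A|}
\end{aligned}
```

In other words, completeness is the number of non-empty field values supplied by the source, divided by the number of values an ideal source would supply (every field for every case in $P$).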

Results
During 2003-2007, 1,404 new cases of esophageal cancer were reported to TMACR. The coverage of death certificates, pathology reports and hospital medical records was 0.434, 0.549 and 0.308, respectively.
Table 1 presents the distribution density of the fields for each source separately; the source density of death certificates, pathology reports and hospital medical records was 0.976, 0.824 and 0.956, respectively.
Based on these coverage and density values, the completeness of death certificates, pathology reports and hospital medical records was 0.424, 0.452 and 0.294, respectively.
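As a quick arithmetic check, the reported completeness values can be reproduced by multiplying each source's reported coverage by its reported density. The snippet below uses only the figures stated in this section; the dictionary layout and variable names are illustrative.

```python
# Coverage, density and completeness as reported in the Results.
reported = {
    "death certificates": {"coverage": 0.434, "density": 0.976, "completeness": 0.424},
    "pathology reports": {"coverage": 0.549, "density": 0.824, "completeness": 0.452},
    "hospital medical records": {"coverage": 0.308, "density": 0.956, "completeness": 0.294},
}

products = {}
for name, v in reported.items():
    # completeness = coverage * density, rounded to three decimals
    products[name] = round(v["coverage"] * v["density"], 3)
    print(f"{name}: computed {products[name]:.3f}, reported {v['completeness']:.3f}")
```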

Discussion
In terms of coverage, the values reported in the Results place pathology reports first (0.549), death certificates second (0.434) and hospital medical records third (0.308). For the purposes of this study, we did not investigate the sensitivity and specificity of each source; the precision and accuracy of the sources therefore need to be examined in detail in future studies.
For distribution density, the fields of diagnosis date, sex, name and surname had a density of 1 or approximately 1, whereas the address field had the lowest density in each of the three sources. This matters because the diagnostic and therapeutic centers covered by TMACR are referral centers: the low density of the address field in the reports makes it difficult to claim that the cases reported to the registry belong to the covered population.
Depending on the goals of a registry system, a specific weight can be assigned to each field to allow a better comparison between sources. For example, registry systems that aim to compare values between males and females, occupations, races, birth subgroups and so on will attach a different importance to each of these fields (Elbasmi et al., 2010; Zini et al., 2012).
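One possible way to realize such weighting is a weighted mean of the distribution densities. The sketch below is offered only as an illustration and is not a method used in this study; the function name, the field densities and the weights are all hypothetical.

```python
def weighted_source_density(field_densities, weights):
    """Weighted mean of per-field distribution densities."""
    total_weight = sum(weights[f] for f in field_densities)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(field_densities[f] * weights[f] for f in field_densities)
    return weighted_sum / total_weight


# Hypothetical distribution densities for one source (not study data).
densities = {"diagnosis_date": 1.0, "sex": 1.0, "address": 0.5}

# Equal weights reduce to the unweighted source density.
equal = {"diagnosis_date": 1, "sex": 1, "address": 1}
# A registry that cares most about residence might up-weight the address field.
address_heavy = {"diagnosis_date": 1, "sex": 1, "address": 2}

print(weighted_source_density(densities, equal))
print(weighted_source_density(densities, address_heavy))
```

With equal weights the result equals the plain mean; up-weighting a poorly filled field lowers the score of the source.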
Death certificates had the highest source density, followed by hospital medical records and then pathology reports. Although hospital medical records had lower coverage than the other two sources, their source density was acceptable.
According to the values reported in the Results, pathology reports had the highest completeness (0.452) and hospital medical records the lowest (0.294), with death certificates in between (0.424). In general, because esophageal cancer is a malignant disease with poor survival and patients are referred to diagnostic and therapeutic centers (Mathers et al., 2001; Samadi et al., 2007; Boyle and Levin, 2008), the probability of being reported by each of the three sources would be expected to be similar across sources and higher than observed here.
This is the first time this method has been applied to the sources used in cancer registry systems. To make it possible to compare death certificates with the other two sources, data on esophageal cancer, a fatal cancer with poor survival, were used. Moreover, in comparing density between fields and sources, only whether a field was empty or complete was considered, not the accuracy or precision of its content. Checking the accuracy with which fields are filled requires more precise and effective methods (Scannapieco and Patini, 2004). A standard reference source or a purpose-designed study can help to evaluate the accuracy of the information in each data source.
Quality control of registered data should be implemented in every registry system. Because it requires additional cost and time, such a process is not performed in many registries; the method presented in this study provides general information about coverage, density and completeness when evaluating the quality of the data sources used in a cancer registry.