Assessment of a Questionnaire for Breast Cancer Case-Control Studies

Introduction
Breast cancer is the second most common cancer in women, with nearly 1.4 million new cases in 2008 (WHRF). The main risk factors for breast cancer, such as inheritance (mutations of the BRCA1 and BRCA2 genes) (Ford et al., 1998; Begg et al., 2008), menstrual history (early age at menarche, late menopause), reproductive history (nulliparity, late age at first full-term pregnancy), late age at lactation and short lactation duration, are well established but generally difficult to modify (Ma et al., 2006; Bao et al., 2011). A substantial amount of research has explored the influence on breast cancer risk of modifiable lifestyle factors such as diet (Pala et al., 2009; Hu et al., 2012), smoking (Luo et al., 2011a; 2011b), alcohol use (Zhang et al., 2007; Beasley et al., 2010) and physical activity (Peters et al., 2009; Eliassen et al., 2010). However, convincing evidence has only been shown for high alcohol consumption (WHRF, 2007).
It has been recognized that one of the main causes of uncertainty regarding the role of lifestyle factors, especially diet, in cancer causation is the limited accuracy of the methods used to obtain information on possible cancer risk factors (Pasanisi et al., 2002). Nevertheless, the relationships between diet, smoking, alcohol consumption, physical activity and cancer as an outcome are usually assessed with questionnaires (Zhang et al., 2007; Pala et al., 2009; Peters et al., 2009; Beasley et al., 2010; Eliassen et al., 2010; Luo et al., 2011a; 2011b).
A questionnaire designed to assess the relationships between lifestyle risk factors and breast cancer was validated in the feasibility phase of a breast cancer case-control study. This article presents the results of testing the criterion validity and external reliability of the questionnaire used in the study.

Study population
The hospital-based case-control study included 40 patients (cases) with a new, histologically confirmed diagnosis of breast cancer according to ICD-10 who required surgical intervention at the Department of Surgery, Lithuanian University of Health Sciences, and 40 controls without a cancer diagnosis from other departments of the University's hospital who agreed to participate in the survey. Of 56 women approached as cases, 42 agreed to complete the questionnaire, but 2 were excluded because their diagnosis changed, leaving 40 cases and a response rate (RR) of 71.4%. The RR of the controls was 64.5% (40 of 62 women agreed).
Each subject was asked to complete the questionnaire twice, on the day of admission (Q1) and on the day before discharge from the hospital (Q2), 4-6 days apart. None of the participating patients knew about the second administration in advance. Both questionnaires were completed by the patients themselves. The study was approved by the Kaunas Regional Biomedical Research Ethics Committee (01-10-2007, No. BE-2-1, Report No. 5/2007). Written consent to participate in the study was obtained from each patient.
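The response rates reported above can be checked with simple arithmetic. The following is a minimal sketch; the helper name `response_rate` is hypothetical and not part of the study. It shows that the 71.4% figure for the cases corresponds to the 40 women retained after the two exclusions, not to the 42 who initially agreed.

```python
def response_rate(included: int, approached: int) -> float:
    """Return the response rate as a percentage, rounded to one decimal place."""
    return round(100 * included / approached, 1)


# Cases: 56 women approached, 42 agreed, 2 excluded after a change of diagnosis.
cases_agreed = response_rate(42, 56)
cases_included = response_rate(42 - 2, 56)

# Controls: 62 women approached, 40 agreed.
controls_included = response_rate(40, 62)

print(cases_agreed, cases_included, controls_included)
```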

Questionnaire
The questionnaire on breast cancer risk factors was based on a modified and adapted English version of the Aichi Cancer Center Research Institute Lifestyle and Health Questionnaires (Hirose et al., 1995), with the exception of the sections on physical activity (Baecke et al., 1982), consumption of alcoholic beverages (Horn-Ross et al., 2004), and work environment, the last of which used original questions. The questionnaire was translated from English into Lithuanian, back-translated into English, and then translated into Lithuanian again by a different person to ensure consistency across the versions used. The questionnaire was pretested before the validation study.
The responses to the questions on the woman's history were "yes", "no", or a number giving the age at the beginning of menstruation, first pregnancy, delivery, or onset of menopause; the number of pregnancies, deliveries, miscarriages, abortions, and breast-fed children; or the duration of use of estrogens or estrogens and progestin during menopause. The question "Were your menses regular at 18 years of age?" had the following responses: "Almost regular (±3 days)/sometimes irregular (one cycle in six months is atypical, i.e. variation is more than three days or the cycle does not appear every month)/irregular (variation of cycle more than three days or the cycle does not appear every month)/never have had". The question "Do you still have menses?" had three possible responses: "Yes/yes, but the menses have been irregular during the last 12 months/no, have not had them for the last 12 months or more". The third response led to the follow-up question "How did menopause occur?" with the following responses: "Naturally at age ... years/after surgical treatment of ovaries or uterus at age ... years/other reasons at age ... years". The questions "Have you ever used the following contraceptives?" and "Have you ever used hormones for treatment of infertility?" had the responses "No, never/yes, almost a month/2-6 months/>6 months".
The question "Do you smoke?" had the following responses: "Yes, I do ... years/yes, I do sometimes/gave up ... years or ... months ago/no, I have never smoked". The other responses were numbers: the number of cigarettes smoked per day, the age at which smoking started, or the length of time spent in a smoke-filled apartment. The question "Does/did someone smoke at your workplace?" was assessed on the rating scale "Never/sometimes/half of a day/almost all working day/never worked". The length of time spent at a smoke-filled workplace was reported on the five-point scale "<1/1-4/5-9/10-19/≥20 years".
Alcohol consumption was estimated from the frequency and quantity of use of strong alcoholic beverages (liqueur, brandy, vodka, etc.), wine/champagne, and beer at up to 25 years of age, at 26-35 years of age, and during the year before the survey (Horn-Ross et al., 2004). Drinking frequency was evaluated on the rating scale "Every day/4-6 times a week/2-3 times a week/1 time a week/1-3 times a month/once in 2 months/do not use". The quantity of each alcoholic beverage was assessed in ml per drink.
All responses on physical activity were given on the five-point scale "Never/seldom/sometimes/often/always", with the exception of the questions on the name of the main occupation, the types of sport played, and the average duration of sleep (Baecke et al., 1982).
A food frequency questionnaire (FFQ) with the rating scale "Almost do not eat or less than once a month/1-3 times a month/1-2 times a week/3-4 times a week/5-6 times a week/every day" was used to assess the consumption of 35 food items from the main Lithuanian diet (Kriaucioniene et al., 2008).
The possible responses to the questions on dust, chemicals, radiation, stress, and other factors at the workplace were "yes" and "no", with the exception of the questions on the duration of exposure to these factors.

Statistical analysis
Since the variables for possible risk factors were not normally distributed, the criterion validity of questionnaire Q2 relative to the reference questionnaire Q1 was estimated with Spearman's correlation coefficient (SCC) (Pasanisi et al., 2002). The intraclass correlation coefficient (ICC) was calculated to measure the external (test-retest) reliability of the questionnaire. The following cut-offs were used to interpret the strength of the relationship for a given statistic: r≤0.3, weak; 0.3<r≤0.7, moderate; r>0.7, substantial (Agresti, 1996). The data analysis was performed with SPSS 16.
Table 1 lists some characteristics of the study group. Breast cancer patients (cases) were older than the controls, i.e., patients without a cancer diagnosis. Significantly more subjects with secondary/special secondary education were found among the controls. No difference in residence was found between the cases and the controls.
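The SCC, the ICC, and the cut-offs described in this section can be illustrated with a short pure-Python sketch. It is not the SPSS procedure used in the study. The function names, the example responses, and in particular the choice of ICC form (two-way model, consistency, single measurement) are assumptions, since the text does not state which ICC form was computed.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(q1, q2):
    """Spearman's coefficient: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(q1), average_ranks(q2))


def icc_consistency(q1, q2):
    """ICC for two administrations (assumed form: two-way, consistency, single)."""
    n, k = len(q1), 2
    both = list(q1) + list(q2)
    grand = mean(both)
    subject_means = [(a + b) / 2 for a, b in zip(q1, q2)]
    occasion_means = [mean(q1), mean(q2)]
    ss_total = sum((v - grand) ** 2 for v in both)
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_occasions = n * sum((m - grand) ** 2 for m in occasion_means)
    ms_subjects = ss_subjects / (n - 1)
    ms_error = (ss_total - ss_subjects - ss_occasions) / ((n - 1) * (k - 1))
    return (ms_subjects - ms_error) / (ms_subjects + (k - 1) * ms_error)


def strength(r):
    """Apply the cut-offs from the text to the coefficient as given."""
    if r <= 0.3:
        return "weak"
    if r <= 0.7:
        return "moderate"
    return "substantial"


# Hypothetical ordinal codes (1-6) for one FFQ item at Q1 and Q2.
q1 = [1, 2, 2, 3, 4, 5, 5, 6]
q2 = [1, 2, 3, 3, 4, 4, 5, 6]
scc = spearman(q1, q2)
icc = icc_consistency(q1, q2)
print(f"SCC={scc:.2f} ({strength(scc)}), ICC={icc:.2f} ({strength(icc)})")
```

Under the consistency form assumed here, a constant shift between Q1 and Q2 does not lower the ICC; an absolute-agreement form would penalize such a shift.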

Results
SCCs for the responses on general factors (date of birth, living place, residence) and socio-economic factors (education, income, marital status) were high, ranging from 0.95 to 1.00 (p<0.01) in the cases and from 0.91 to 1.00 (p<0.01) in the controls. The lowest ICCs were for monthly income per family member in the cases (0.97, p<0.01) and for years of education in the controls (0.94, p<0.01).
SCCs for past diseases were 0.64-1.00 (p<0.01) in the case group and 0.61-1.00 (p<0.01) in the control group; the ICCs for the cases and the controls were 0.76-1.00 (p<0.01) and 0.75-1.00 (p<0.01), respectively.
Both SCC and ICC for the current height and weight of the cases and controls, and for their height and weight at 20 and 50 years of age, were 0.9 or higher (p<0.01).
SCC and ICC for the responses to the questions on family history of cancer are given in Table 2. These responses correlated substantially in both the cases and the controls.
The responses regarding the woman's health correlated substantially, with SCC and ICC of about 0.7 or higher for both the cases and the controls (p<0.01). The exception was the response of the cases to the question on the duration of use of estrogens and estrogen-progestin during menopause, for which neither SCC nor ICC was statistically significant (Table 3).
SCC and ICC for the responses to the questions on smoking (active and passive) were greater than 0.7 in both the cases and the controls (p<0.01) (Table 4). Statistically significant moderate and substantial correlations were found for the responses on the frequency and amount of use of different alcoholic beverages, with the exception of the ICC for the responses of the cases on the amount of beer drunk at ages up to 25 and 26-35 years (Table 4).
Both SCC and ICC for the different food items were moderate or substantial (p<0.01) in the cases as well as in the controls; only the correlation for the cases' responses on the frequency of veal consumption was not significant (SCC=0.28, ICC=0.34, p>0.05) (Table 5).
The responses regarding the women's physical activity correlated substantially, with SCC and ICC of 0.7 or higher for both the cases and the controls (p<0.01).
SCCs for the responses to the questions on dust, chemicals, radiation and stress at the workplace, as well as on the duration of exposure to these factors, ranged from 0.722 to 1.00 (p<0.01) in the case group and from 0.57 to 1.00 (p<0.01) in the control group; the ICCs were 0.84-1.00 (p<0.01) and 0.73-1.00 (p<0.01) in the cases and the controls, respectively.

Discussion
Questionnaires are among the most frequently used research methods in epidemiological surveys. However, only a valid and reliable questionnaire, with an optimal number of clearly formulated and easily understood questions, is an appropriate tool for obtaining the necessary information. In research studies, a lack of validity reflects systematic measurement error and a lack of reliability reflects random measurement error. Therefore, testing the validity and reliability of a questionnaire is a way to avoid or reduce errors in scientific studies (Feunekes et al., 1999).
This paper presents the results of a study carried out to test the criterion validity and external reliability of the questionnaire used in a breast cancer case-control study. Because no "gold standard" exists for such a questionnaire, as the SF-36 questionnaire does in quality-of-life studies, the first completed questionnaire (Q1) was used as the reference for testing criterion validity (Pasanisi et al., 2002). The data showed that the responses to most of the questions on demographic and socio-economic factors, anthropometric indices, family history of cancer, smoking, and physical activity correlated substantially in both the cases and the controls. Moderate and substantial correlations were found for the responses on past diseases. This can be explained by the fact that the information requested was relevant and memorable to the subjects.
Substantial correlations were found between the responses to most of the questions about the woman's health in both the cases and the controls. However, neither correlation (SCC, ICC) for the cases' responses to the question on the duration of use of estrogens and estrogen-progestin during menopause was statistically significant, although the correlations were significant in the controls. A review of the questionnaires revealed that two respondents among the cases who had used both estrogens and estrogen-progestin medicines mixed up the durations of use. This could be explained by the older age of the cases and by individual differences in the ability to remember information about medicine use. Because evaluating the use of these hormones is essential, and in order to avoid systematic and random measurement errors in further analysis, the use of estrogens and of estrogen-progestin will be assessed as one category of female sex hormones, taking into account the overall use of these medicines during menopause.
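The reason for merging the two hormone questions can be shown with a small sketch. It assumes that "mixing" means the two durations were attributed to the wrong product at retest; the class, the field names, the unit (months), and the values are all hypothetical. In that case the individual items disagree between Q1 and Q2, while the combined duration is unchanged.

```python
from dataclasses import dataclass


@dataclass
class HormoneUse:
    """Reported durations of hormone use during menopause (hypothetical unit: months)."""
    estrogen_months: float
    estrogen_progestin_months: float

    def total_months(self) -> float:
        # Single combined category: overall use of female sex hormones.
        return self.estrogen_months + self.estrogen_progestin_months


# Hypothetical respondent who swapped the two durations at the second administration.
q1 = HormoneUse(estrogen_months=12, estrogen_progestin_months=24)
q2 = HormoneUse(estrogen_months=24, estrogen_progestin_months=12)

items_agree = q1.estrogen_months == q2.estrogen_months
totals_agree = q1.total_months() == q2.total_months()
print(items_agree, totals_agree)
```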
In epidemiological surveys, alcohol consumption is determined by five main methods: quantity/frequency and extended quantity/frequency questionnaires, retrospective and prospective diaries, and 24-hour recalls. The mean level of alcohol intake differed by 20% between these methods, and specific questions on the intake of beer, wine, and liquor resulted in 20% higher estimates of intake (Feunekes et al., 1999). It has been found that heavy drinkers tend to underestimate their alcohol consumption and drinking behavior, whereas lighter drinkers tend to overestimate it (Townshend and Duka, 2002). When there is sufficient evidence that alcohol intake is underestimated in a population, methods that enquire about both the frequency and the amount consumed, separately for beer, wine, and liquor, will yield the most realistic levels of intake (Feunekes et al., 1999).
In our case, alcohol use was measured by quantity/frequency questions for different types of alcohol (strong alcoholic beverages: liquor, brandy, vodka; wine/champagne; beer). Moderate and substantial correlation coefficients (SCC, ICC) were found for the responses on the frequency and quantity of strong alcoholic beverages and wine/champagne. However, the ICC for the cases' responses on the amount of beer drunk at ages up to 25 and 26-35 years was not statistically significant, although the responses of the controls showed significant correlations. On the one hand, this can be explained by the fact that, in the past, young women up to 25 or 26-35 years of age rarely drank beer. On the other hand, the cases were older than the controls and therefore had more difficulty recalling and correctly stating their past beer consumption.
The same issue was found when using the FFQ for diet assessment: the less often a certain food item is eaten, the more difficult it is to remember the frequency of its consumption. In epidemiological studies, information on diet is collected with an FFQ, whose responses are on a scale indicating the frequency of use of the product and sometimes the portion size, and with a 1-7 day diet diary. Biomarkers of nutrient intake (mean daily intakes of dietary energy, total fat, saturated, monounsaturated and polyunsaturated fatty acids, linoleic acid, total carbohydrate, sugars, starch, dietary fibre, protein, vitamin C, vitamin E, folate, carotenes, Fe, Ca, Mg, K and alcohol) are also measured. There is evidence of satisfactory agreement between a 1-7 day diet diary and a food-frequency questionnaire, as well as with different biomarkers of nutrient status (Brunner et al., 2001; Millen et al., 2006; Barclay et al., 2008). However, the data are not consistent (Michels et al., 2005). Some data show that the evaluation of diet with questionnaires is prone to measurement error; therefore, for both methods, adjusting nutrient intake for mean dietary energy intake appears to be the optimal approach to presenting and analyzing the data (Brunner et al., 2001; Michels et al., 2004).
In our study, the FFQ captured the consumption of different food items during the year before the illness. Using a diet diary of current consumption or measuring biomarkers would therefore not be methodologically appropriate, because the two sources would describe different time periods. Our data showed substantial and moderate correlations between the responses to most of the questions on food items in both the cases and the controls, with the exception of veal consumption, for which neither SCC nor ICC was statistically significant. As the subjects of the study ate veal rarely, we think that this was the main reason for the differing responses about the frequency of veal consumption.
One possible shortcoming of the study is that the information collected with the questionnaire was not verified by objective measurements. Biomarkers of smoking (cotinine in plasma, saliva, or urine) (Jarvis et al., 1984), which could have supplemented and clarified the questionnaire information, were not determined. We did not measure energy expenditure by doubly labeled water and/or maximal oxygen uptake to validate the physical activity questionnaire; however, this has been done by Bonnefoy et al. (2001). Because information on diet was collected by asking about the food products consumed during the year before the illness, it was not possible to check dietary metabolites by measuring them directly in biological media of the subjects. Indirect estimation of the biomarkers of nutritional intake using a nutrient database was also not possible, because portion size was not determined by the questionnaire. However, the substantial correlations (SCC, ICC) for the responses to most of the questions indicate that the information collected by the questionnaire is valid and reliable.
In conclusion, the questionnaire used in a hospital-based case-control study on breast cancer is valid for assessing risk factors of the disease. The responses to most of the questions correlated substantially or moderately, and the correlation coefficients were statistically significant. The responses on the duration of use of estrogens and of estrogen-progestin products during menopause will be combined in further analysis and assessed as female sex hormone use during menopause. The responses to the questions on the amount of beer drunk and on veal consumption will either be excluded from the analysis or treated with caution.