Correlates of Digit Bias in Self-reporting of Cigarette per Day (CPD) Frequency: Results from Global Adult Tobacco Survey (GATS), India and its Implications

BACKGROUND
Cigarettes per day (CPD) use is a key smoking behaviour indicator. It reflects smoking intensity, which is directly proportional to the occurrence of tobacco induced cancers. Self reported CPD assessment in surveys may suffer from digit bias and under reporting. Estimates from such surveys could influence policy decisions for tobacco control efforts. In this context, this study aimed at identifying the underlying factors of digit bias and its implications for Global Adult Tobacco Surveillance.

MATERIALS AND METHODS
CPD frequencies reported by daily users of manufactured cigarettes in the Global Adult Tobacco Survey (GATS)-India data were analyzed. An adapted Whipple Index was estimated to assess digit bias and the data quality of reported CPD frequency. Digit bias was quantified by treating a reported CPD frequency with '0' or '5' as the terminal digit as digit biased. The factors influencing it were identified by bivariate and logistic regression analysis.


RESULTS
The mean and mode of CPD frequency were 6.7 and 10 respectively. Around 14.5%, 15.1% and 15.2% of daily smokers reported their CPD frequency as 2, 5 and 10 respectively. The adapted Whipple Index was estimated to be 226.3, indicating poor data quality. Digit bias was observed in 38% of the daily smokers. Heavy smoking, urban residence, residence in the North, South or North East region of India, education below primary level or at secondary level and above, and the fourth asset index quintile group were significantly associated with digit bias.


DISCUSSION
The present study highlighted the poor quality of CPD frequency data in the GATS-India survey and the need for its improvement. Modeling of digit preference and smoothing of the CPD frequency data are required to improve data quality. Marketing of 10 cigarette sticks per pack may influence CPD frequency reporting, but this needs further examination. Exploring alternative methods to reduce digit bias in cross sectional surveys should be given priority.


Introduction
Digit bias in self-reported cigarette per day (CPD) frequency has been widely discussed in tobacco epidemiological surveys (Means et al., 1992; Klesges et al., 1995; IARC, 2008). Digit bias may occur due to rounding to multiples of a base unit (a convenient and significant number, e.g. 5 or 10) or due to terminal digit preference (or avoidance), resulting in data 'heaping' (Crocketta et al., 2001; IARC, 2008). Respondents appear to round down more often than up, leading to underreporting of CPD use in tobacco surveys (Warner et al., 1978; Hatziandreu et al., 1989).
CPD is an important measure of smoking behaviour and is a key indicator for Global Adult Tobacco Surveillance (Giovino et al., 2012). Surveys are valid ways to estimate tobacco use prevalence (Caraballo et al., 2001; 2004), but the accuracy of CPD data may suffer from digit bias, such as reporting of excess frequencies at round or preferred figures (Klesges et al., 1995; IARC, 2008). Digit bias in CPD frequency reporting could represent data quality and consistency (Means et al., 1992).

RESEARCH ARTICLE
Pratap Kumar Jena 1*, Jugal Kishore 2, G Jahnavi 3
The CPD measure is an integral part of all tobacco induced cancer mortality and morbidity (dose response) studies (Law et al., 1997; IARC, 2004). Cigarettes per day is a classical measure of nicotine dependence and its severity (Dawe et al., 2002). Nicotine dependence assessment tools use CPD frequency intervals (≤10, 11-20, 21-30 and >30) to denote severity of nicotine dependence (Heatherton et al., 1991; Chaiton et al., 2007). Digit bias, when it takes the form of rounding down, would result in misclassification of high dependence as low dependence and could lead to suboptimal cessation therapy (Fagerstrom, 2003). Further, the decision on treatment options also depends on the CPD frequency reported. For example, nicotine replacement therapy is more suitable for smokers who smoke ≥10 cigarettes per day (West et al., 2000). The 2 mg nicotine gum is most suitable for smokers using <25 CPD and the 4 mg gum is most appropriate for smokers using ≥25 CPD (USDHHS, 2008; DGHS, 2011). Therefore rounding may affect medication prescription for smokers. In view of all these implications, more accurate measurement of CPD use is desirable.
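The misclassification argument above can be illustrated with a minimal sketch. The band boundaries and gum doses are those quoted in the text; the lowest band is assumed here to include exactly 10 CPD, and the function names and the round-down reporting behaviour are hypothetical, chosen only for illustration.

```python
def dependence_band(cpd: int) -> str:
    """Map a CPD count to the interval bands quoted in the text.

    Assumption: the lowest band includes exactly 10 CPD.
    """
    if cpd <= 10:
        return "<=10"
    if cpd <= 20:
        return "11-20"
    if cpd <= 30:
        return "21-30"
    return ">30"


def gum_dose_mg(cpd: int) -> int:
    """Nicotine gum dose quoted in the text: 2 mg below 25 CPD, 4 mg from 25 CPD."""
    return 4 if cpd >= 25 else 2


def round_down(cpd: int, base: int) -> int:
    """Hypothetical reporting behaviour: round down to a multiple of `base`."""
    return (cpd // base) * base


# A smoker who truly smokes 27 CPD but reports the nearest lower multiple of 10.
true_cpd = 27
reported_cpd = round_down(true_cpd, 10)

band_true = dependence_band(true_cpd)          # band for the true value
band_reported = dependence_band(reported_cpd)  # band for the rounded report
dose_true = gum_dose_mg(true_cpd)
dose_reported = gum_dose_mg(reported_cpd)
```

In this hypothetical case the rounded report moves the smoker into a lower dependence band and a lower gum dose, which is the kind of shift the paragraph describes.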
CPD use is measured by subjective global self report, the timeline follow-back (TLFB) method and real time ecological momentary assessment (EMA) (Shiffman, 2009; Berkman et al., 2011). The former two methods show a four to six times higher incidence of digit bias than the latter (Shiffman et al., 2009). Prospective CPD data collection (Perkins et al., 2012) also shows about half the digit bias of retrospective CPD data collection. However, the less biased methods are time consuming and resource intensive.
India is home to 111 million smokers (IIPS, 2010). There is no information on self reported CPD data quality in India. Therefore, it is pertinent to understand the quality of CPD use data, the magnitude of digit bias and the factors influencing it. This study aimed to quantify digit bias in CPD frequency reporting in the GATS-India survey, to identify its correlates and to discuss its implications.

Materials and Methods
The Global Adult Tobacco Survey (GATS) is a nationally representative survey and was conducted in India in 2009-10 (IIPS, 2010). Item no (06a) in the GATS questionnaire (On an average, how many manufactured cigarettes do you currently smoke each day?) was asked of daily smokers to obtain CPD frequency data.
For the purpose of this study we analyzed GATS-India data available in the public domain from the CDC website (CDC web data, 2012). Data from a subsample of 3411 daily manufactured cigarette users were analyzed. A CPD frequency with '0' or '5' as the last digit was assumed to be digit biased. Hence each CPD frequency was classified as digit biased or not digit biased. Bivariate analysis (using the chi-square test) and binomial regression analysis were done to identify socio-demographic correlates of digit bias in CPD reporting. There is wide cultural variation across the regions of India. Therefore 'National regions' was used as a proxy to represent such cultural variation. Principal component analysis was carried out to calculate asset index quintile groups. Considering the low mean number of cigarettes used in the Indian population (Giovino et al., 2012), CPD use of ≥15 was defined as heavy use. As analyses were conducted on a non-random subsample, variance and sampling weights were not used.
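The two classification rules described above (terminal digit '0' or '5' for digit bias, CPD ≥15 for heavy use) can be sketched as follows. The function names and the sample values are hypothetical and are not taken from the GATS data.

```python
def is_digit_biased(cpd: int) -> bool:
    """True when the reported CPD ends in 0 or 5.

    For positive integers this is the same as being a multiple of 5.
    """
    return cpd % 5 == 0


def is_heavy_user(cpd: int) -> bool:
    """Heavy use as defined in the text: 15 or more cigarettes per day."""
    return cpd >= 15


def share(flags) -> float:
    """Proportion of True values in a sequence of booleans."""
    flags = list(flags)
    return sum(flags) / len(flags)


# Hypothetical reported CPD values, for illustration only.
sample_cpd = [2, 5, 10, 3, 20, 7, 15, 1, 10, 4]

biased_share = share(is_digit_biased(c) for c in sample_cpd)
heavy_count = sum(is_heavy_user(c) for c in sample_cpd)
```

Note that under this definition no report below 5 CPD can be classified as digit biased, which matters when a large share of respondents smoke fewer than five cigarettes a day.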
A modified Whipple Index (WI) was calculated to assess heaping at terminal digits '0' and '5' in the reporting of CPD. The WI modification procedures adopted by Wang et al. (1995), Danic et al. (2004) and Shiffman et al. (2009) were followed to adapt it for validating the quality of CPD frequency data reported in the GATS survey. The WI was adapted to the CPD frequency distribution between 3 and 17, considering the occurrence of digit preference at multiples of 5. We limited the WI calculation to this range because 94% of respondents reported a CPD frequency ≤17.
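The adapted formula itself is not reproduced in this text. A minimal sketch, assuming the usual Whipple construction carried over to the 3 to 17 range, is WI = 100 × 5 × (n5 + n10 + n15) / (n3 + n4 + ... + n17), where n_k is the number of respondents reporting k cigarettes per day. Under this assumption a value of 100 indicates no heaping and 500 indicates that every report falls on a multiple of 5. The function name and defaults below are illustrative, not the authors' code.

```python
from collections import Counter


def adapted_whipple_index(cpd_values, low=3, high=17, base=5):
    """Whipple-style heaping index for reported CPD (assumed formula).

    Only reports within [low, high] are used. The index compares the
    observed count at multiples of `base` with the count expected if
    reports were spread evenly over the range.
    """
    counts = Counter(v for v in cpd_values if low <= v <= high)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no CPD reports fall inside the analysis range")
    heaped = sum(n for value, n in counts.items() if value % base == 0)
    return 100.0 * base * heaped / total


# One report at every value from 3 to 17: no heaping.
uniform_wi = adapted_whipple_index(range(3, 18))

# Every report on a multiple of 5: maximal heaping.
heaped_wi = adapted_whipple_index([5, 10, 15, 10, 5])
```

The 3 to 17 window is symmetric around 5, 10 and 15 (two values on either side of each), which is why one fifth of reports would be expected at those values in the absence of digit preference.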

Results
In the GATS-India survey, 3411 respondents reported daily cigarette use of between 1 and 110 CPD, with a mean of 6.7. The self reported CPD frequency among daily manufactured cigarette users is presented in Figure 1. About 44.2% of the daily smokers reported a CPD frequency <5. The mode of the CPD frequency distribution was 10, followed by 5 and 2. CPD frequencies of 2, 5 and 10 were reported by 14.5%, 15.1% and 15.2% of respondents respectively. The figure clearly indicates heaping of CPD frequency at terminal digits '5' and '0'. Heaping was also observed at even numbers such as 2, 8 and 12. The WI was calculated to be 226.28 (95% CI: 216.45-236.10). Overall 38% (95% CI: 36.34-39.59%) of these respondents showed digit bias. However, when respondents with CPD frequency <5 were excluded, digit bias increased to 67.9%.
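The text does not state how the confidence interval for the digit bias proportion was obtained. As a rough check, a normal approximation (Wald) interval, which is an assumption here, gives bounds close to the reported 36.34-39.59% when applied to the rounded figures of 38% and n = 3411.

```python
import math


def wald_ci(p: float, n: int, z: float = 1.96):
    """Normal-approximation confidence interval for a proportion."""
    se = math.sqrt(p * (1.0 - p) / n)
    return p - z * se, p + z * se


# Rounded values from the text: 38% digit bias among 3411 daily smokers.
ci_low, ci_high = wald_ci(0.38, 3411)
```

The small difference from the published bounds is consistent with the proportion having been rounded to 38% in the text, though other interval methods cannot be ruled out.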
Bivariate analysis (Table 1) indicated that male, urban and heavy users reported more digit bias than their respective counterparts, and these differences were highly significant (P≤0.001). Among the national regions, respondents in the South and North East regions showed higher digit bias than those in other regions, and digit bias differed significantly among the regions. Among the occupational groups, employed respondents and students reported significantly more digit bias than other groups. There was a clear and significant increase in digit bias with increasing educational level and asset index quintile group. A similar trend was observed for age, but digit bias declined after the age of 54 years.

However, in the regression model, age, gender and occupation were not significant predictors of digit bias. The model indicates that being a heavy user, being an urban resident, belonging to the North, South or North East region of India, having less than primary or secondary and above education, and belonging to the fourth asset index quintile group were significant predictors of digit bias, relative to non heavy users, rural residents, the Central region of India, those with no formal education and the lowest asset quintile group respectively.

DOI: http://dx.doi.org/10.7314/APJCP.2013.14.6.3865
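A bivariate comparison of the kind reported above can be sketched for a single 2x2 table using only the standard library. The counts below are hypothetical and are not taken from Table 1; the test is the uncorrected Pearson chi-square with one degree of freedom, which is an assumption since the text does not say whether a continuity correction was applied.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square (no continuity correction) for the table

                 biased   not biased
        group 1    a          b
        group 2    c          d

    Returns the statistic and its p-value for one degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, P(X > chi2) equals the two-sided normal tail at sqrt(chi2).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Unadjusted odds of digit bias in group 1 relative to group 2."""
    return (a * d) / (b * c)


# Hypothetical counts, for illustration only.
chi2, p_value = chi_square_2x2(30, 20, 20, 30)
crude_or = odds_ratio(30, 20, 20, 30)
```

An unadjusted odds ratio from such a table can differ from the regression estimate, which is one way a factor can be significant in bivariate analysis but not in the adjusted model.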

Discussion
The study results indicate that there is digit bias ('0' or '5') in the reporting of CPD frequency in the GATS-India survey. About two in five daily smokers showed digit bias, with 10 as the most commonly reported CPD frequency. The Whipple Index suggests that the quality of CPD frequency data in the GATS-India survey is very rough, with heaping at frequencies having '0' or '5' as the terminal digit. The relation of heavy use, urban residence and national region with digit bias is readily explainable.
The digit bias reported here indicates that self reported CPD use data are less accurate. Earlier, Klesges et al. (1995) observed 71% digit bias among respondents and found higher odds of digit bias in CPD reporting among heavy users (CPD>20) than among light users. The same study also indicated that duration of education (OR: 0.95) and being African American (OR: 0.6) had a protective role against digit bias in CPD reporting. In this study, education had no protective role against biased CPD frequency reporting, which needs further investigation. As 44.2% of daily cigarette smokers in this study reported a CPD frequency of less than five, the current definition limited the probability of observing higher digit bias. When smokers with a CPD frequency under five were excluded, digit biased CPD frequency reporting increased to 67.9%. As two is the third most frequently reported figure for CPD use, digit preference other than '0' or '5' may need to be explored.
Self-reported digit "spikes" or "heaping" in a particular pattern (multiples of 5) indicate the presence of information bias. However, a "round digit" or "preferred digit" CPD frequency could also be true reporting. The highest peak at '10' in the reported CPD frequency could be due to the availability of manufactured cigarettes in packs of 10 in the market and the consumption of one pack per day. A similar explanation was earlier provided by Klesges et al. (1995) for the frequent reporting of 20 as the preferred CPD consumption pattern in the USA. While selling of loose cigarettes could promote higher consumption (Latkin et al., 2013), this form of marketing (ten cigarettes per pack) in India could induce smokers to consume all the cigarettes they have with them in a day. This points towards a possible policy implication: reducing the number of cigarettes per pack may help to reduce consumption among smokers. However, this hypothesis needs to be explored further.

Subjective self-assessment through global questions depends on participants' ability to remember and recall their own behaviour over the course of 30 days, which has been shown to be inaccurate owing to cognitive biases (Hammersley, 1994; Harris et al., 2009; Shiffman, 2009) and may lead to under reporting of CPD frequency (Warner et al., 1978; Pechacek et al., 1984). In view of the lack of a 'Gold Standard' against which self-reports of CPD frequency could be evaluated (Shiffman, 2009), cross sectional surveys need to be more careful about digit bias, considering socio-cultural and socio-demographic factors. A saliva cotinine verified study by Dhavan et al. (2011) indicates that self-reports of tobacco use have low sensitivity (36.3%) among Indian youths aged 10-19 years. ROC analysis of urinary cotinine levels in the detection of self reported smoking by Balhara et al. (2013) among adult psychiatric patients in North India yielded an area under the curve (AUC) of 0.44, which is below the 0.5 expected by chance. Also, being a socially undesirable behaviour, tobacco use is prone to under reporting (Fendrich et al., 2005). Hence the possibility of underreporting or misreporting in the Indian context is high.

Despite the higher possibility of under reporting among females in the Indian context, Giovino et al. (2012) reported that mean CPD use among Indian females is higher than among their male counterparts. This may be due to misreporting (Jena et al., 2012). The higher prevalence of digit bias among males than females, and the resulting greater rounding down of CPD frequency among males, may be an explanation for such an unusual finding. Further exploration of definitions and statistical analysis in GATS is warranted.
In this study, 2, 5 and 10 were the most frequently reported CPD frequencies. Therefore, special attention is required while registering such frequencies in the electronic database generated through the hand held device in the GATS. If necessary, further probing questions may be asked. Since CPD data represent the consumption pattern in the population, they could be verified against industry data (IARC, 2008), manufacturing data and sales data from government tax departments. This is a crude method, but may form an alternative. Carlo et al. (2008) modeled the general pattern of digit preference and tested it for optimal smoothing of various digit biased data (age at death, weight data etc.). A similar approach may also be used while reporting CPD frequency in large cross sectional surveys like GATS. Analysis of CPD frequency as categorical data may reduce digit bias, but under reporting and rounding down may still misclassify cigarette use. Examination of other methods of data collection which could ensure the accuracy of CPD data is warranted.
In conclusion, the present study highlights poor data quality and a high prevalence of digit bias in self reported CPD use frequency in India. Reduction of digit bias by innovative methods and statistical analysis should be given due importance in smoking behavioural surveys. The present study needs to be further expanded to investigate the nature of digit bias and the possibility of reducing cigarette consumption among daily smokers by reducing the number of cigarettes per pack available in the market.

Figure 1. Self-Reported Cigarettes Per Day Frequency Data Showing Heaping

Table 1. Digit Bias in Self Reporting of Frequency of Cigarette Per Day (CPD)
*Other means 'retired or unemployed'