TY - JOUR
T1 - Using interviewer random effects to remove selection bias from HIV prevalence estimates
AU - McGovern, Mark E
AU - Bärnighausen, Till
AU - Salomon, Joshua A
AU - Canning, David
PY - 2015/2/5
Y1 - 2015/2/5
N2 - Background: Selection bias in HIV prevalence estimates occurs if non-participation in testing is correlated with
HIV status. Longitudinal data suggests that individuals who know or suspect they are HIV positive are less likely
to participate in testing in HIV surveys, in which case methods to correct for missing data which are based on
imputation and observed characteristics will produce biased results.
Methods: The identity of the HIV survey interviewer is typically associated with HIV testing participation, but is
unlikely to be correlated with HIV status. Interviewer identity can thus be used as a selection variable allowing
estimation of Heckman-type selection models. These models produce asymptotically unbiased HIV prevalence
estimates, even when non-participation is correlated with unobserved characteristics, such as knowledge of HIV
status. We introduce a new random effects method to these selection models which overcomes non-convergence
caused by collinearity, small sample bias, and incorrect inference in existing approaches. Our method is easy to
implement in standard statistical software, and allows the construction of bootstrapped standard errors which
adjust for the fact that the relationship between testing and HIV status is uncertain and needs to be estimated.
Results: Using nationally representative data from the Demographic and Health Surveys, we illustrate our approach
with new point estimates and confidence intervals (CI) for HIV prevalence among men in Ghana (2003) and Zambia
(2007). In Ghana, we find little evidence of selection bias as our selection model gives an HIV prevalence estimate
of 1.4% (95% CI 1.2% – 1.6%), compared to 1.6% among those with a valid HIV test. In Zambia, our selection model
gives an HIV prevalence estimate of 16.3% (95% CI 11.0% – 18.4%), compared to 12.1% among those with a valid
HIV test. Therefore, those who decline to test in Zambia are found to be more likely to be HIV positive.
Conclusions: Our approach corrects for selection bias in HIV prevalence estimates, can be implemented even
when HIV prevalence or non-participation is very high or very low, and provides a practical solution to account for
both sampling and parameter uncertainty in the estimation of confidence intervals. The wide confidence intervals
estimated in an example with high HIV prevalence indicate that it is difficult to correct statistically for the bias that
may occur when a large proportion of people refuse to test.
U2 - 10.1186/1471-2288-15-8
DO - 10.1186/1471-2288-15-8
M3 - Article
SN - 1471-2288
VL - 15
JO - BMC Medical Research Methodology
JF - BMC Medical Research Methodology
IS - 8
ER -