Epidemics often cause large numbers of deaths and considerable social and economic damage, so epidemic surveillance has become an important healthcare research issue. In 2009, Ginsberg et al. observed that search engine query logs can be used to estimate the status of epidemics in a timely manner. In this paper, we model epidemic surveillance as a classification problem and use query statistics from Google to classify the status of a dengue fever epidemic. The query logs of twenty-three dengue-related keywords serve as observations for training and testing, and we investigate several machine learning models, including generative, discriminative, sequential, and non-sequential classifiers, to evaluate their surveillance performance and applicability. Evaluations on a five-year real-world dataset demonstrate that search engine query logs can be used to construct accurate epidemic status classifiers. Moreover, the learned classifiers generally outperform conventional regression approaches.
PACIS 2010 - 14th Pacific Asia Conference on Information Systems