Predicting Epidemics With Internet Search Data

Recent research suggests that evaluating desperate Google queries may be the best way to identify infectious disease outbreaks before they happen.

Before picking up the phone and making an appointment with a doctor, the fearful often hit the Internet. Google’s search sees rough descriptions of symptoms, wild guesses of the plague people think they’ve contracted, and perhaps a series of accompanying ominous phrases, along the lines of “AM I DYING???”

A forthcoming review of recent research by epidemiologists in The Lancet Infectious Diseases argues that algorithms based on disease search terms have become so effective at predicting epidemics that they should be expanded and integrated more fully into traditional estimates, which are usually based on physician and lab reports.

Several studies have successfully predicted the incidence of influenza and dengue fever by evaluating associated search terms on popular engines like Google and Yahoo!. In one study covered by the review, a search model for influenza accurately estimated incidence one to two weeks before the numbers appeared in regional reports produced by the Centers for Disease Control and Prevention.
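To make the idea concrete, here is a minimal sketch of how one might check whether search volume runs ahead of reported cases. It is not the method used in any of the studies in the review. The weekly numbers are made up, and the function names (pearson, best_lead) are hypothetical. It simply shifts the case counts by zero to four weeks and reports which shift lines up best with the search counts.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_lead(searches, cases, max_lead):
    """Return (lead, r) for the lead, in weeks, with the highest correlation."""
    results = []
    for lead in range(max_lead + 1):
        # Pair searches in week t with cases reported in week t + lead.
        xs = searches[: len(searches) - lead]
        ys = cases[lead:]
        results.append((lead, pearson(xs, ys)))
    return max(results, key=lambda pair: pair[1])


# Synthetic weekly data (an assumption for illustration only):
# reported cases echo search volume two weeks later.
searches = [10, 12, 15, 30, 55, 80, 60, 35, 20, 12, 10, 9]
cases = [5, 5] + [s // 2 for s in searches[:-2]]

lead, r = best_lead(searches, cases, max_lead=4)
print(lead, round(r, 3))
```

Real surveillance models are far more involved, since they must choose search terms, correct for seasonal patterns, and guard against news-driven spikes in searching. The lagged comparison above only shows the basic intuition that searches can precede official counts.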

That tool actually led to the development of Google Flu Trends and Google Dengue Trends, whose search-term reports have correlated strongly with incidence estimates in several public health reports in Europe, Asia, and the U.S. In one study, researchers used Google Flu Trends to generate “real-time” predictions of influenza alerts “with 97.7% accuracy” and “up to 4 weeks ahead of the release of CDC reports.” Another group of researchers plugged the numbers “into a humidity-driven compartmental mathematical model,” which spit out estimates for influenza highs “7 weeks in advance of their occurrence.”

According to the review, the search-based models can outperform their traditional counterparts in part because they capture people who are treating themselves rather than seeing a doctor. The authors write:

Internet-based surveillance systems circumvent the bureaucratic structure of traditional systems. Furthermore, they target a different section of the community to traditional surveillance systems. Zeng and Wagner’s model of patient behaviour during epidemics identifies four phases in health-care seeking: recognition of symptoms, interpretation of symptoms, representation of illness, and seeking treatment. Traditional surveillance systems only source data from people seeking treatment.

Of course, there are several limits to this approach. For one, terrible disease outbreaks often occur in countries where access to the Internet and other basic infrastructure is extremely limited. However, the authors note that the search model may still outcompete the traditional strategies: “Although the fraction of people assessable with an internet-based system in the average low-income country (30·7% internet access) is only 2993 people per 100 000 patients, this fraction still exceeds that of a traditional surveillance system (750 people per 100 000 patients).” Still, these models will only capture the people who do turn to the Internet for “health-related information,” and this certainly varies across cultures.

Public health officials must also tailor search terms to each culture’s “nuances, language shifts, and use of colloquialisms or even memes,” the authors note. One research group even had to apply a “Bieber fever” filter. Hammering out the details for regions and even cities will be difficult, but the predictions, when combined with more traditional reports, could eventually help halt a major public health crisis.
