Harnessing the power of social media for COVID-19 modelling

Prof Nicholas Mays
Social media data could hold the key to improving infectious disease models
Social media apps. Credit: Pexels/Tracy Le Blanc

The COVID-19 pandemic has sparked great interest in the mathematical models used to estimate disease transmission in the population. These models have figured prominently in the decisions of many governments, as they can help project the course of the disease, allocate people and resources, and evaluate the impact of policies. But models - though undoubtedly valuable - are not crystal balls; they are only as good as the available information.

This information may be found literally at our fingertips. With an increase of 13.2% in users in 2020 alone, social media platforms offer a growing wealth of information on popular beliefs, opinions, intentions and behaviour. As a result, data from social media could help refine models.

Social media are more than passive observers and reporters of trends - they also purposively interact with and influence their users. Their interactive, immersive nature may initiate behaviour (think TikTok!) that does not happen when people consume traditional media such as television and newspapers. For example, social media may encourage “groupthink” where people’s desire to identify with the group gradually overrides their personal opinions.

Social media also encourage like-minded individuals to seek out each other’s company. Consequently, users may reinforce and amplify each other’s opinions and attitudes - both positive and negative - to form ‘online echo chambers’. Though apparently harmless, these could have a large impact on the future course of an outbreak. Recently, the conspicuous online presence of the anti-vaxxer movement and of vaccine hesitancy has raised concerns about the likely take-up of COVID-19 vaccination.

Another unique feature is the impact of a relatively small number of ‘influencers’ who have the potential to alter the behaviour of large numbers of other people. Traditionally, the effects of media have been included in models by assuming that media reports (especially of the number of infected people) increase awareness, resulting in a lower risk of infection. But what if interaction with ‘influencers’ skeptical about the severity of the disease or the safety of vaccines has the opposite effect?  
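To make the question concrete, here is a minimal sketch of how a media effect might enter a simple SIR (susceptible, infected, recovered) model. It is an illustration only, not the model from our study: the exponential scaling of the transmission rate, the `media` parameter and all numerical values are assumptions chosen for the example. A positive `media` value stands for awareness that lowers transmission as infections rise; a negative value stands for sceptical messaging that has the opposite effect.

```python
import math


def simulate_sir(beta=0.3, gamma=0.1, media=0.0, i0=0.001, days=300, dt=0.1):
    """Euler-integrate an SIR model in population fractions.

    The transmission rate is scaled by exp(-media * I) (an assumed,
    illustrative form): media > 0 lowers transmission as the infected
    fraction I grows, media < 0 raises it.
    Returns the peak infected fraction over the simulated period.
    """
    s, i = 1.0 - i0, i0
    peak = i
    for _ in range(int(days / dt)):
        beta_eff = beta * math.exp(-media * i)  # media-adjusted transmission
        new_infections = beta_eff * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        peak = max(peak, i)
    return peak


if __name__ == "__main__":
    for label, m in [("awareness", 5.0), ("no media effect", 0.0), ("sceptical", -5.0)]:
        print(f"{label}: peak infected fraction = {simulate_sir(media=m):.3f}")
```

In this sketch the only thing that changes between scenarios is the sign of one parameter, yet the epidemic peak moves in opposite directions, which is why the direction of influence matters to modellers.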

Our study in the Bulletin of Mathematical Biology argues that the accuracy of infectious disease models, such as those used to inform COVID-19 policies, could be improved if we make more use of social media data, despite the challenges inherent in processing and interpreting such data. Social media data can be difficult to process using traditional software because of their volume, and because they can take non-traditional forms such as audio, images, video and unstructured text. But new methods are being developed that can handle and interpret these data mathematically for use in models.

Despite Orwellian reminders of ‘Big Brother’, researchers have already recognised the potential of social media data as an alternative data source for tracking public health trends alongside traditional surveillance data.

The frequency of posts mentioning keywords like ‘covid19’, ‘pneumonia’ and ‘fever’ has been found to be strongly correlated with the reported number of people infected. With each post, users leave a digital trace which may be linked to their demographic data and geographic location through geo-tagging.
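As a rough sketch of what such an analysis involves, the snippet below counts the posts per day that mention any of the keywords and computes a Pearson correlation against a series of reported case numbers. The function names and the simple substring matching are assumptions made for illustration; real studies would need far more careful text processing.

```python
from math import sqrt

KEYWORDS = ("covid19", "pneumonia", "fever")


def daily_keyword_counts(posts_by_day, keywords=KEYWORDS):
    """Count, for each day, the posts that mention at least one keyword."""
    return [
        sum(any(k in post.lower() for k in keywords) for post in posts)
        for posts in posts_by_day
    ]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)
```

Given a list of each day's posts and a matching list of daily reported cases, `pearson(daily_keyword_counts(posts_by_day), reported_cases)` returns a value between -1 and 1. A high value shows that the two series move together, not that one can be substituted for the other.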

On the surface, the almost instantaneous, open access availability of social media data seems like a real boon to modellers. It is tempting to make inferences regarding disease trends based solely on this information. However, though social media use is increasing as we are urged to “stay home, stay safe,” different demographic groups use social media in different ways and to different degrees.

Traditionally, social media platforms have been the preserve of younger people. And different platforms are more or less popular in different regions of the world. This means the data cannot be assumed to be fully representative of the general population, which will have a bearing on how the data can be interpreted and used.
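One standard way to account for this kind of imbalance is post-stratification: estimate the quantity of interest within each demographic group, then weight the groups by their share of the general population rather than their share of platform users. The sketch below illustrates the idea only; the group names and every number in the usage example are invented for illustration, not data.

```python
def naive_estimate(group_means, sample_shares):
    """Average weighted by each group's share of the social media sample."""
    return sum(group_means[g] * sample_shares[g] for g in group_means)


def post_stratified_estimate(group_means, population_shares):
    """Average weighted by each group's share of the general population."""
    total = sum(population_shares[g] for g in group_means)
    return sum(group_means[g] * population_shares[g] for g in group_means) / total


if __name__ == "__main__":
    # Hypothetical figures: proportion in each age group expressing
    # vaccine hesitancy, and the group's share of users vs. population.
    hesitancy = {"under_40": 0.2, "40_and_over": 0.6}
    user_share = {"under_40": 0.8, "40_and_over": 0.2}
    population_share = {"under_40": 0.5, "40_and_over": 0.5}
    print(naive_estimate(hesitancy, user_share))
    print(post_stratified_estimate(hesitancy, population_share))
```

Reweighting corrects only for the characteristics that are measured; it cannot account for people who are absent from social media altogether.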

With more than half of the world’s population currently using social media, now is an ideal time to rethink how we use the data they generate. During these uncertain times when information changes quickly, it is imperative that modellers use the most up-to-date information sources - social media data may be just the tool for this.


Sooknanan J, Mays N. Harnessing social media in the modelling of pandemics – challenges and opportunities. Bulletin of Mathematical Biology. DOI: 10.1007/s11538-021-00895-3
