We updated this dataset to include stories from November 2019 to July 2020.
Data has been crucial in the fight against COVID-19. It is being used across the health, government, and finance sectors to track the virus's spread, predict its impact, and help everybody stay informed. To help in even the smallest way, AYLIEN is publishing a COVID-19 dataset that can be used to analyze global news coverage throughout the outbreak.
Our Coronavirus dataset contains over 1,500,000 news articles related to the pandemic, published since the outbreak began in late 2019. Each article has been aggregated and processed by the AYLIEN News API: the titles, body text, and URLs have been parsed, and the content has been enriched using our NLP and metadata extraction models. These enrichments include recognized entities, topical category tags, sentiment, and summaries, as well as source information and publication times.
What’s in the dataset?
- Size: 7.6 GB (1,673,353 news articles)
- Topics and themes: Coronavirus-related news content
- Language: English content only
- Timeframe: Nov 2019 – July 2020
- Sources: ~440 global sources
For now, this is a one-shot dataset, but we're considering updating it weekly.
Below is a time-series view of the entire dataset.
We have also included a quick look at the content distribution by publisher below.
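If you want to reproduce views like these yourself, here is a minimal Python sketch. It assumes the download is a JSON Lines file (one story object per line) and that each story carries a `published_at` ISO-8601 timestamp and a nested `source.domain` field. These field names are our assumption, so check them against the actual file before relying on the results. Reading the file line by line keeps memory use flat even for a 7.6 GB download.

```python
import json
from collections import Counter
from typing import Iterable, Iterator, Tuple


def iter_stories(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed story per non-empty line (assumed JSON Lines layout)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def summarize(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count stories per publication day and per source domain in one pass."""
    per_day: Counter = Counter()
    per_source: Counter = Counter()
    for story in iter_stories(lines):
        # Assumed field: ISO-8601 timestamp such as "2020-03-11T14:02:00Z";
        # its first 10 characters are the calendar date.
        published = story.get("published_at") or ""
        if published:
            per_day[published[:10]] += 1
        # Assumed field: nested source object with a "domain" key.
        domain = (story.get("source") or {}).get("domain", "unknown")
        per_source[domain] += 1
    return per_day, per_source


if __name__ == "__main__":
    # Small in-memory sample standing in for lines of the real file.
    sample = [
        '{"published_at": "2020-03-11T14:02:00Z", "source": {"domain": "example.com"}}',
        '{"published_at": "2020-03-11T18:30:00Z", "source": {"domain": "example.org"}}',
        '{"published_at": "2020-03-12T09:15:00Z", "source": {"domain": "example.com"}}',
    ]
    days, sources = summarize(sample)
    print(sorted(days.items()))    # [('2020-03-11', 2), ('2020-03-12', 1)]
    print(sources.most_common(1))  # [('example.com', 2)]
```

To run it on the full dataset, pass an open file handle instead of the sample list, for example `with open(path, encoding="utf-8") as f: days, sources = summarize(f)`, where `path` is wherever you saved the download.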
Who can use this data?
This dataset is aimed at data scientists, researchers, and hobbyists who wish to draw insight and analysis from the world's media by leveraging our NLP data enrichments. Really, it's for anyone who might find it useful.
We do have some stipulations around the use of the data that we need to make clear.
- The data should only be used in non-commercial projects. Please get in touch here if you’re unsure about your usage.
- Please don’t share the download link with others. If you want to share the data please direct people to the AYLIEN Coronavirus Dataset.
- Please use the following citation blurb: This data was aggregated, analyzed, and enriched by AYLIEN using AYLIEN’s News Intelligence Platform.
You can download your dataset here.
We've also released our Coronavirus News Dashboard to help analysts and researchers identify, track, and quantify the impact of the COVID-19 pandemic on the world economy, and on their own businesses, through real-time global news.
View our dashboard here.
Our only ask…
We'd love to hear how you're using our data, and we encourage you to share your findings. Please keep us informed about the work you're doing, and don't hesitate to contact us with any questions.
For some inspiration, check out some of the cool blogs we've published using variations of the same dataset.
- COVID-19: Measuring Industry Impact With Media Signals
- Coronavirus Goes Viral – Tracking the Outbreak via Global Media Signals