I need a set of news headlines and articles to help me in a project on automatic summarization. Is there such a dataset or something similar?
- This is better suited for https://opendata.stackexchange.com/ – oW_ Sep 10 '18 at 16:57
- Take a look here. – Green Falcon Sep 10 '18 at 17:06
- There's one with Medium.com blog posts! – Aditya Sep 11 '18 at 06:31
1 Answer
The most widely used dataset in text summarization research is the DUC (Document Understanding Conference) dataset. If you see a paper using "DUC 2005" or "DUC 2006", that's from here.
I have also personally used the Reuters archive. You just need to download each article with wget or a similar tool. See also here.
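As a rough Python equivalent of "download each article with wget", here is a minimal sketch using only the standard library (`urllib.request.urlretrieve`). It assumes you already have a plain list of article URLs; it knows nothing about how the Reuters archive is organised, and the function names `filename_for` and `download_all` are hypothetical, chosen for illustration.

```python
import os
import time
import urllib.request
from urllib.parse import urlparse


def filename_for(url: str, index: int) -> str:
    """Build a unique local filename: zero-padded index plus the URL's last path segment."""
    # Hypothetical naming scheme; "index.html" is used when the path ends in "/".
    name = os.path.basename(urlparse(url).path) or "index.html"
    return f"{index:05d}_{name}"


def download_all(urls, out_dir: str = "articles", delay: float = 1.0):
    """Save each URL's raw response body under out_dir and return the saved paths.

    Files that already exist are skipped, so an interrupted run can be resumed.
    """
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for i, url in enumerate(urls):
        path = os.path.join(out_dir, filename_for(url, i))
        if not os.path.exists(path):
            try:
                urllib.request.urlretrieve(url, path)
            except (OSError, ValueError) as err:
                # URLError/HTTPError subclass OSError; ValueError covers malformed URLs.
                print(f"skipped {url}: {err}")
                continue
            time.sleep(delay)  # pause between requests to avoid hammering the server
        saved.append(path)
    return saved
```

For example, `download_all(open("urls.txt").read().split())` would fetch every URL listed in a hypothetical `urls.txt`. Because filenames are keyed on list position, keep the URL list in the same order when resuming a partial run.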
The CNN / DailyMail dataset is also widely used in summarization, especially in recent years, although it labels itself as a question-answering (Q&A) dataset.

user12075
- How can you log in to get the DUC dataset? I can't figure out how I can register with my institution's email. – mosafattah Sep 12 '18 at 09:55
- The link in the answer has instructions about how to request access to the dataset. Why not read it? – user12075 Sep 12 '18 at 14:26