
I need a dataset of news headlines and articles for a project on automatic summarization. Does such a dataset, or something similar, exist?

mosafattah

1 Answer


The most widely used dataset in text summarization research is DUC (the Document Understanding Conference data). If you see a paper using "DUC 2005" or "DUC 2006", that data comes from the DUC collection.

I have also personally used the Reuters archive. You just need to download each article with wget or a similar tool. See also here.
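If you would rather script the download in Python than call wget, here is a minimal sketch using only the standard library. It assumes you already have a list of article URLs; the Reuters archive URL pattern is not given here, so any URLs you pass in, the `download_articles` name, and the `article_NNNNN.html` file naming are hypothetical choices for illustration.

```python
import pathlib
import time
import urllib.request


def download_articles(urls, out_dir, delay_seconds=1.0):
    """Fetch each URL and save the raw response to out_dir, one file per article.

    Hypothetical helper: returns the list of paths written; failed downloads are skipped.
    """
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for i, url in enumerate(urls):
        target = out / f"article_{i:05d}.html"
        if target.exists():
            # Reuse a file fetched on a previous run (similar to wget --no-clobber).
            saved.append(target)
            continue
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                target.write_bytes(resp.read())
            saved.append(target)
        except OSError as exc:  # urllib.error.URLError is a subclass of OSError
            print(f"skipping {url}: {exc}")
        time.sleep(delay_seconds)  # pause between requests to be polite to the server
    return saved
```

The raw HTML still has to be stripped down to headline and body text before it is usable for summarization; keeping the download step separate means you can re-run the parsing without hitting the server again.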

The CNN / DailyMail dataset is also widely used in summarization, especially in recent years, although it was originally released as a question-answering dataset.
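If you work with the raw CNN / DailyMail story files rather than a preprocessed release, the bullet-point highlights are what usually serve as the reference summary. The sketch below assumes the layout of those raw `.story` files: article text first, then each highlight on the first non-blank line after an `@highlight` marker. Check this against your copy; `parse_story` is a hypothetical helper name.

```python
def parse_story(text):
    """Split a CNN/DailyMail .story file into (article, highlights).

    Assumes article paragraphs come first and each highlight is the first
    non-blank line following an "@highlight" marker.
    """
    article_lines = []
    highlights = []
    next_is_highlight = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank separator lines
        if line == "@highlight":
            next_is_highlight = True
        elif next_is_highlight:
            highlights.append(line)
            next_is_highlight = False
        else:
            article_lines.append(line)
    return "\n".join(article_lines), highlights
```

Joining the returned highlights into one string gives a reference summary you can score a system summary against, for example with ROUGE.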

user12075