This may sound like a dumb question, but as a beginner I'm really confused.

For my academic thesis, I chose a conference paper on Big Data in the healthcare field. Now the problem is getting the data sets.

I can't find any resources for downloading data sets to work on. When I googled it, people suggested some resources that are really good, but those files are not big enough to be called Big Data. I'm really confused on this point and can't get a clear answer from anyone.

I was thinking that maybe we work on a small batch of data and later apply the models to Big Data in production. However, that's just an assumption. I want to know: for a thesis or analytical research at the academic level, can we get real-life Big Data sets to download for free?

Please correct me if I'm missing something. Thanks in advance.

Innat
  • Have you considered data augmentation techniques? – Green Falcon Jul 06 '18 at 17:51
  • Does your university have a medical department? Would it be possible to forge a collaboration? – The Lyrist Jul 06 '18 at 17:53
  • @Media No, I haven't considered data augmentation. I know of it, but should I consider it in the first place? – Innat Jul 06 '18 at 18:04
  • @TheLyrist No, my university doesn't have a medical department. – Innat Jul 06 '18 at 18:05
  • Actually, it highly depends on your dataset and the problem at hand. In situations where finding appropriate data is costly, data augmentation is necessary. – Green Falcon Jul 06 '18 at 18:06
  • @Media If acquiring Big Data is costly, then data augmentation comes in handy to expand the data sets. But I was following the research paper mentioned above, where they didn't consider data augmentation, so neither did I. – Innat Jul 06 '18 at 18:10
  • You may mail them and ask them to send you a link if the data is free. – Green Falcon Jul 06 '18 at 18:11
  • Hmm OK, I will contact them, but aren't there any online resources? I only found a torrent site called Academic Torrents, where I saw some large volumes of data, though it's not legal. – Innat Jul 06 '18 at 18:18
  • Please use an @-mention if you want to notify someone. There is already a community about datasets on Stack Exchange. I guess you can ask them or see here. – Green Falcon Jul 08 '18 at 00:34

1 Answer

Take a look at https://goo.gl/yCZvSb, a view of the Gapminder health data sets. Of the 519, you may find one suitable.

For purposes of your thesis, I'd encourage you to think of big data in terms of high-dimensional data with n >> the typical clinical trial. I'd be very surprised if you found health-related observational data, other than vital statistics such as numbers of births and deaths, that involved even 10^9-scale data.

Richard Careaga