8

I am working on a project that aims to retrieve a large dataset (tweet data that is a couple of days old) from Twitter using the twitteR library in R. I have difficulty storing the tweets because my machine has only 8 GB of memory; it ran out of memory even before I set it to retrieve a full day's worth. Is there a way I can write the tweets straight to disk without holding them all in RAM? I am not using the streaming API because I need to get old tweets.

Steve Kallestad
Digital Dude

2 Answers

5

Find a way to make your program write to disk periodically. Keep a count of the tweets you have grabbed and flush them to a file once that count gets high. I don't write R, but pseudocode might look like:

    $tweets = get_tweets();
    $count = 0;
    $tweet_array = array();
    foreach ($tweets as $tweet) {
        $tweet_array[] = $tweet;
        $count++;
        if ($count >= 10000) {
            // write the batch to disk, then free the memory
            append_to_file($tweet_array, 'file_name.txt');
            clear_array($tweet_array);
            $count = 0;
        }
    }
    // flush whatever is left after the loop
    if ($count > 0) {
        append_to_file($tweet_array, 'file_name.txt');
    }
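For reference, a minimal sketch of this batching idea in R with twitteR might look like the following. searchTwitter(), twListToDF(), and the maxID argument are part of twitteR; the query, file name, batch size, and loop count are assumptions for illustration, and OAuth setup is assumed to be done already:

    library(twitteR)

    query    <- "#rstats"      # placeholder search term
    out_file <- "tweets.csv"   # placeholder output file
    batch    <- 500            # tweets per request, kept small to limit RAM use
    max_id   <- NULL           # remembers where the last batch ended

    for (i in 1:20) {
      chunk <- searchTwitter(query, n = batch, maxID = max_id)
      if (length(chunk) == 0) break

      df <- twListToDF(chunk)                 # convert the list of tweets to a data frame
      write.table(df, out_file, sep = ",",
                  append    = file.exists(out_file),
                  col.names = !file.exists(out_file),
                  row.names = FALSE)

      # resume from the oldest ID seen; maxID is inclusive, so the boundary
      # tweet may be fetched twice and should be de-duplicated afterwards
      max_id <- df$id[nrow(df)]
      rm(chunk, df); gc()                     # drop the batch from RAM before the next request
    }

Because max_id is carried from one batch to the next, restarting the loop with the last saved ID also answers the question of how to continue from the point where a previous run stopped.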

sheldonkreger
  • Yes, it might be possible in general programming, but R processes data quite differently. I am using the twitteR library in R, and the minimum it can retrieve is one day. I am not sure how I would continue from the point where I stopped if I ran the searchTwitter function again. – Digital Dude Jun 19 '15 at 08:34
  • Wish I knew more about R to help you out. Sorry! – sheldonkreger Jun 19 '15 at 22:07
2

I worked on a Twitter data project last fall in which we used Java libraries to pull tweet data from both the streaming and REST APIs. We used Twitter4J (an unofficial Java library) for the Twitter API.

The tweet data was fetched and written directly to text files on our hard drives. Yes, we did increase the JVM memory and heap size; I believe RStudio has a similar option. An alternative would be to pull in smaller amounts of tweet data over a larger number of repetitions.
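A hedged sketch of that "smaller amounts, more repetitions" idea in R (rather than Java/Twitter4J) could look like this; searchTwitter() and twListToDF() come from twitteR, while the query, date range, request size, and file naming are assumptions:

    library(twitteR)

    # Split the search into one request per day and write each day straight to
    # disk, so only one small chunk is ever held in memory at a time.
    query <- "#rstats"
    days  <- seq(as.Date("2015-06-15"), as.Date("2015-06-18"), by = "day")

    for (d in days) {
      d <- as.Date(d, origin = "1970-01-01")   # for() drops the Date class
      chunk <- searchTwitter(query, n = 5000,
                             since = as.character(d),
                             until = as.character(d + 1))
      if (length(chunk) > 0) {
        write.csv(twListToDF(chunk),
                  file = paste0("tweets_", d, ".csv"), row.names = FALSE)
      }
      rm(chunk); gc()   # free the day's tweets before moving on
    }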