28

There are numerous tools available for data science tasks, and it's cumbersome to install everything and build up a perfect system.

Is there a Linux/Mac OS image with Python, R and other open-source data science tools installed and available for people to use right away? An Ubuntu or other lightweight OS with the latest versions of Python, R (including IDEs), and other open-source data visualization tools installed would be ideal. I haven't come across one in my quick search on Google.

Please let me know if there are any, or if any of you have created one for yourself. I assume some universities might have their own VM images. Please share such links.

VividD
JeanVuda
  • Although this question could be viewed as borderline off-topic, I somehow find it a good one for the site, IMHO. – Sean Owen Jan 23 '15 at 10:21
  • 3
    In addition to the awesome comments, there's a (somewhat older) blog post comparing several different solutions: http://jeroenjanssens.com/2013/12/07/lean-mean-data-science-machine.html – LauriK Jan 23 '15 at 13:29

5 Answers

13

There is another option that has become popular recently: Docker (https://www.docker.com). Docker is a container platform that lets you create and maintain a working environment quickly and easily.
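
To make that a bit more concrete, here is a minimal sketch of how such an environment could be started. It only assembles a `docker run` command line in Python and does not execute anything. The image name `jupyter/datascience-notebook` (a community image from the Jupyter Docker Stacks project that bundles Python and R), the port 8888, the mount path `/home/jovyan/work`, and the helper name `build_docker_run_command` are my own assumptions for illustration, not something prescribed by Docker itself.

```python
import shlex


def build_docker_run_command(image, host_port=8888, host_dir=None):
    """Return the argv list for starting a notebook container (nothing is executed)."""
    command = [
        "docker", "run",
        "--rm",                      # remove the container when it exits
        "-p", f"{host_port}:8888",   # map a host port to port 8888 inside the container (assumed notebook port)
    ]
    if host_dir is not None:
        # Mount a host directory so work survives after the container is removed.
        # /home/jovyan/work is assumed to be the image's working directory.
        command += ["-v", f"{host_dir}:/home/jovyan/work"]
    command.append(image)
    return command


if __name__ == "__main__":
    # Hypothetical example values; adjust the image and directory to your setup.
    argv = build_docker_run_command("jupyter/datascience-notebook", host_dir="/tmp/ds-work")
    print(shlex.join(argv))
```

The printed line can be pasted into a terminal on a machine where Docker is installed, or the list can be passed to `subprocess.run` if you prefer to launch the container from a script.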

Hope that helps.

fansia
12

If you are looking for a VM with a bunch of tools preinstalled, try the Data Science Toolbox.

Sean Owen
  • Interesting project (+1). Thank you for sharing! It might be easier to use it than to figure out why Docker didn't want to work on my Win 7 laptop (see above). However, it still might be a good idea to learn Docker, considering recent trends. – Aleksandr Blekh Jan 23 '15 at 10:45
  • Nice information. Compared to VM tools, it takes some time to understand how Docker operates. If you are already familiar with VMs, it's a good idea to use this toolbox. Thank you for sharing. – fansia Jan 23 '15 at 15:55
  • Thank you for sharing. It is definitely interesting. But I don't see how someone can use it without a graphical interface. I would need RStudio, and PyCharm for Python (IPython Notebook is there). I will need to play with it a bit to understand it completely. – JeanVuda Jan 24 '15 at 00:03
  • 1
    @AleksandrBlekh I was finally able to get Docker to work on my Windows 7 machine by regenerating the certificates with docker-machine regenerate-certs. I hope that helps :) – R.K. Aug 20 '15 at 06:12
  • @R.K.: Thank you for letting me know. I will give it a try when I get a chance (it might take a while, though, as there are some higher priority matters waiting to be taken care of). – Aleksandr Blekh Aug 20 '15 at 09:11
8

While Docker images are now more trendy, I personally find Docker technology not user-friendly, even for advanced users. If you are OK with using non-local VM images and can use Amazon Web Services (AWS) EC2, consider R-focused images for data science projects, pre-built by Louis Aslett. The images contain very recent, if not the latest, versions of Ubuntu LTS, R and RStudio Server. You can access them here.

Besides the main components I've listed above, the images contain many useful data science tools built in as well. For example, the images support LaTeX, ODBC, OpenGL, Git, optimized numeric libraries and more.

Aleksandr Blekh
  • Thank you so much for mentioning this option. I will definitely give it a try. However, I want an image that is exactly like this AMI, but can be run with VirtualBox on my laptop. – JeanVuda Jan 23 '15 at 03:08
  • I watched a tutorial recently about Docker, tested it and found it easy to understand. What part did you find not user-friendly? – r_31415 Jan 23 '15 at 03:24
  • @JeanVids: You're very welcome. I understand your desire to have a local VM - that was the reason I tried Docker on my computer. I will let you know if I find a VirtualBox VM image focused on data science (hopefully, R-based). – Aleksandr Blekh Jan 23 '15 at 03:49
  • @RobertSmith: It is easy to understand until things stop working. I tried to install and use the Docker toolset some time ago, but I was unable to get it installed completely (I used several tutorials, including the official one, plus lots of Internet searching, but to no avail). I don't remember exactly, but the issue was pretty technical, and it left me unable to even run the Docker virtualization software on my computer. – Aleksandr Blekh Jan 23 '15 at 03:54
  • @AleksandrBlekh Maybe Docker has been improving lately, because I had a very good experience. It was very easy to install (at least, on Linux) and I was able to test the main features without any issues. – r_31415 Jan 23 '15 at 04:13
  • 1
    @RobertSmith: I understand. Perhaps, the problem was that I was trying to set it up on my Windows machine. Anyway, I will give it a try some time later. Thanks for your comments. – Aleksandr Blekh Jan 23 '15 at 04:48
  • 1
    @AleksandrBlekh Yes, that might be the main problem. Unfortunately there are many issues when installing this sort of thing on Windows. – r_31415 Jan 23 '15 at 05:08
5

Did you try Cloudera's QuickStart VM?

I found it very easy to run, and it includes open-source software such as Mahout and Spark.

Emre Sevinç
5

Today I used this repository and built it with Docker. It is a Docker image that builds Spark on top of a Hadoop image by the same owner. If you want to use Spark, it has a Python API called PySpark.

Ethan
Evren Kutar