Pros and cons of pandas or R for longitudinal data?

Question

Note: I believe this question is not off-topic because it meets all of the criteria for subjective questions that are allowed. I would be happy to rephrase or clarify if others disagree.

I'm about to begin a two-year project which will predominantly involve longitudinal panel data. I've found numerous questions and answers (Pros and Cons of Python and R for Data Science), and blog posts (https://www.quora.com/Which-is-better-for-data-analysis-R-or-Python | http://www.kdnuggets.com/2015/05/r-vs-python-data-science.html), about the relative merits of R and Python+pandas for data science but nothing that discusses longitudinal data.

My question is therefore:

Which environment (Python+pandas or R) would you use for longitudinal data analysis, and why?

For example, I would love to see answers that:

Tell me how you have used one or both environments to solve a particular problem with longitudinal data
Tell me which environment or package(s) you found easier to work with, and why
Tell me whether you used a notebook such as Beaker to work with multiple environments simultaneously
If so, which environment did you use for which step(s) in the data analysis pipeline, and why?
If so, did this confer advantages over just using one language: would you use such an approach again?

I am not asking which one is best (we all know that such questions are never constructive).

I am more familiar with one environment than the other, but I'm not averse to learning new skills (and both share similar syntax anyway), so I'm not going to say which one: I want answers based on your experience, not my abilities.

Ok, why the downvote? I knew this would be a controversial question, but how do I improve it/make it more relevant? E.g. how does this question get 27 upvotes when it's much less specific? — Phil, May 15 '17 at 10:45

score 2 · Answer 1 · answered May 15 '17 at 11:47

Let me answer this from the point of view of data in general. I understand you want an answer specific to longitudinal data, but I don't want to answer only that, because pretty much all of these frameworks treat data in similar ways.

What should you be looking for?

1. How proficient are you in coding?

If you code well enough and can tackle most problems by coding, then I suggest Python + pandas. The reason is simple: since Python is a general-purpose programming language, you can use it outside the normal ecosystem of running your analysis task. So if you want to integrate other components, such as a web server or storing results in a database, Python will be most useful. Whereas if you are not comfortable with coding, then I suggest R is the way to go, since it is much easier to learn. The disadvantage comes when you are trying to do things other than your daily data analysis: it might be limiting.

2. In terms of efficiency

Pretty much all of these tools are fast, and the differences are hardly noticeable. Usually, R has a slight edge in the way computations are performed: it is a bit more optimized and uses the CPU cores to compute a tad faster than Python does. However, with the hardware (plentiful RAM etc.) available these days, the difference is not really that big. Also, most libraries for heavy matrix calculations are available in both toolkits, so there is not much to choose between them.

In my opinion

I would go for Python just because of the diverse uses it has. It also has a great ecosystem of supporting libraries.

Thank you for taking the time to answer. So, for python you would just use the pandas library to set up and store your longitudinal data? Have you found this the most straightforward approach for this kind of data? — Phil, May 15 '17 at 11:57
Both Python and R have the concept of data frames, which is a particular way of interacting with data. I implore you to go and read more about it. To reiterate what I said, the definition of "straightforward" depends on how comfortable you are with the language. R is more concise syntax-wise, but I would prefer Python just for its flexibility. Also, Python isn't that hard to learn. — user-116, May 15 '17 at 14:08
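
As a minimal sketch of what setting up longitudinal data in a pandas data frame could look like: the column names (subject_id, wave, score) and all values below are made up for illustration, and this is only one of several reasonable layouts.

```python
import pandas as pd

# Hypothetical panel: three subjects measured over up to three survey waves.
records = [
    {"subject_id": 1, "wave": 1, "score": 10.0},
    {"subject_id": 1, "wave": 2, "score": 12.0},
    {"subject_id": 1, "wave": 3, "score": 15.0},
    {"subject_id": 2, "wave": 1, "score": 8.0},
    {"subject_id": 2, "wave": 2, "score": 8.5},
    {"subject_id": 2, "wave": 3, "score": 7.0},
    {"subject_id": 3, "wave": 1, "score": 20.0},
    {"subject_id": 3, "wave": 3, "score": 26.0},  # subject 3 missed wave 2
]

# Long format: one row per (subject, wave) observation, indexed by both keys.
panel = pd.DataFrame(records).set_index(["subject_id", "wave"]).sort_index()

# Within-subject change between consecutive observed waves.
# The first observation of each subject has no predecessor, so it is NaN.
panel["change"] = panel.groupby(level="subject_id")["score"].diff()

# Wide format: one row per subject, one column per wave (NaN where a wave is missing).
wide = panel["score"].unstack("wave")

print(panel)
print(wide)
```

The long layout tends to suit per-subject operations such as groupby, while the wide layout makes missing waves easy to spot. Roughly the same long/wide distinction exists for data frames in R.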

score 0 · Answer 2 · answered May 16 '17 at 09:54

Answers coming in from twitter

R + packages

@philmikejones R all the way! (both are good lets be honest, so just follow what you have greater knowledge of)

— Mark Green (@markalangreen) May 15, 2017

@markalangreen Have you used a specific package for setting up your longitudinal data, or just base R?

— Phil (@philmikejones) May 15, 2017

@philmikejones Depends on what you are doing but I tend to stick to data.table - I don't think there is that one 'golden' package

— Mark Green (@markalangreen) May 15, 2017

@markalangreen @philmikejones agree - in my experience people using the forecast package by @robjhyndman usually use ts(), otherwise zoo or xts objects

— andrea panizza (@unsorsodicorda) May 15, 2017

So, recommendations:

data.table (which I'm not a big fan of, but tidyverse is fine as my data isn't 'big')
forecast package, typically with ts() objects (ts() itself is part of base R)
zoo or xts objects
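
For comparison on the Python side, a rough pandas counterpart to a regularly spaced ts()- or zoo-style object is a Series with a DatetimeIndex. This is only a sketch with made-up monthly values, and it is not meant as an equivalent of the R packages listed above:

```python
import pandas as pd

# Hypothetical monthly measurements for a single unit (values are illustrative only).
index = pd.date_range("2017-01-01", periods=6, freq="MS")  # month-start dates
series = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=index, name="value")

# Aggregate to quarters (labelled by quarter start) by averaging each quarter's months.
quarterly = series.resample("QS").mean()

# Three-month rolling mean; the first two entries are NaN (incomplete window).
rolling = series.rolling(window=3).mean()

print(quarterly)
print(rolling)
```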

Prophet

@philmikejones What abt Prophet? Based on @mcmc_stan, available in R & Python, didn't use much but got great results when I did: https://t.co/Pqat2KevCj

— andrea panizza (@unsorsodicorda) May 15, 2017

https://research.fb.com/prophet-forecasting-at-scale/
