2

What are the best known Data Science Methodologies today? By methodology I mean a step-by-step phased process that can be used for framing guidance, although I will be grateful for something close too.

To help clarify, there are methodologies in the programming world, like Extreme Programming, Feature Driven Development, Unified Process, and many more. I am looking for their equivalents, if they exist.

A Google search did not turn up much, but I find it hard to believe there is nothing out there. Any ideas?

Mike Wise
  • Can you be a lot more specific? What do you have in mind when you ask about methodologies: modeling, scoring, evaluation? – Sean Owen Feb 15 '15 at 17:36
  • Specifically I mean a step-by-step phased process that one can use for framing guidance. But I am interested in anything close to that too. – Mike Wise Feb 15 '15 at 17:40
  • I edited it to clarify what I meant. Can you take me off hold now? – Mike Wise Feb 15 '15 at 17:48
  • It is clear what a methodology is but the topic is still quite broad. Are you talking about approaches to modeling? Feature selection? visualization? – Sean Owen Feb 15 '15 at 17:52
  • No, it is to solve real life business problems, which is of course driving the boom in Data Science. I am not so interested in pure academic applications - although they are fun, they do not require a methodology usually.

    Do I need to state that too?

    – Mike Wise Feb 15 '15 at 17:55
  • To help clarify, there are methodologies in the programming world, like Extreme Programming, Feature Driven Development, Unified Process, and many more. I am looking for their equivalents, if they exist. – Mike Wise Feb 15 '15 at 18:02

4 Answers

2

Can you elaborate on what you mean by 'methodologies'?

In the meantime, take a look at The Field Guide To Data Science by Booz Allen Hamilton. This guide talks about data science processes and frameworks.

Data Science Design Patterns by Mosaic talks about, you guessed it, data science design patterns. This is quite useful to get a sense of common design patterns. They are also working on releasing a book on the same subject.

Then there are several resources out there that will come up as results to more targeted searches, such as machine learning paradigms, recommender systems paradigms, etc. Data Science is a large and varied field, and you'll find many resources out there for each subsection of it. As far as I know, there isn't one book that covers it all.

saq7
  • That seems promising, I will have a look at those. As for methodology, I mean a step-by-step phased process that one can use for framing guidance. – Mike Wise Feb 15 '15 at 17:39
  • Yeah, check out the resources I linked; the first one describes a very high-level methodology. But once you have your high-level pieces figured out, you need to look for a process for each of your subprocesses. – saq7 Feb 15 '15 at 17:48
  • +1. The Field Guide is a nice high-level overview. – SmallChess May 11 '15 at 15:42
1

The following are the most popular Data Science Methodologies:

  • Data Collection
  • Data Preparation
  • Data Modeling
  • Cross-Industry Standard Process for Data Mining (CRISP-DM): Business Understanding, Data Understanding, Evaluating, Deployment
  • Knowledge Discovery in Databases
  • Linear regression

JiahMehra
0

I'm currently writing a book about Data Science in Higher Education, and the following methodologies are the ones I am including:

For regression, we have:

  • Simple Linear Regression
  • Multiple Linear Regression

For classification, we have:

  • Naive Bayes Classifier
  • Decision Tree Induction
  • K-Nearest Neighbor

These are some of the more elementary topics in statistical analysis (which you could argue is predictive analytics, which in turn you could argue is data science), so I suspect they are also among the more common.

Jesse
  • I wouldn't call these methodologies though. They are algorithms that can be applied as part of a methodology. – Mike Wise Feb 17 '15 at 05:17
0

Okay, I eventually found what I was looking for in the data mining community. There seem to be two candidates: CRISP-DM, which originally comes from SPSS but is "Cross-Industry", and SEMMA, which comes from SAS. Both are pretty much what I was looking for.

CRISP-DM http://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining

And

SEMMA http://en.wikipedia.org/wiki/SEMMA
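
To make the "step-by-step phased process" idea concrete, here is a rough sketch that writes both models down as ordered phase lists. The phase names are the ones commonly given for CRISP-DM and SEMMA; the constant and function names (`CRISP_DM_PHASES`, `SEMMA_PHASES`, `next_phase`, `checklist`) are made up for illustration, and the strictly linear order is a simplification, since in practice you loop back between phases.

```python
# Phase names as commonly listed for each process model.
# The constant names are illustrative, not part of any official tooling.
CRISP_DM_PHASES = (
    "Business Understanding",
    "Data Understanding",
    "Data Preparation",
    "Modeling",
    "Evaluation",
    "Deployment",
)

SEMMA_PHASES = ("Sample", "Explore", "Modify", "Model", "Assess")


def next_phase(phases, current):
    """Return the phase that follows `current`, or None after the last one."""
    index = phases.index(current)  # raises ValueError for an unknown phase
    return phases[index + 1] if index + 1 < len(phases) else None


def checklist(phases):
    """Render the phases as a numbered checklist for framing a project plan."""
    return [f"{number}. {name}" for number, name in enumerate(phases, start=1)]


if __name__ == "__main__":
    for line in checklist(CRISP_DM_PHASES):
        print(line)
```

Something like this can serve as a base template: keep the list, then attach your own tasks and exit criteria to each phase.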

Mike Wise
  • 'Process Flow' is a better term for what you want in this case. Methodology might refer to a method that you apply to solve a problem. Example: Decision Tree for Classification. – Minu Feb 06 '17 at 20:22
  • Also, in my opinion, a data scientist's process flow is something unique to that data scientist. CRISP-DM and SEMMA are definitely a good place to start. They can work as base templates for you to create your own process flow. – Minu Feb 06 '17 at 20:25