Questions tagged [machine-learning]

What is Machine Learning?

Machine Learning is a subfield of computer science that draws on algorithmic analysis, computational statistics, mathematics, and optimization, among other areas. It is mainly concerned with using data to construct models with high predictive or forecasting ability. Topics include model building, applications, and theory. Machine Learning is often applied to large datasets to uncover predictive relationships among features in the data.

Modern applications of Machine Learning are wide-ranging and include Bioinformatics, Astronomy, Computational Physics, Economics, Natural Language Processing, Image Recognition/Object Detection, Robotics, and Recommendation Systems.


Tag usage

When posting questions about Machine Learning, please make sure to take the following into consideration:

  • Questions should include enough detail and clarity for the problem to be solved. This includes links to original data sources, the code used for model construction, and links to any tutorials or other resources used.

  • Questions should generally be more specific than "which model should I use" or "how can I achieve this", and should explain what has been attempted so far.

  • Unless directly related to the problem, questions about where to get data (sources, APIs, datasets, etc.) should be posted on Open Data Stack Exchange rather than on Data Science Stack Exchange.


Types

Please see below for a (non-exhaustive) list of the types of Machine Learning:


External Resources


Machine Learning Journals

11403 questions
81 votes, 9 answers

Data scientist vs machine learning engineer

What are the differences, if any, between a "data scientist" and a "machine learning engineer"? Over the past year or so "machine learning engineer" has started to show up a lot in job postings. This is particularly noticeable in San Francisco,…
Ryan Zotti • 4,149 • 3 • 19 • 32

17 votes, 5 answers

Detecting cats visually by means of anomaly detection

I have a hobby project which I am contemplating committing to as a way of increasing my so far limited experience of machine learning. I have taken and completed the Coursera MOOC on the topic. My question is with regards to the feasibility of the…
Frost • 273 • 2 • 5

14 votes, 4 answers

Studying machine learning algorithms: depth of understanding vs. number of algorithms

Recently I was introduced to the field of Data Science (it's been approx. 6 months), and I started the journey with the Machine Learning course by Andrew Ng and after that started working on the Data Science Specialization by JHU. On practical application…
Vinay Tiwari • 151 • 4

14 votes, 1 answer

Machine learning libraries for Ruby

Are there any machine learning libraries for Ruby that are relatively complete (including a wide variety of algorithms for supervised and unsupervised learning), robustly tested, and well-documented? I love Python's scikit-learn for its incredible…
the911s • 321 • 1 • 8

13 votes, 1 answer

Source of Arthur Samuel's definition of machine learning

Many people seem to agree that Arthur Samuel wrote or said in 1959 that machine learning is the "Field of study that gives computers the ability to learn without being explicitly programmed". For example the quote is contained in this page, that…
Pierre Cattin • 263 • 1 • 2 • 6

13 votes, 1 answer

1% of data for training 99% of data for testing

I got feedback from a reviewer. It is really important for me to answer this question, and I would appreciate any help. It was mentioned that 1% of the data was used for training while 99% was used for testing. This is unusual and it calls for…
Ahmad Turani • 239 • 2 • 7

12 votes, 2 answers

Difference between a target and a label in machine learning

If I have a supervised learning system (for example, for the MNIST dataset), I have features (pixel values of MNIST data) and labels (the correct digit value). However, sometimes people use the word target (instead of label). Are target and label…
Niklas • 223 • 1 • 2 • 6

12 votes, 9 answers

What are some easy to learn machine-learning applications?

Being new to machine-learning in general, I'd like to start playing around and see what the possibilities are. I'm curious as to what applications you might recommend that would offer the fastest time from installation to producing a meaningful…
Steve Kallestad • 3,128 • 4 • 21 • 39

12 votes, 8 answers

Definition of a model in machine learning

This definition does not quite apply since we are not always assuming an underlying distribution. So what is a model really? Can a Gradient Boosted Model (GBM) with specified hyperparameters be considered a model? Is a model a collection of rules?
organic agave • 221 • 1 • 2 • 4

11 votes, 2 answers

Amplifying a Locality Sensitive Hash

I'm trying to build a cosine locality sensitive hash so I can find candidate similar pairs of items without having to compare every possible pair. I have it basically working, but most of the pairs in my data seem to have cosine similarity in the…
Philip Pearl • 251 • 1 • 5

11 votes, 1 answer

Lazy vs Eager Learning

I wish to better understand the difference between lazy and eager learning. I am having difficulty conceptualising what the "abstraction" refers to between the two. According to the text book I am reading it says, "The distinction between easy…
TheGoat • 271 • 1 • 2 • 6

11 votes, 4 answers

Why not train the final model on the entire data after doing hyper-parameter tuning based on test data and model selection based on validation data?

By entire data I mean train + test + validation. Once I have fixed my hyperparameters using the validation data and chosen the model using the test data, won't it be better to have a model trained on the entire data so that the parameters are better…
Apoorva Abhishekh • 195 • 1 • 3 • 8

11 votes, 3 answers

Is TensorFlow a complete Machine Learning Library?

I am new to TensorFlow and I need to understand the capabilities and shortcomings of TensorFlow before I can use it. I know that it is a deep learning framework, but apart from that which other machine learning algorithms can we use with tensor…
Swaroop • 213 • 1 • 2 • 6

10 votes, 2 answers

Which database to use for storing machine learning data?

I am currently storing my training data in HDF5 files, and I want my team and me to switch to a database for two main reasons: the data is not used only by me, and the different datasets are stored in different folders at different paths etc so I…
ava_punksmash • 241 • 1 • 5

10 votes, 1 answer

Prediction with non-atomic features

I would like to use non-atomic data as a feature for a prediction. Suppose I have a table with these features: - Column 1: Categorical - House - Column 2: Numerical - 23.22 - Column 3: A Vector - [ 12, 22, 32 ] - Column 4: A Tree - [ [ 2323, 2323…
user3798928 • 101 • 3