Tags - Data Science Stack Exchange

machine-learning

Machine Learning is a subfield of computer science that draws on elements from algorithmic analysis, computational statistics, mathematics, optimization, etc. It is mainly concerned with the use of data to construct models that have high predictive/forecasting ability. Topics include model building, applications, theory, etc.

11403 questions

python

Use for data science questions related to the programming language Python. Not intended for general coding questions (which should be asked on Stack Overflow).

6693 questions

deep-learning

Deep learning is an area of machine learning research concerned with the technologies used for learning hierarchical representations of data, mainly deep neural networks (i.e. networks with two or more hidden layers), but also some kinds of probabilistic graphical models.

4871 questions

neural-network

Artificial neural networks (ANNs) are composed of 'neurons': programming constructs that mimic the properties of biological neurons. A set of weighted connections between the neurons allows information to propagate through the network to solve artificial intelligence problems without the network designer needing a model of the real system.

4368 questions

classification

A form of supervised learning that identifies the category or categories to which a new instance in a dataset belongs.

3281 questions

nlp

Natural language processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages. As such, NLP is related to the area of human–computer interaction. Many challenges in NLP involve natural language understanding, that is, enabling computers to derive meaning from human or natural language input, and others involve natural language generation.

2740 questions

keras

Keras is a popular, open-source deep learning API for Python, built on top of TensorFlow, that is useful for fast implementation. Topics include efficient low-level tensor operations, computation of arbitrary gradients, scalable computations, export of graphs, etc.

2722 questions

scikit-learn

scikit-learn is a popular machine learning package for Python that has simple and efficient tools for predictive data analysis. Topics include classification, regression, clustering, dimensionality reduction, model selection, and preprocessing.

2308 questions

tensorflow

TensorFlow is an open source library for machine learning and machine intelligence. TensorFlow uses data flow graphs with tensors flowing along edges. For details, see https://www.tensorflow.org. TensorFlow is released under an Apache 2.0 License.

2183 questions

time-series

Time series are data observed over time (either in continuous time or at discrete time periods).

1885 questions

regression

Techniques for analyzing the relationship between one (or more) "dependent" variables and "independent" variables.

1592 questions

dataset

A dataset is a collection of data, often in tabular or matrix form. This tag is NOT intended for data requests ("where can I find a dataset about ..."); for those, see OpenData.

1508 questions

r

R is a free, open-source programming language and software environment for statistical computing, bioinformatics, and graphics.

1485 questions

clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, etc.

1396 questions

cnn

Convolutional Neural Networks (CNNs, also called ConvNets) are a tool used for classification tasks and image recognition. The first step, which gives them their name, is the convolution that extracts features from the input data.

1367 questions

pandas

pandas is a Python library for panel data manipulation and analysis, e.g. multidimensional time series and cross-sectional data sets commonly found in statistics, experimental science results, econometrics, or finance.

1340 questions

predictive-modeling

Statistical techniques used for predicting outcomes.

1193 questions

data-mining

An activity that seeks patterns in large, complex data sets. It usually emphasizes algorithmic techniques, but may also involve any set of related skills, applications, or methodologies with that goal.

1181 questions

lstm

LSTM stands for Long Short-Term Memory. The term usually refers to a recurrent neural network or to a block (part) of a larger network.

1174 questions

statistics

Statistics is a scientific approach to inductive inference and prediction based on probabilistic models of the data. By extension, it covers the design of experiments and surveys to gather data for this purpose.

1129 questions

feature-selection

Methods and principles of selecting a subset of attributes for use in further modelling.

974 questions

data

Questions mostly concerned with managing data, without focus on pre-processing or modelling.

866 questions

random-forest

Random forest is a machine learning ensemble method based on choosing random subsets of observations and variables for each of many decision trees.
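The two sources of randomness named above can be sketched as follows. This is only an illustration of the sampling step, not a full random forest: the helper name `sample_for_tree` and its parameters are hypothetical, and it draws the variable subset once per tree as the description states, whereas common implementations draw a fresh variable subset at each split.

```python
import random

def sample_for_tree(n_rows, n_features, n_selected_features, rng):
    """Pick the observations and variables one tree would be trained on.

    Hypothetical helper for illustration only.
    """
    # Bootstrap sample: draw n_rows row indices WITH replacement,
    # so some observations repeat and others are left out.
    row_indices = [rng.randrange(n_rows) for _ in range(n_rows)]
    # Random subset of variables, drawn WITHOUT replacement.
    feature_indices = sorted(rng.sample(range(n_features), n_selected_features))
    return row_indices, feature_indices

rng = random.Random(42)  # fixed seed so the sketch is reproducible
# One (rows, features) sample per tree in a hypothetical 5-tree forest.
forest_samples = [sample_for_tree(100, 10, 3, rng) for _ in range(5)]
```

Each tree would then be fitted on its own sample, and the forest would combine the trees' predictions, for example by majority vote.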

853 questions

machine-learning-model

A machine learning model is a simplified representation of a dataset, derived from statistics in the data, used to make predictions. It can represent patterns, behaviours or features within this dataset which have been learnt by the algorithm during training.

841 questions

linear-regression

Techniques for analyzing the relationship between one (or more) "dependent" variables and "independent" variables.

769 questions

data-cleaning

Data cleaning is a preliminary step to statistical analysis in which the dataset is edited to correct errors and to put it into a form suitable for processing by statistical software.

762 questions

image-classification

For questions about image classification: a decision problem where an algorithm must decide to which class ('cat', 'chair', 'tree') an input image belongs.

752 questions

decision-trees

A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm.

752 questions

rnn

A recurrent neural network (RNN) is a class of artificial neural network where connections between units form a directed cycle.
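The cycle mentioned above means that a unit's output is fed back into it at the next time step. A minimal sketch with a single scalar hidden unit follows; the function name `rnn_forward` and the weight names are assumptions made for illustration, not any library's API.

```python
import math

def rnn_forward(inputs, w_in, w_rec, bias, h0=0.0):
    """Run a one-unit recurrent network over a sequence of numbers.

    w_in scales the current input, w_rec scales the previous hidden
    state (the recurrent connection), and tanh is the activation.
    """
    h = h0
    states = []
    for x in inputs:
        # The previous state h re-enters the computation here.
        h = math.tanh(w_in * x + w_rec * h + bias)
        states.append(h)
    return states

# An impulse at the first step, then silence.
with_memory = rnn_forward([1.0, 0.0, 0.0], w_in=1.0, w_rec=0.5, bias=0.0)
# Same input with the recurrent connection removed.
without_memory = rnn_forward([1.0, 0.0, 0.0], w_in=1.0, w_rec=0.0, bias=0.0)
```

With the recurrent weight set, the first input still influences later states; with it set to zero, the state falls back to zero as soon as the input does.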

744 questions

pytorch

PyTorch is an open source library for tensors and dynamic neural networks in Python with strong GPU acceleration. For details, see https://pytorch.org.

730 questions

convolutional-neural-network

A convolutional neural network is a form of neural network with an additional convolutional layer, typically used in image & audio analysis. The convolutional layer is essentially a filtering stage defined by the kernel which is used. For example, a convolutional layer could have a kernel which extracts edges from an image towards the goal of learning which objects are in a scene.
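The filtering stage described above can be sketched in plain Python. The function name `convolve2d`, the toy image, and the choice of a Sobel-style kernel are assumptions made for illustration; in a trained network the kernel values are learned, not fixed by hand. As in most deep learning libraries, the kernel is applied without flipping (strictly speaking, cross-correlation).

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` with no padding and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch under the kernel.
            total = 0
            for a in range(kh):
                for b in range(kw):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output

# Hypothetical 4x6 grayscale image: dark left half (0), bright right half (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(4)]

# Sobel-style kernel that responds to left-to-right intensity changes.
vertical_edge_kernel = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

feature_map = convolve2d(image, vertical_edge_kernel)
# feature_map == [[0, 4, 4, 0], [0, 4, 4, 0]]
```

The non-zero entries sit where the kernel window straddles the dark-to-bright boundary, and the flat regions produce zeros.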

728 questions

logistic-regression

Refers generally to statistical procedures that utilize the logistic function, most commonly various forms of logistic regression.
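For reference, the logistic function is 1 / (1 + e^(-z)), and logistic regression applies it to a weighted sum of the input variables to obtain a probability. The sketch below assumes hypothetical names (`logistic`, `predict_probability`) and hand-picked weights; in practice the weights are estimated from data.

```python
import math

def logistic(z):
    """Logistic (sigmoid) function: maps any real number to a value between 0 and 1."""
    # Two algebraically equal branches so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_probability(features, weights, bias):
    """Probability of the positive class under a logistic regression model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return logistic(z)

# Illustrative weights only, not fitted to any data.
p = predict_probability([1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
```

Here the weighted sum is exactly zero, so `p` is 0.5, the point at which the model is undecided between the two classes.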

706 questions

xgboost

For questions related to the eXtreme Gradient Boosting algorithm.

701 questions

visualization

Constructing meaningful and useful graphical representations of data. (If your question is only about how to get particular software to produce a specific effect, then it is likely not on topic here.)

699 questions

training

Training is the part of machine learning whereby a model is "trained" on a defined portion of a dataset to learn attributes and statistical features of the data. Its counterparts are called testing and validation. After training, a model is tested and validated on another portion of the dataset.
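Dividing a dataset into the portions mentioned above can be sketched as follows. The function name `train_val_test_split`, the default 60/20/20 proportions, and the toy data are assumptions chosen for illustration.

```python
import random

def train_val_test_split(rows, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle `rows` and cut them into train, validation, and test lists."""
    shuffled = rows[:]  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

rows = list(range(10))  # stand-in for ten observations
train, val, test = train_val_test_split(rows)
```

The three lists do not overlap, so the model is fitted on `train` only and evaluated on observations it has not seen.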

694 questions

data-science-model

Questions about the organization of elements of data, and the standardization of their relations.

660 questions