Tags
A tag is a keyword or label that categorizes your question with other, similar questions.
Machine Learning is a subfield of computer science that draws on elements from algorithmic analysis, computational statistics, mathematics, optimization, etc. It is mainly concerned with the use of data to construct models that have high predictive/forecasting ability. Topics include model building, applications, theory, etc.
11403 questions
Use for data science questions related to the programming language Python. Not intended for general coding questions (which should be asked on Stack Overflow).
6693 questions
A new area of Machine Learning research concerned with the technologies used for learning hierarchical representations of data, mainly done with deep neural networks (i.e. networks with two or more hidden layers), but also with some kinds of Probabilistic Graphical Models.
4871 questions
Artificial neural networks (ANN) are composed of 'neurons': programming constructs that mimic the properties of biological neurons. A set of weighted connections between the neurons allows information to propagate through the network to solve artificial intelligence problems without the network designer having had a model of a real system.
4368 questions
An instance of supervised learning that identifies the category or categories to which a new instance in a dataset belongs.
3281 questions
Natural language processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages. As such, NLP is related to the area of human–computer interaction. Many challenges in NLP involve natural language understanding, that is, enabling computers to derive meaning from human or natural language input, and others involve natural language generation.
2740 questions
Keras is a popular, open-source deep learning API for Python built on top of TensorFlow and is useful for fast implementation. Topics include efficient low-level tensor operations, computation of arbitrary gradients, scalable computations, export of graphs, etc.
2722 questions
scikit-learn is a popular machine learning package for Python that has simple and efficient tools for predictive data analysis. Topics include classification, regression, clustering, dimensionality reduction, model selection, and preprocessing.
2308 questions
TensorFlow is an open source library for machine learning and machine intelligence. TensorFlow uses data flow graphs with tensors flowing along edges. For details, see https://www.tensorflow.org. TensorFlow is released under an Apache 2.0 License.
2183 questions
Time series are data observed over time (either in continuous time or at discrete time periods).
1885 questions
Techniques for analyzing the relationship between one (or more) "dependent" variables and "independent" variables.
1592 questions
A dataset is a collection of data, often in tabular or matrix form.
This tag is NOT intended for data requests ("where can I find a dataset about ..."); for those, see OpenData.
1508 questions
R is a free, open-source programming language and software environment for statistical computing, bioinformatics, and graphics.
1485 questions
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, etc.
1396 questions
Convolutional Neural Networks (CNN, also called ConvNets) are a tool used for classification tasks and image recognition. The first step, convolution, gives the networks their name and extracts features from the input data.
1367 questions
pandas is a Python library for Panel Data manipulation and analysis, e.g. multidimensional time series and cross-sectional data sets commonly found in statistics, experimental science results, econometrics, or finance.
1340 questions
An activity that seeks patterns in large, complex data sets. It usually emphasizes algorithmic techniques, but may also involve any set of related skills, applications, or methodologies with that goal.
1181 questions
LSTM stands for Long Short-Term Memory. Most of the time, the term refers to a recurrent neural network or a block (part) of a bigger network.
1174 questions
Statistics is a scientific approach to inductive inference and prediction based on probabilistic models of the data. By extension, it covers the design of experiments and surveys to gather data for this purpose.
1129 questions
Methods and principles of selecting a subset of attributes for use in further modelling.
974 questions
Questions mostly concerned with managing data, without focus on pre-processing or modelling.
866 questions
Random forest is a machine learning ensemble method based on choosing random subsets of observations and variables for each of many decision trees.
853 questions
A machine learning model is a simplified representation of a dataset, derived from statistics in the data, used to make predictions. It can represent patterns, behaviours or features within this dataset which have been learnt by the algorithm during training.
841 questions
Techniques for analyzing the relationship between one (or more) "dependent" variables and "independent" variables.
769 questions
Data cleaning is a preliminary step to statistical analysis in which the dataset is edited to correct errors and to put it into a form suitable for processing by statistical software.
762 questions
For questions about image classification: a decision problem where an algorithm must decide to which class ('cat', 'chair', 'tree') an input image belongs.
752 questions
A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm.
752 questions
A recurrent neural network (RNN) is a class of artificial neural network where connections between units form a directed cycle.
744 questions
PyTorch is an open source library for tensors and dynamic neural networks in Python with strong GPU acceleration. For details, see https://pytorch.org.
730 questions
A convolutional neural network is a form of neural network with an additional convolutional layer, typically used in image & audio analysis. The convolutional layer is essentially a filtering stage defined by the kernel which is used. For example, a convolutional layer could have a kernel which extracts edges from an image towards the goal of learning which objects are in a scene.
728 questions
Refers generally to statistical procedures that utilize the logistic function, most commonly various forms of logistic regression.
706 questions
Constructing meaningful and useful graphical representations of data. (If your question is only about how to get particular software to produce a specific effect, then it is likely not on topic here.)
699 questions
Training is the part of machine learning whereby a model is "trained" on a defined portion of a dataset to learn attributes and statistical features of the data. Its counterparts are called Testing and Validation. After training, a model is tested and validated on another portion of the dataset.
694 questions
Questions about the organization of elements of data, and the standardization of their relations.
660 questions