Questions tagged [dataset]

A dataset is a collection of data, often in tabular or matrix form.

This tag is NOT intended for data requests ("where can I find a dataset about ..."); for those, see OpenData.

A dataset, or data set, is a collection of data whose data points are typically related in some way.

Most commonly a dataset corresponds to the contents of a single database table, or a single statistical data matrix, where every column of the table represents a particular variable, and each row corresponds to a given member of the dataset in question. The dataset lists values for each of the variables, such as height and weight of an object, for each member of the dataset. Each value is known as a datum or data point.

The term dataset may also be used more loosely, to refer to the data in a collection of closely related tables.
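The row-and-column layout described above can be illustrated with a minimal sketch. The object names, the values, and the helper functions `column` and `datum` are all hypothetical, chosen only to mirror the height-and-weight example in the description; they are not part of any particular library.

```python
# Hypothetical toy dataset: each row is one member of the dataset,
# each column is one variable (illustrative values only).
columns = ["name", "height_cm", "weight_kg"]
rows = [
    ("box_a", 30.0, 1.2),
    ("box_b", 45.5, 2.7),
    ("box_c", 12.0, 0.4),
]


def column(rows, columns, name):
    """Return every value of one variable, in row order."""
    idx = columns.index(name)
    return [row[idx] for row in rows]


def datum(rows, columns, row_index, name):
    """Return a single data point: one variable's value for one member."""
    return rows[row_index][columns.index(name)]


# All heights (one variable across every member of the dataset).
heights = column(rows, columns, "height_cm")

# A simple per-variable summary computed over the whole dataset.
mean_height = sum(heights) / len(heights)

# One datum: the weight of the second member.
second_weight = datum(rows, columns, 1, "weight_kg")
```

In practice a dataframe library would usually hold such a table, but plain tuples are enough to show how variables map to columns, members to rows, and a datum to a single cell.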

1508 questions
14
votes
3 answers

Where can I download historical market capitalization and daily turnover data for stocks?

There are plenty of sources which provide the historical stock data but they only provide the OHLC fields along with volume and adjusted close. Also a couple of sources I found provide market cap data sets but they're restricted to US stocks. Yahoo…
tejaskhot
7
votes
2 answers

What kinds of data other than geographical are topologically spherical?

I'm trying to think of a data set that is essentially topologically spherical. It's easier to think of cylindrical datasets (two dimensions, one periodic) or toroidal datasets (two dimensions, both periodic). Obvious candidates are geographical and…
Toph
6
votes
4 answers

What's the point of the test set?

I get the point of a validation and training set, but the importance of a test set doesn't click for me. Let's say you train a model, and you try your best to avoid overfitting by testing your model on the validation set. After you've decided you…
Nick Corona
5
votes
3 answers

Loading collections of datasets - Python code examples

Sometimes you might want to check your ideas on multiple datasets. There are several places with dataset collections. Question: Please share some Python scripts showing how to download multiple datasets from these (or other) dataset collections. Ideally…
5
votes
1 answer

What does BNG stand for?

When I look at the available datasets at https://www.openml.org I often see a BNG dataset with no further information about it. Can someone explain what BNG means in this context? I am especially interested in this dataset:…
y4nnick
5
votes
2 answers

API for Company Data Enrichment Suggestions

I'm looking for API suggestions for enriching data on companies. Currently I use the Crunchbase API to look up a company's name or domain and I am trying to gather the domain/name (if I don't already have both), contact email (this one is a long…
ccanduc
4
votes
1 answer

Public datasets that show "cyclical" behavior

I am looking for any publicly available dataset that has a "cyclical" structure to it, in the sense that if I plot the data in a certain way, a loop becomes visible. A good example of this would be the Lotka-Volterra predator-prey model, which has…
Tom Solberg
4
votes
3 answers

Where can I find a social network image dataset?

I am supervising a programming project whose goal is to detect offensive images on social networks. I would like to have a representative dataset of social network images. It would be best if the dataset were already classified. Otherwise, the…
Zur Luria
4
votes
3 answers

Single Layer Perceptron with three classes

I need some help with a single-layer perceptron with multiple classes. What I need to do is classify a dataset with three different classes. So far I have only learnt how to do it with two classes, so I don't really have a good idea of how to do it with…
4
votes
1 answer

Having trouble installing and loading tidyverse - No DBI package

After a fresh (Control Panel, Windows 7) uninstall, then re-install of RStudio, I have tried to install and load tidyverse. I get the following message, seemingly because the package DBI is missing: install.packages("tidyverse") Installing…
jshea
3
votes
1 answer

Confused about description of YearPrediction Dataset

https://archive.ics.uci.edu/ml/datasets/YearPredictionMSD According to the description given in the above link, the Attribute information specifies "average and covariance over all 'segments', each segment being described by a 12-dimensional timbre…
abhivij
3
votes
2 answers

API for historical housing prices

I'm looking for an (ideally free) API that would have time series avg/median housing prices by zip code or city/state. Quandl almost fits the bill, but it returns inconsistent results across different zip codes and the data is not as up to date as…
3
votes
0 answers

Global flight network (with number of passengers per year) dataset

Is there a global flight network dataset that gives you: the flight routes (the connections between airports or cities served by some airline) and the number of passengers (per year) taking each route? In other words the dataset is a directed…
xiaohan2012
3
votes
2 answers

Predicting hardware failures with limited data

I am exploring using machine learning to predict if a particular hardware component would fail within a timeframe, say 3 months. The ultimate goal is to minimize physical human inspection so that maintenance crew would always performance…
Koh
3
votes
2 answers

I want to add demographic data to a data set. Any suggestions on where to find zip code level data?

Ideally, I would like to add demographic features to my data. I have zip codes for each observation, so I was hoping to add in the demographic data by zip code. Unfortunately, I cannot find any demographic data by zip code. Does anyone have any…
george