I have created a dataset with a rather large number of features, for example 100,000. Is that too large for a decent computer to handle (I have a 1080 Ti)?

Green Falcon
Mahmud Sabbir
  • Depends on a lot of factors like the data, network, network architecture, type of features, the features themselves, etc. So, in this form, your question is very broad. – Dawny33 Jan 11 '18 at 10:36
  • Also, 'Is there a maximum limit to the number of features in a Neural Network?' The answer would be: Theoretically, No. Practically, depends on the computational power you can afford. – Dawny33 Jan 11 '18 at 10:36
  • Could you add the number of examples in your training set? This has a large impact on what is reasonable. This is not a limitation of NNs, and not a strict rule or relationship (the factors that Dawny33 notes are also important). However, if you have 10k training examples and each example has 100k features, you may have to be very careful. For instance, CNNs and RNNs partly get around this issue thanks to the repeated nature of their features (e.g. pixels in an image all get treated much the same), but they also require a lot of training data. – Neil Slater Jan 11 '18 at 11:05
  • Thank you, Dawny and Neil. I have 4k training examples, but I also have a large number of classes (i.e. 100). I have some idea about CNNs but did not understand how a CNN can solve this. Is it due to average pooling and max pooling? – Mahmud Sabbir Jan 11 '18 at 15:15
  • In practice, it's always limited by memory. But it's possible to work with this many features, e.g. ImageNet images are 226x226x3, which is 153k 32-bit features per instance. Of course, the batch size has to be pretty small, but it is possible to train a CNN on a 1080 Ti. – Maxim Jan 12 '18 at 20:49

1 Answer


It depends heavily on your data. If the inputs are images, that many features is reasonable. If not, I recommend constructing the covariance (or correlation) matrix and checking whether the features are correlated. If many features are strongly correlated, it is better to discard the redundant ones. You can also employ PCA to do this. Correlated features add input-layer parameters to the network without adding much new information.
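As a rough sketch of the two ideas above, the following uses NumPy on a small synthetic matrix. The helper names `drop_correlated` and `pca_reduce`, the 0.95 threshold, and the toy data are all assumptions made for illustration, not something from the question. Note that with 100,000 features a full correlation matrix would have 10^10 entries (about 40 GB in float32), so at that scale you would probably compute it in blocks or go straight to PCA.

```python
import numpy as np

def drop_correlated(X, threshold=0.95):
    """Greedy filter: keep a column only if its absolute Pearson correlation
    with every previously kept column is at most `threshold`.
    Assumes no column is constant (a constant column yields NaN correlations)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))  # shape (n_features, n_features)
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return keep

def pca_reduce(X, n_components):
    """Project centered data onto its top principal directions via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

# Toy data (hypothetical): 5 independent features plus 5 noisy copies of them.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 5))
X = np.hstack([base, base + 0.01 * rng.normal(size=(200, 5))])

keep = drop_correlated(X)   # indices of the columns that survive the filter
X_filtered = X[:, keep]
Z = pca_reduce(X, 5)        # alternative: 5 principal components
print(len(keep), X_filtered.shape, Z.shape)
```

The greedy filter depends on column order, so it is only one of several reasonable ways to discard correlated features. PCA avoids that choice but produces mixed components instead of original features.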

Also, if your inputs are images, you may be able to reduce the number of parameters by resizing them. In popular networks the width and height of input images are usually under three hundred pixels, which keeps the number of input features below 90,000 per channel. If you are using convolutional nets, you can also employ max-pooling after some convolution layers. Pooling has no parameters of its own, but it shrinks the feature maps, so the layers that follow need fewer parameters. Refer here, which may be helpful.
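To make the effect of resizing and pooling concrete, here is a small back-of-the-envelope calculation in plain Python. The layer sizes (256 hidden units, 32 filters of size 3x3, RGB inputs, "same" padding) are hypothetical choices for illustration only.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a 2D convolution; independent of image size."""
    return kernel * kernel * c_in * c_out + c_out

# Feeding the flattened image straight into a dense layer of 256 units:
full_res = dense_params(300 * 300 * 3, 256)   # 300x300 RGB input
half_res = dense_params(150 * 150 * 3, 256)   # same image resized to 150x150

# A 3x3 convolution with 32 filters costs the same at any resolution:
conv = conv_params(3, 3, 32)

# Dense layer placed after that convolution on a 150x150 input,
# without pooling versus after one 2x2 max-pool (75x75 feature maps):
no_pool = dense_params(150 * 150 * 32, 256)
with_pool = dense_params(75 * 75 * 32, 256)

for name, n in [("dense, 300x300", full_res), ("dense, 150x150", half_res),
                ("conv 3x3x32", conv), ("dense after conv", no_pool),
                ("dense after conv + pool", with_pool)]:
    print(f"{name:>24}: {n:,}")
```

Halving the width and height, whether by resizing or by a 2x2 pool, cuts the input to the next dense layer by a factor of four, and its weight count shrinks by almost exactly the same factor.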

Green Falcon