Most Popular

1500 questions
8 votes, 3 answers

What to consider before learning a new language for data analysis

I'm currently in the very early stages of preparing a new research project (still at the funding-application stage), and I expect that data-analysis tools, and especially visualisation tools, will play a role in this project. In view of this, I face the…
Patrick Allo
8 votes, 2 answers

How to train ML model with multiple variables?

I am trying to learn machine learning concepts these days. I understand that in traditional ML data we have features and labels. I have the following toy data in mind, with features like 'units_sold' and 'num_employees' and a label of…
Hannan
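To make the features/label layout concrete, here is a minimal sketch that fits an ordinary least-squares linear model on several features at once. The numbers are made up for illustration, and the label name `target` is a hypothetical stand-in because the question's actual label is cut off. The point is only that each sample is one row of feature values and the model learns one weight per feature plus an intercept.

```python
import numpy as np

# Hypothetical toy data: each row is one sample; the columns are the
# features 'units_sold' and 'num_employees' (values invented for illustration).
X = np.array([
    [100.0, 5.0],
    [150.0, 7.0],
    [200.0, 6.0],
    [250.0, 10.0],
    [300.0, 12.0],
])

# Hypothetical label 'target', generated from a known linear rule so the fit
# can be checked: target = 2 * units_sold + 30 * num_employees + 50
y = 2.0 * X[:, 0] + 30.0 * X[:, 1] + 50.0

# Prepend a column of ones so the model also learns an intercept term.
X_design = np.column_stack([np.ones(len(X)), X])

# Find weights w minimising the squared error ||X_design @ w - y||^2.
w, *_ = np.linalg.lstsq(X_design, y, rcond=None)
intercept, coef_units, coef_employees = w

def predict(units_sold, num_employees):
    """Predict the hypothetical label for a new sample."""
    return intercept + coef_units * units_sold + coef_employees * num_employees

print(np.round(w, 6))
print(predict(120.0, 8.0))
```

The same row-per-sample, column-per-feature layout is what most ML libraries expect, so adding a third feature just means adding a third column.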
8 votes, 2 answers

Filtering spam from retrieved data

I once heard that filtering spam using blacklists is not a good approach, since a user searching for entries in your dataset may be looking for particular information from the blocked sources. Also, it'd become a burden to continuously validate…
Rubens
8 votes, 1 answer

Dealing with extreme values in softmax cross entropy?

I am dealing with numerical overflows and underflows in the softmax and cross-entropy functions for multi-class classification using neural networks. Given logits, we can subtract the maximum logit to deal with overflow, but if the values of the…
RE60K
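Subtracting the maximum logit handles overflow; the usual companion fix for underflow is to never materialise the probabilities at all and compute the log-softmax directly (the log-sum-exp trick), then read the cross-entropy off it. A minimal NumPy sketch, assuming integer class labels and logits of shape (batch, classes); the function names are illustrative, not taken from any particular library:

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax along the last axis.

    Subtracting the max keeps exp() from overflowing; staying in log space
    avoids taking log() of a probability that has underflowed to 0.
    """
    z = np.asarray(logits, dtype=np.float64)
    shifted = z - np.max(z, axis=-1, keepdims=True)
    log_sum_exp = np.log(np.sum(np.exp(shifted), axis=-1, keepdims=True))
    return shifted - log_sum_exp

def cross_entropy(logits, labels):
    """Mean cross-entropy for integer class labels, computed from logits."""
    logp = log_softmax(logits)
    labels = np.asarray(labels)
    picked = logp[np.arange(len(labels)), labels]
    return -np.mean(picked)

# Extreme logits: softmax followed by log() would give log(0) = -inf here,
# while the log-space version stays finite.
logits = np.array([[1000.0, 0.0, -1000.0],
                   [-1000.0, 1000.0, 0.0]])
labels = np.array([2, 1])
print(cross_entropy(logits, labels))
```

Because the largest shifted logit is always 0, the sum inside the log is at least 1, so the log never sees 0 even when every other term underflows.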
8 votes, 1 answer

How far can one go with Excel?

In my business we handle all analytics through Excel. This mostly includes scheduling, production planning and accounting operations. We are currently looking into adding a bit of predictive modelling, and Excel does suffice to a point, but doesn't…
Jcart
8 votes, 3 answers

What is the parts-of-speech technique in sentiment analysis?

In an article, I saw sentiment analysis done using the Parts of Speech (POS) technique. When I searched, I found some papers on POS, but I couldn't understand what POS basically is. As I am new to sentiment analysis, please help me understand POS.
SRJ577
8 votes, 2 answers

What is the difference between multi-layer perceptron and generalized feed forward neural network?

I'm reading this paper: "An artificial neural network model for rainfall forecasting in Bangkok, Thailand". The author created 6 models, 2 of which have the following architecture: model B: Simple multilayer perceptron with Sigmoid activation function…
hyTuev
8 votes, 3 answers

How to compare experiments run over different infrastructures

I'm developing a distributed algorithm, and to improve efficiency, it relies both on the number of disks (one per machine) and on an efficient load-balancing strategy. With more disks, we're able to reduce the time spent on I/O; and with an…
Rubens
8 votes, 5 answers

I got the following error: 'DataFrame' object has no attribute 'data'

I am trying to get the 'data' and the 'target' of the iris setosa database, but I can't. For example, when I load the iris setosa directly from sklearn datasets, I get a good result:
Program:
    from sklearn import datasets
    import numpy as np
    iris =…
user58187
8 votes, 1 answer

Understanding the effect of num_words of Tokenizer in Keras

Consider the following code:
    from keras.preprocessing.text import Tokenizer
    tokenizer = Tokenizer(num_words=5000)
    tokenizer.fit_on_texts(texts)
    print('Found %d unique words.' % len(tokenizer.word_index))
When I run this, it prints: Found 88582…
Mehran
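As far as I understand the Keras `Tokenizer`, `fit_on_texts` assigns an index to every word it sees (1 = most frequent), so `word_index` is not truncated; `num_words` is applied later, when texts are converted (for example by `texts_to_sequences`), keeping only words whose index is below `num_words`. The sketch below is a hypothetical, simplified pure-Python imitation of that rule (lowercase plus whitespace split, no punctuation filters, no OOV token, ties broken alphabetically), not Keras's actual code:

```python
from collections import Counter

class ToyTokenizer:
    """Hypothetical simplified imitation of how num_words is applied."""

    def __init__(self, num_words=None):
        self.num_words = num_words
        self.word_index = {}

    def fit_on_texts(self, texts):
        counts = Counter(w for t in texts for w in t.lower().split())
        # Every word seen gets an index, most frequent first (ties alphabetical here).
        # num_words is NOT applied at this stage.
        ranked = sorted(counts, key=lambda w: (-counts[w], w))
        self.word_index = {w: i for i, w in enumerate(ranked, start=1)}

    def texts_to_sequences(self, texts):
        seqs = []
        for t in texts:
            seq = []
            for w in t.lower().split():
                i = self.word_index.get(w)
                # num_words is applied only here: indices >= num_words are dropped,
                # so at most num_words - 1 distinct words survive (index 0 is reserved).
                if i is None or (self.num_words and i >= self.num_words):
                    continue
                seq.append(i)
            seqs.append(seq)
        return seqs

texts = ["the cat sat", "the dog sat", "the cat ran"]
tok = ToyTokenizer(num_words=3)
tok.fit_on_texts(texts)
print('Found %d unique words.' % len(tok.word_index))
print(tok.texts_to_sequences(texts))
```

Under this reading, a large `len(tokenizer.word_index)` is expected even with `num_words=5000`; the cap only shows up in the converted sequences.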
8 votes, 2 answers

What are the best practices to anonymize user names in data?

I'm working on a project that asks fellow students to share their original text data for further analysis using data mining techniques, and I think it would be appropriate to anonymize the student names attached to their submissions. Setting aside the…
xtian
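One common approach (not the only one) is pseudonymisation with a keyed hash: each name maps to a stable token, so a student's submissions can still be grouped together, but without the secret key nobody can recover names by hashing a list of candidate names. A minimal sketch using Python's standard `hmac` and `hashlib` modules; the `student_` prefix, the name normalisation, and the 12-character truncation are arbitrary choices for illustration. Names mentioned inside the submitted text itself would still need separate scrubbing.

```python
import hashlib
import hmac
import secrets

# Secret key: generate once, store it separately from the shared data, never publish it.
SECRET_KEY = secrets.token_bytes(32)

def pseudonymize(name, key=SECRET_KEY, length=12):
    """Map a name to a stable, hard-to-reverse pseudonym using HMAC-SHA256."""
    # Normalise case and whitespace so "Alice Smith" and "alice  smith " match.
    normalized = " ".join(name.strip().lower().split())
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    return "student_" + digest[:length]

submissions = [
    ("Alice Smith", "essay text ..."),
    ("alice  smith ", "second essay ..."),
    ("Bob Jones", "another essay ..."),
]
anonymized = [(pseudonymize(name), text) for name, text in submissions]
for pseudonym, text in anonymized:
    print(pseudonym, text)
```

Keeping the key out of the released dataset is what distinguishes this from a plain hash, which can be reversed by hashing a roster of student names.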
8 votes, 1 answer

Does MLPClassifier (sklearn) support different activations for different layers?

According to the documentation, the 'activation' argument specifies the "Activation function for the hidden layer". Does that mean that you cannot use a different activation function in different layers?
DeLorean88
8 votes, 3 answers

Chi-square as an evaluation metric for nonlinear machine learning regression models

I am using machine learning models to predict an ordinal variable (values: 1, 2, 3, 4, and 5) using 7 different features. I posed this as a regression problem, so the final outputs of a model are continuous variables. So an evaluation box plot looks…
Alex
8 votes, 5 answers

Mastering NLP: Reading List

I've searched the web and there are hundreds of recommendations on what to read. Time moves on and new, better-quality techniques are published, so I would like to know: what is relevant in 2018? My background is 4 years of BSc in Maths & Stats…
GRS
8 votes, 1 answer

Resume Parsing: extracting skills from resumes using Machine Learning

I am trying to extract the skill set of an employee from his/her resume. I have resumes stored as plain text in a database. I do not have predefined skills in this case. How should I approach this problem? I can think of two ways: Using unsupervised…
Sociopath