Highest Voted 'feature-engineering' Questions

27

votes

3 answers

Encoding categorical variables using likelihood estimation

I am trying to understand how I can encode categorical variables using likelihood estimation, but have had little success so far. Any suggestions would be greatly appreciated.

feature-engineering

asked Apr 04 '16 at 09:31

small dwarf

271
1
3
4
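
One common reading of "likelihood encoding" in the question above is target (mean) encoding: each category is replaced by the observed rate of the positive class among the rows that carry it, usually smoothed toward the global rate. The sketch below assumes a binary target. The function name `likelihood_encode`, the smoothing weight `m`, and the toy data are illustrative assumptions, not something taken from the question.

```python
from collections import defaultdict

def likelihood_encode(categories, targets, m=2.0):
    """Map each category to a smoothed estimate of P(target = 1 | category).

    m is a hypothetical smoothing weight: it acts like m pseudo-observations
    at the global positive rate, pulling rare categories toward that rate.
    """
    prior = sum(targets) / len(targets)   # global P(target = 1)
    counts = defaultdict(int)             # rows per category
    positives = defaultdict(int)          # positive rows per category
    for cat, y in zip(categories, targets):
        counts[cat] += 1
        positives[cat] += y
    return {
        cat: (positives[cat] + m * prior) / (counts[cat] + m)
        for cat in counts
    }

# Toy data, made up for illustration.
colors = ["red", "red", "blue", "blue", "blue", "green"]
clicked = [1, 0, 1, 1, 1, 0]

encoding = likelihood_encode(colors, clicked)
encoded_column = [encoding[c] for c in colors]
```

Because the encoding is computed from the target, it is usually fitted on training folds only and then applied to held-out rows; otherwise the encoded feature can leak the label.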

4

votes

2 answers

2D matrix for labelbinarizer

There is one behavior of LabelBinarizer:

    import numpy as np
    from sklearn import preprocessing
    lb = preprocessing.LabelBinarizer()
    lb.fit(np.array([[0, 1, 1], [1, 0, 0]]))
    lb.classes_

The output is array([0, 1, 2]). Why is there a 2 there?

feature-engineering

asked Jan 28 '18 at 03:00

william007

775
1
10
20
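
The snippet in the question above can be reproduced and contrasted with a 1-D input. As far as I understand scikit-learn's behavior, a 2-D array of 0s and 1s is treated as a multilabel indicator matrix, so `classes_` lists the column indices (three columns give 0, 1, 2) rather than the distinct values. This is a sketch for checking that reading, not an authoritative explanation; the variable names are my own.

```python
import numpy as np
from sklearn import preprocessing

y_2d = np.array([[0, 1, 1], [1, 0, 0]])

# 2-D input: interpreted as a multilabel indicator matrix with 3 columns.
lb_2d = preprocessing.LabelBinarizer()
lb_2d.fit(y_2d)
classes_2d = lb_2d.classes_.tolist()

# 1-D input with the same values: classes are the distinct labels.
lb_1d = preprocessing.LabelBinarizer()
lb_1d.fit(y_2d.ravel())
classes_1d = lb_1d.classes_.tolist()

print("target type seen by the 2-D fit:", lb_2d.y_type_)
print("classes from 2-D input:", classes_2d)
print("classes from 1-D input:", classes_1d)
```

If the intent is to binarize plain labels, passing a 1-D array is the safer choice.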

3

votes

1 answer

Effect of Skewness and data range in machine learning

I have a feature for machine learning, as follows, that is skewed to the left and only has values in a certain range (here 0-2000). Will the skewness and the range of values affect learning? If yes, what should I do?

feature-engineering

asked Feb 23 '17 at 04:01

user29151

3

votes

2 answers

numerical or categorical data

I have a feature for machine learning (using methods like SVM, naive Bayes, neural networks and random forests) called member duration, as follows: Should I treat it as numerical or categorical data?

feature-engineering

asked Feb 23 '17 at 03:51

william007

775
1
10
20

2

votes

1 answer

When is it appropriate to split a dataset on a categorical value and generate $n$ models instead?

When doing regression or classification and faced with a categorical attribute with $n$ possible values, there are two options: (1) feed this attribute directly into your model; (2) partition your data into $n$ pieces based on the categorical attribute and…

feature-engineering

asked Jun 18 '20 at 07:41

orlp

121
2

1

vote

1 answer

Problem of finding the best combination of features when the desired feature is some_feature_A/some_feature_B

The problem is as follows: we have a giant CSV file with one target column, and the rest are inputs. We don't know how these features impact the target, but we would like to use an algorithm that, besides using linear and non-linear transformations, will also take into account…

feature-engineering

asked Jul 11 '19 at 14:23

quester

295
1
3
8
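
For the ratio-feature question above, one brute-force starting point is to enumerate every ordered pair A/B and rank the ratios by how strongly they track the target. The sketch below uses absolute Pearson correlation as the ranking criterion, which is my own assumption (it only captures linear association); the helper names and the toy columns are also made up for illustration.

```python
from itertools import permutations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def rank_ratio_features(columns, target):
    """Score every ordered pair A/B by |correlation| with the target."""
    scored = []
    for a, b in permutations(columns, 2):
        if any(v == 0 for v in columns[b]):
            continue  # skip ratios that would divide by zero
        ratio = [x / y for x, y in zip(columns[a], columns[b])]
        scored.append((abs(pearson(ratio, target)), f"{a}/{b}"))
    return sorted(scored, reverse=True)

# Toy data: the target is constructed as a/b so the search has a known answer.
columns = {
    "a": [2.0, 9.0, 4.0, 10.0, 3.0],
    "b": [1.0, 3.0, 4.0, 2.0, 6.0],
    "c": [5.0, 1.0, 2.0, 4.0, 3.0],
}
target = [x / y for x, y in zip(columns["a"], columns["b"])]
ranking = rank_ratio_features(columns, target)
best_score, best_name = ranking[0]
```

The number of candidate ratios grows quadratically with the number of columns, so on a very wide file this would likely need a pre-filter.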

1

vote

1 answer

How can I deal with circular features like hours?

Assume I want to predict if I'm fit in the morning. One feature is the last time I was online. Now this feature is tricky: If I take the hour, then a classifier might have a difficult time with it because 23 is numerically closer to 20 than to 0,…

feature-engineering

asked Oct 20 '17 at 13:44

Martin Thoma

18,880
35
95
169
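
A commonly suggested way to handle the circular-hour problem above is to map the hour onto the unit circle with sine and cosine, so that 23 and 0 end up next to each other. This is a minimal sketch; `encode_hour` and `distance` are hypothetical helper names.

```python
import math

def encode_hour(hour, period=24):
    """Map an hour to a (sin, cos) point on the unit circle."""
    angle = 2 * math.pi * (hour % period) / period
    return (math.sin(angle), math.cos(angle))

def distance(a, b):
    """Euclidean distance between two encoded hours."""
    return math.dist(encode_hour(a), encode_hour(b))

d_23_0 = distance(23, 0)    # one hour apart across midnight
d_23_20 = distance(23, 20)  # three hours apart
print(d_23_0 < d_23_20)
```

Both components are needed: with the sine alone, two different hours can share the same value.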

1

vote

0 answers

Should original features be retained in the model after using them to engineer new features?

BACKGROUND: I have a dataset that includes Race (e.g., White, Black) and Ethnicity (e.g., Hispanic, Non-Hispanic) as observed variables. The dataset also includes Race_Ethnicity (e.g., Hispanic White, Non-Hispanic Black) as an engineered variable,…

feature-engineering

asked Dec 09 '22 at 11:49

Snehal Patel

23
3

1

vote

1 answer

An efficient way of calculating/estimating frequency spectrum for an event

This is rather a practical question. I'm looking for an efficient way of calculating the frequency of an event for a large number of samples. Here's a more concrete example. Let's say that I have a system with millions of users. Each user has so…

feature-engineering

asked Jun 21 '22 at 02:41

Mehran

277
1
2
12

0

votes

1 answer

Cyclic dependency between feature and predictor class

I have a feature which has specific categorical values (e.g., Technology, Hardware, Software, Marketing, Events, etc.). Based on this and some other features, I am trying to classify the dataset into 2 categories, IsSoftwareSystem or NotSoftwareSystem. In…

feature-engineering

asked Mar 10 '21 at 08:36

tumblewood

1

0

votes

1 answer

How to use feature group?

Let's say I have a data set like the following:

file    group_a_co_1  group_a_co_2  group_b_co_1  group_b_co_2
file_1  0.8           0.2           0.3           0.7
file_2  0.1           0.9           0.2           0.8
file_3  0.5           0.5           0.7           0.3
...

I wonder whether there are ways/tricks to tell the…

feature-engineering

asked Dec 12 '19 at 11:46

dgg32

113
4

0

votes

1 answer

To One-Hot-Encode or not to One-Hot-Encode?

I have been struggling to find proof for that, but I couldn't. Every time I prepare a dataset I face the same issue when a column is a classification such as CountryCode or TaskType, as in this dataset:

TaskType  CountryCode  Target
1         61           …

feature-engineering

asked Aug 13 '19 at 04:21

asmgx

549
2
18
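
To make the one-hot option in the question above concrete: codes such as CountryCode are identifiers rather than magnitudes, and one-hot encoding turns each distinct code into its own 0/1 column so a model does not read 61 as "larger than" 44. The sketch below is plain Python with made-up values; `one_hot` is a hypothetical helper, not a library function.

```python
def one_hot(values):
    """Return (categories, rows); each row has a single 1 for its category."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows

country_codes = [61, 1, 44, 61]  # made-up values for illustration
categories, rows = one_hot(country_codes)
for code, row in zip(country_codes, rows):
    print(code, row)
```

The cost is one column per distinct value, which is why high-cardinality columns are often handled differently.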

0

votes

2 answers

What is a good approach for a lifespan?

Let's say I want to predict the lifespan of an ad in a listing. I know a bunch of things from the ad, like the title, the price, the location, etc. The target value is the duration of the ad in the listing before it is removed (item has been…

feature-engineering

asked Jul 19 '18 at 13:28

Benjamin Toueg

109
2

0

votes

0 answers

Features derived using retrocausality

I have been experimenting with features derived using retrocausality (not to be confused with data leakage) in training models. Are there any examples of prior work in the literature where this form of feature engineering has yielded success?

feature-engineering

asked May 21 '23 at 23:49

Adam Patterson

1

0

votes

1 answer

Are there any search algorithms for feature optimization similar to RFE, but which consider all possible combinations?

Does anyone know any good search algorithms for feature optimization that search through every possible combination to find the optimal combination of features for maximum predictive power? (Permutations are not important). So far I have been using…

feature-engineering

asked Nov 23 '21 at 18:57

PlatinumMaths

81
2
11
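
The exhaustive search described in the question above can be sketched with `itertools.combinations`: score every non-empty subset and keep the best. The scoring function here is a made-up stand-in; in practice it would presumably be something like a cross-validated model score. `best_subset`, `toy_score`, and the feature names are all hypothetical.

```python
from itertools import combinations

def best_subset(features, score):
    """Evaluate every non-empty feature subset and return the best one."""
    best, best_score = None, float("-inf")
    for k in range(1, len(features) + 1):
        for subset in combinations(features, k):
            s = score(subset)
            if s > best_score:
                best, best_score = subset, s
    return best, best_score

# Stand-in scorer (made up): rewards useful features, penalizes subset size.
USEFUL = {"age": 0.30, "income": 0.25, "zip": 0.02, "noise": 0.0}

def toy_score(subset):
    return sum(USEFUL[f] for f in subset) - 0.05 * len(subset)

subset, value = best_subset(list(USEFUL), toy_score)
print(subset, round(value, 2))
```

With n features there are 2^n - 1 non-empty subsets, so this is only practical for small n; that cost is the usual reason greedy methods such as RFE are used instead.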
