
Is there a way to add more importance to points which are more recent when analyzing data with xgboost?

– kilojoules

3 Answers


Just add weights based on your time labels to your xgb.DMatrix. The following example is written in R, but the same principle applies to xgboost in Python or Julia.

library(xgboost)

# Toy data: five rows, one per year
data <- data.frame(feature = rep(5, 5),
                   year = seq(2011, 2015),
                   target = c(1, 0, 1, 0, 0))

# Linear decay: the most recent year gets weight 1, each older year loses 0.05
weightsData <- 1 + (data$year - max(data$year)) * 5 * 0.01

# Now create the xgboost matrix with your data and weights
xgbMatrix <- xgb.DMatrix(as.matrix(data$feature),
                         label = data$target,
                         weight = weightsData)
– wacax
  • Thanks for your answer – it's really helpful to see a coded example. How does the magnitude of the weighting function coefficients affect the model? I looked through the xgboost docs, but I can't find information about the significance of these numerical values. – kilojoules Dec 23 '15 at 19:29
  • Didn't know this trick, nice. There's a little tidbit in the xgboost doc under the function setinfo(), though it's not very descriptive. – TBSRounder Dec 24 '15 at 15:39
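On the magnitude question in the comments: instance weights scale each row's contribution to the loss, so only their relative sizes matter. The formula from the R snippet can be checked by hand in plain Python:

```python
# The weight formula from the R answer above, reproduced in plain Python.
# Weights scale each row's loss contribution, so relative size is what matters.
years = [2011, 2012, 2013, 2014, 2015]
weights = [1 + (y - max(years)) * 5 * 0.01 for y in years]
# Oldest year ends up near 0.8, newest at exactly 1.0
```

With this scheme, a point from 2011 counts for roughly 80% of a 2015 point; steeper decay coefficients would discount old data more aggressively.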

In Python you can use the scikit-learn wrapper, so the call is simply:

import xgboost as xgb

# sample_weight takes one weight per training row
exgb_classifier = xgb.XGBClassifier()
exgb_classifier.fit(X, y, sample_weight=sample_weights_data)

More information is in the API documentation: http://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.XGBClassifier.fit

– lucidyan

You could try building multiple xgboost models, with some of them limited to more recent data, then weighting those results together. Another idea would be to make a customized evaluation metric that penalizes errors on recent points more heavily, which would give them more importance.

– TBSRounder