
Is there a way to add more importance to points which are more recent when analyzing data with xgboost?

– kilojoules

3 Answers


Just add weights based on your time labels to your xgb.DMatrix. The following example is written in R, but the same principle applies to xgboost in Python or Julia.

library(xgboost)

# Toy data: five rows, one per year
data <- data.frame(feature = rep(5, 5),
                   year = seq(2011, 2015),
                   target = c(1, 0, 1, 0, 0))

# Linear decay: the most recent year gets weight 1, each older year loses 0.05
weightsData <- 1 + (data$year - max(data$year)) * 5 * 0.01

# Now create the xgboost matrix with your data and weights
xgbMatrix <- xgb.DMatrix(as.matrix(data$feature),
                         label = data$target,
                         weight = weightsData)
– wacax
  • Thanks for your answer – it's really helpful to see a coded example. How does the magnitude of the weighting function coefficients affect the model? I looked through the xgboost docs, but I can't find information about the significance of these numerical values. – kilojoules Dec 23 '15 at 19:29
  • Didn't know this trick, nice. There's a little tidbit in the xgboost doc under the function setinfo(), though it's not very descriptive. – TBSRounder Dec 24 '15 at 15:39
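On the magnitude question in the comments: instance weights scale each row's contribution to the loss, so only their relative sizes matter. The formula from the R snippet can be checked by hand in plain Python:

```python
# The weight formula from the R answer above, reproduced in plain Python.
# Weights scale each row's loss contribution, so relative size is what matters.
years = [2011, 2012, 2013, 2014, 2015]
weights = [1 + (y - max(years)) * 5 * 0.01 for y in years]
# Oldest year ends up near 0.8, newest at exactly 1.0
```

With this scheme, a point from 2011 counts for roughly 80% of a 2015 point; steeper decay coefficients would discount old data more aggressively.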

In Python you can use the scikit-learn wrapper, so the call is simply:

import xgboost as xgb

# sample_weight takes one weight per training row
exgb_classifier = xgb.XGBClassifier()
exgb_classifier.fit(X, y, sample_weight=sample_weights_data)

More information is in the API documentation: http://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.XGBClassifier.fit

– lucidyan

You could try building multiple xgboost models, with some of them limited to more recent data, then weighting those results together. Another idea would be to make a customized evaluation metric that penalizes errors on recent points more heavily, which would give them more importance.

– TBSRounder