Is there a way to add more importance to points which are more recent when analyzing data with xgboost?
3 Answers
Just add weights based on your time labels to your xgb.DMatrix. The following example is written in R, but the same principle applies to xgboost in Python or Julia.
library(xgboost)

data <- data.frame(feature = rep(5, 5),
                   year = seq(2011, 2015),
                   target = c(1, 0, 1, 0, 0))

# Down-weight older years: each year before the latest loses 5% weight
weightsData <- 1 + (data$year - max(data$year)) * 5 * 0.01

# Now create the xgboost matrix with your data and weights
xgbMatrix <- xgb.DMatrix(as.matrix(data$feature),
                         label = data$target,
                         weight = weightsData)

wacax
In Python there is a nice scikit-learn wrapper, so you can simply write:
import xgboost as xgb

exgb_classifier = xgb.XGBClassifier()
# sample_weight takes one weight per training row
exgb_classifier.fit(X, y, sample_weight=sample_weights_data)
More information is available in the API documentation: http://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.XGBClassifier.fit
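To illustrate, here is a minimal sketch of how such per-row weights might be built from time labels before being passed to fit. The year values and the 5%-per-year linear decay are illustrative, mirroring the R answer above; none of these numbers come from the original answer.

```python
import numpy as np

# Illustrative time labels, one per training row (assumed, not from the answer)
years = np.array([2011, 2012, 2013, 2014, 2015])

# Linear decay: each year before the most recent one loses 5% weight
sample_weights_data = 1 + (years - years.max()) * 5 * 0.01
# -> approximately [0.8, 0.85, 0.9, 0.95, 1.0]
```

The resulting array would then be passed as `sample_weight=sample_weights_data` in the fit call shown above.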

lucidyan
That should be xgb.XGBClassifier() in the second line of code, but Stack Exchange does not allow edits of fewer than six characters... – Andre Holzner Jul 18 '17 at 10:05
You could try building multiple xgboost models, with some of them limited to more recent data, then weighting those results together. Another idea would be to make a customized evaluation metric that penalizes recent points more heavily, which would give them more importance.
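As a sketch of the first idea, the blending step could look like the following. The model names and prediction values are made up for illustration; in practice each array would come from a fitted xgboost model's predict method.

```python
import numpy as np

# Hypothetical predicted probabilities from two separately trained models:
# one fit on the full history, one fit only on recent data (values assumed).
preds_all_history = np.array([0.30, 0.60, 0.80])
preds_recent_only = np.array([0.20, 0.70, 0.90])

# Blend the two, favoring the recency-focused model
w_recent = 0.7
blended = w_recent * preds_recent_only + (1 - w_recent) * preds_all_history
# -> approximately [0.23, 0.67, 0.87]
```

The blend weight `w_recent` is a free parameter; it could be tuned on a recent holdout set.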

TBSRounder
The OP can simply give higher sample weights to more recent observations. Most packages allow this, as does xgboost. – Ricardo Cruz Aug 11 '17 at 08:55
setinfo() can also be used, though it's not very descriptive – TBSRounder Dec 24 '15 at 15:39