I don't know whether this is common or best practice, but it's another way of looking at the problem.
If you have, say, a date, you can treat each field as a categorical variable instead of a continuous variable. The day takes a value in the set {1, 2, ..., 31}, the month takes a value in {1, ..., 12}, and for the year you choose a minimum and a maximum value and build the set from that range.
Then, since the specific numeric values of days, months and years might not be useful for finding trends in the data, encode each value as a binary (one-hot) vector in which each bit is a separate feature. For example, month 5 would be 0 0 0 0 1 0 0 0 0 0 0 0
(eleven 0s and a 1 in the 5th position, each bit being a feature).
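A minimal sketch of the one-hot step for a single field, in plain Python. The helper name `one_hot` is hypothetical; in practice a library encoder would do the same job.

```python
def one_hot(value: int, low: int, high: int) -> list[int]:
    """Return a 0/1 list with a single 1 for `value` in the inclusive range [low, high]."""
    if not low <= value <= high:
        raise ValueError(f"{value} is outside [{low}, {high}]")
    vec = [0] * (high - low + 1)
    vec[value - low] = 1  # position 0 corresponds to `low`
    return vec

month_vec = one_hot(5, 1, 12)
print(month_vec)  # [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
```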
So, with, for example, 10 years in the year set, a date would be transformed into a vector of 53 features (= 31 + 12 + 10). Since only three of those features are non-zero for any given date, storing them as sparse vectors keeps the number of features from being a problem.
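Putting the day, month and year blocks side by side (31 + 12 + 10 positions) gives the full date vector. This sketch assumes a hypothetical year window of 2011 to 2020; the names `date_to_indices` and `date_to_dense` are illustrative only. The sparse form keeps just the three positions that are set to 1.

```python
from datetime import date

# Hypothetical 10-year window; pick the bounds that match your data.
YEAR_MIN, YEAR_MAX = 2011, 2020

# Offsets of each block inside the full vector.
DAY_OFFSET = 0                    # days 1..31   -> positions 0..30
MONTH_OFFSET = DAY_OFFSET + 31    # months 1..12 -> positions 31..42
YEAR_OFFSET = MONTH_OFFSET + 12   # years        -> positions 43..52
N_FEATURES = YEAR_OFFSET + (YEAR_MAX - YEAR_MIN + 1)

def date_to_indices(d: date) -> list[int]:
    """Sparse form: positions of the three features that are set to 1."""
    if not YEAR_MIN <= d.year <= YEAR_MAX:
        raise ValueError(f"year {d.year} is outside [{YEAR_MIN}, {YEAR_MAX}]")
    return [
        DAY_OFFSET + d.day - 1,
        MONTH_OFFSET + d.month - 1,
        YEAR_OFFSET + d.year - YEAR_MIN,
    ]

def date_to_dense(d: date) -> list[int]:
    """Dense form: a full 0/1 vector of length N_FEATURES."""
    vec = [0] * N_FEATURES
    for i in date_to_indices(d):
        vec[i] = 1
    return vec

print(date_to_indices(date(2015, 5, 17)))     # [16, 35, 47]
print(len(date_to_dense(date(2015, 5, 17))))  # 53
```

In a real pipeline you would probably pass the index lists to a sparse matrix type (for example SciPy's sparse matrices) instead of building dense lists.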
Something similar could be done for time data, day of the week, day of the month...
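As a sketch of the same idea for other calendar fields, the following encodes the day of the week (7 features) and the hour of the day (24 features) from a `datetime`. The helper names are hypothetical, and the choice of these two fields is just an example.

```python
from datetime import datetime

def one_hot(value: int, size: int) -> list[int]:
    """0/1 list of length `size` with a single 1 at position `value` (0-based)."""
    if not 0 <= value < size:
        raise ValueError(f"{value} is outside [0, {size})")
    vec = [0] * size
    vec[value] = 1
    return vec

def encode_time_features(dt: datetime) -> list[int]:
    # weekday(): Monday=0 ... Sunday=6 -> 7 features; hour 0..23 -> 24 features.
    return one_hot(dt.weekday(), 7) + one_hot(dt.hour, 24)

features = encode_time_features(datetime(2015, 5, 17, 14, 30))
print(len(features))  # 31
```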
It all depends on the question you want your machine learning model to answer.
`is_easter`, `is_superbowl` from any package? – Tajni May 26 '20 at 04:54

`holidays` package for Easter and a lot of other things, but I'm not aware of anything out of the box for `is_superbowl`. However, the same package allows you to specify custom holidays. "For more complex logic like 4th Monday of January [or, in the case of the Super Bowl, first Sunday in February], you can inherit the HolidayBase class and define your own _populate(year) method. See [the] documentation for examples." Source: https://github.com/dr-prodigy/python-holidays/blame/abc1b31b112bf787d6cd906f76691db406d3fbee/README.rst#L73-L75 – deepyaman Nov 29 '20 at 04:34
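If you would rather not depend on the `holidays` package, rules such as "4th Monday of January" or "first Sunday in February" can be computed with the standard library alone. This is a plain-Python sketch, not the `holidays` package's API; `nth_weekday` and `is_first_sunday_of_february` are hypothetical helper names, and the resulting boolean could be used directly as a feature.

```python
import calendar
from datetime import date

def nth_weekday(year: int, month: int, weekday: int, n: int) -> date:
    """Date of the n-th `weekday` (Monday=0 ... Sunday=6) in the given month, n >= 1."""
    first_weekday, days_in_month = calendar.monthrange(year, month)
    # Day of the month on which `weekday` first occurs.
    first = 1 + (weekday - first_weekday) % 7
    day = first + 7 * (n - 1)
    if day > days_in_month:
        raise ValueError(f"{year}-{month:02d} has no occurrence #{n} of weekday {weekday}")
    return date(year, month, day)

def is_first_sunday_of_february(d: date) -> bool:
    return d == nth_weekday(d.year, 2, calendar.SUNDAY, 1)

print(nth_weekday(2021, 1, calendar.MONDAY, 4))  # 2021-01-25
```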