
This question is closely related to "What is a good way to transform Cyclic Ordinal attributes?" and "Ways to deal with longitude/latitude feature".

Both have very clear answers on how to normalise an hour variable and latitude/longitude variables.

What this question adds is how to approach normalisation when you have both weekday and hour features. I ask because they describe the same information at different levels of granularity and are therefore closely related.

The objective is to run a k-means algorithm combining the normalised weekday and hour features with other traditional numeric features.

Seymour

1 Answer


The essential transformation in @AN6U5's answer is done in the following lines:

df['hourfloat'] = df.hour + df.minute/60.0
df['x'] = np.sin(2.*np.pi*df.hourfloat/24.)
df['y'] = np.cos(2.*np.pi*df.hourfloat/24.)

In the first line he converts minutes into hours by dividing them by 60, so for example 20 minutes become 0.3333 hours.

After that, in lines 2 and 3, he treats this float number as an angle on a 24-hour clock face (one full day is one full turn, 2π) and converts it from polar coordinates to Cartesian coordinates (https://en.wikipedia.org/wiki/Polar_coordinate_system).
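To make the effect of those lines concrete, here is a minimal sketch of the same arithmetic in plain Python (no pandas). The function names `hour_to_xy` and `distance` are my own and are not part of @AN6U5's answer; the point is only to show that times on either side of midnight end up close together.

```python
import math

def hour_to_xy(hour, minute=0):
    """Map a time of day onto the unit circle (period: 24 hours)."""
    hourfloat = hour + minute / 60.0          # e.g. 6:20 -> 6.333...
    angle = 2.0 * math.pi * hourfloat / 24.0  # fraction of the day as an angle
    return math.sin(angle), math.cos(angle)

def distance(p, q):
    """Euclidean distance between two encoded points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

late = hour_to_xy(23, 30)
early = hour_to_xy(0, 30)
noon = hour_to_xy(12, 0)

# 23:30 and 00:30 are one hour apart and land close together,
# while noon sits almost on the opposite side of the circle.
print(distance(late, early))
print(distance(late, noon))
```

With the raw hour as a number, 23.5 and 0.5 would be 23 units apart; on the circle they are neighbours, which is what a distance-based method like k-means needs.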

So to change this from hour to weekday, you just need to adapt the first line.
Imagine a clock where 00:00 is Monday, followed by Tuesday (clockwise), and so on. You need to express the hour as a fraction of a day (for simplicity I assume weekday takes the values 0-6, with Monday = 0). So first you divide your hour by 24, which transforms it into days, and then you add your weekday. This gives you a float number of days since the start of the week. Then proceed with lines 2 and 3 just as given, except that you change the 24 to 7, so that one full week is one full turn.

As a formula: weekday + hour/24 = weekfloat. I haven't tried it out myself, but I think this should do it.
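A minimal sketch of the week-clock idea in plain Python, under my own assumptions: weekday runs 0-6 with Monday = 0, and the position in the week is expressed in days as weekday + hour/24, so the period in the sine and cosine is 7. The names `week_to_xy` and `distance` are made up for this example.

```python
import math

def week_to_xy(weekday, hour, minute=0):
    """Map a moment in the week onto the unit circle (period: 7 days).

    Assumes weekday is 0-6 with Monday = 0.
    """
    weekfloat = weekday + (hour + minute / 60.0) / 24.0  # position in days
    angle = 2.0 * math.pi * weekfloat / 7.0              # one week = one turn
    return math.sin(angle), math.cos(angle)

def distance(p, q):
    """Euclidean distance between two encoded points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

sunday_late = week_to_xy(6, 23)
monday_early = week_to_xy(0, 1)
monday_8 = week_to_xy(0, 8)
thursday_8 = week_to_xy(3, 8)

print(distance(sunday_late, monday_early))  # small: the week wraps around
print(distance(monday_8, thursday_8))       # large: three days apart
```

Note that this encoding only knows about position within the week, so the same hour on different days is not treated as similar.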


Alternatively, when you have two cyclic features, you could transform weekday and hour into spherical coordinates. This would leave you with three coordinates, x, y and z, but would also preserve the 'closeness' within each feature itself.

Sebastian
  • Thank you, Sebastian. So you are saying that I should see the hour as the time since the beginning of the week. But in this case 8 am on Monday ends up very far from 8 am on Thursday, right? – Seymour Feb 06 '18 at 09:23
  • True, they are 3 days apart, just as 06:52 and 07:52 are 1 hour apart in the other example. If you want to have the information of 8 am preserved, I'd suggest you keep the column in your data. – Sebastian Feb 06 '18 at 09:31
  • Thank you. Which column are you suggesting I keep to preserve this information? – Seymour Feb 06 '18 at 09:40
  • I assumed you have a column called "hour" which contains "8 am" in your previous scenario. – Sebastian Feb 06 '18 at 09:41
  • Clear, yes I do. So maybe it is better to use it as a numeric feature instead of converting it with sin/cos. – Seymour Feb 06 '18 at 10:00
  • I think you have to try it out both ways. Maybe you even get better results if you have both in your data set, even though the information is a bit redundant. – Sebastian Feb 06 '18 at 10:28
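One way to read the "keep both" suggestion from the comments is to encode the time of day and the position in the week separately and use all four values as features. This is only a sketch under that interpretation; the helper names `cyclic` and `features` are hypothetical, and weekday is again assumed to be 0-6 with Monday = 0.

```python
import math

def cyclic(value, period):
    """Encode a cyclic value as a point on the unit circle."""
    angle = 2.0 * math.pi * value / period
    return (math.sin(angle), math.cos(angle))

def features(weekday, hour):
    """Four features: time of day (period 24) and position in week (period 7)."""
    day_x, day_y = cyclic(hour, 24.0)
    week_x, week_y = cyclic(weekday + hour / 24.0, 7.0)
    return (day_x, day_y, week_x, week_y)

monday_8 = features(0, 8)
thursday_8 = features(3, 8)
thursday_20 = features(3, 20)

# Monday 8 am and Thursday 8 am share the same time-of-day pair,
# so only the week pair separates them.
print(monday_8[:2] == thursday_8[:2])
print(math.dist(monday_8, thursday_8))
print(math.dist(thursday_8, thursday_20))
```

Whether the redundancy helps or hurts the clustering is something to test on the actual data, as the last comment says.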