I don't think you can use Pearson's correlation here, because it is meant for continuous variables. Your variables are ordinal, so a rank-based measure like Spearman's would be more appropriate. However, I don't think that measures for ordinal variables would be appropriate either, because your variables are also cyclical: Hour_ofDay=23 and Hour_ofDay=1 are really just 2 hours apart, but Spearman's correlation would treat them as 22 hours apart.
I think in this case it would be more appropriate to look at the distribution of the distance (measured in hours) between the two variables.
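The appropriate distance in this case is a signed circular difference on the 24-hour clock (based on the distance defined in the accepted answer to this other question):
import numpy as np
# signed shortest difference a1 - a2 on a 24-hour clock, in [-12, 12); handles wrap-around at midnight
distance = (a1 - a2 + 12) % 24 - 12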
Where a1 and a2 are your App and Email opening time variables, respectively. Note that the variables need to be expressed as hours of the day in the range 0 to 23 for this distance to work.
Compute this distance for each row, add it as a column to your dataframe and plot it as a histogram.
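As a rough sketch of that step, here is one way it could look with pandas. The column names (app_hour, email_hour), the sample values and the helper names are made up for illustration. The helper uses a modulo form of the signed circular difference, so pairs that straddle midnight (e.g. 23 and 1) come out 2 hours apart with the sign pointing the short way round.

```python
import numpy as np
import pandas as pd

# Hypothetical data: hour of day (0-23) at which each user opened the app and the email.
df = pd.DataFrame({
    "app_hour":   [9, 23, 0, 14, 1],
    "email_hour": [8, 1, 23, 14, 23],
})

def circular_hour_diff(a1, a2):
    """Signed shortest difference a1 - a2 on a 24-hour clock, in [-12, 12)."""
    return (a1 - a2 + 12) % 24 - 12

# Positive values mean the app was opened after the email.
df["distance"] = circular_hour_diff(df["app_hour"], df["email_hour"])

# One bin per whole hour, centred on -12, -11, ..., 11.
bin_edges = np.arange(-12.5, 12.5, 1.0)
counts, _ = np.histogram(df["distance"], bins=bin_edges)

def plot_distance_histogram(frame):
    """Draw the histogram; matplotlib is imported lazily so the rest runs without it."""
    import matplotlib.pyplot as plt
    ax = frame["distance"].plot.hist(bins=bin_edges)
    ax.set_xlabel("App hour minus Email hour (circular)")
    plt.show()
```

Calling plot_distance_histogram(df) draws the plot; the counts array holds the same information if you prefer to inspect the numbers directly.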
This histogram will tell you quite a lot about the "correlation" between the two variables.
For example (with a1 = App and a2 = Email):
- No correlation: the histogram will be uniform between -12 and 12
- Instantaneous correlation, i.e. user opens email and app at the same time: the histogram will have a peak at 0
- Anticipated correlation, i.e. user opens email 1 hour before app: the histogram will have a peak at 1
- Delayed correlation, i.e. user opens email 1 hour after app: the histogram will have a peak at -1
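To get a feel for these cases, you could simulate them. The sketch below uses purely synthetic data (not your data): one set of app hours is independent of the email hours, the other is always exactly one hour later. All names are hypothetical, and the distance is the modulo form of the signed circular difference, App minus Email.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

def circular_hour_diff(a1, a2):
    """Signed shortest difference a1 - a2 on a 24-hour clock, in [-12, 12)."""
    return (a1 - a2 + 12) % 24 - 12

# Synthetic email opening hours, uniform over the day.
email = rng.integers(0, 24, size=n)

# Case 1: app opened at an unrelated, uniformly random hour (no correlation).
app_independent = rng.integers(0, 24, size=n)
# Case 2: app opened exactly one hour after the email, wrapping past midnight.
app_one_hour_later = (email + 1) % 24

bins = np.arange(-12.5, 12.5, 1.0)   # one bin per whole hour
centres = np.arange(-12, 12)         # centre of each bin

flat, _ = np.histogram(circular_hour_diff(app_independent, email), bins=bins)
peaked, _ = np.histogram(circular_hour_diff(app_one_hour_later, email), bins=bins)

print("independent case, counts per bin:", flat)
print("one-hour-later case, peak at:", centres[peaked.argmax()])
```

The independent case gives roughly equal counts in every bin, while the second case puts all the mass in the bin centred on 1. Real data will sit somewhere in between, with a broader peak.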
This visualization lets you see both whether App and Email opening times are related and what the typical lag between them is.
Note: if your variables also include minutes and seconds in a date format, you have to convert them to numeric hours first. E.g. 01:30 (hour 1 and 30 minutes) becomes 1.5. Also pay attention to the date format in case your times are expressed in the 12-hour clock (e.g. 6 PM, 1 AM).
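A minimal sketch of that conversion, assuming the times are stored as strings in a known format. The sample values and the helper name to_decimal_hours are made up; adjust the format strings to whatever your data actually uses.

```python
import pandas as pd

# Hypothetical raw values: one series in 24-hour format, one in 12-hour format.
times_24h = pd.Series(["01:30:00", "23:15:00", "00:00:00"])
times_12h = pd.Series(["06:00 PM", "01:00 AM", "12:30 AM"])

def to_decimal_hours(series, fmt):
    """Parse time strings with an explicit format and return hours as floats in [0, 24)."""
    parsed = pd.to_datetime(series, format=fmt)
    return parsed.dt.hour + parsed.dt.minute / 60 + parsed.dt.second / 3600

hours_24 = to_decimal_hours(times_24h, "%H:%M:%S")   # 24-hour clock
hours_12 = to_decimal_hours(times_12h, "%I:%M %p")   # 12-hour clock with AM/PM
```

Passing the format explicitly avoids a 12-hour value such as 06:00 PM being silently read as 6 in the morning.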