
My data contains a start time and a duration for each action. I would like to plot this so that, for a given time slice, I can see how many actions are active. I'm currently thinking of this as a histogram with time on the x axis and the number of active actions on the y axis.

My question is: how should I transform the data so that it can be plotted this way?

An action can last anywhere between 2 seconds and a minute, and I would estimate that about 100 actions could be taking place at any given time. Ideally a single plot would be able to show hours of data. The timestamps are accurate to the millisecond.

In the past, I have done this by counting, for each second, how many actions started, ended, or were active. This gave me a count of active actions for each second. The problem with this technique was that it made it difficult to adjust the time slice I was looking at: a one-minute slice was difficult to compute, and slices shorter than a second were impossible.

I'm open to any advice on how to think about this issue.

Thanks in advance!

Jeff King

2 Answers


Since you want to show so much data, I think your best choice is to go interactive. Check out this demo; it is close to what you want, but not quite.

It is very difficult to show a lot of data in a single diagram with both the finest details and the bird's-eye view. But you can let the user interact and look for the details. To show counts, one option is to use color-coding. Take a look at this image (code here).

Here the RGB channels encode (the logarithm of) the number of active events (red), events starting (green) and events ending (blue) for windows of different sizes. The X axis is time, and the Y axis is the window size, that is, its duration. Thus, a point with coordinates (10, 4) represents the interval of time that goes from 10 to 14.
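As a rough illustration of how such counts could be computed (this is not the code linked above, just a minimal sketch), the Python below uses sorted start and end times with `numpy.searchsorted`, so each window costs a few binary searches instead of a scan over all events. The names `window_counts` and `to_rgb` are hypothetical, windows are assumed to be half-open intervals [t, t + w) with w > 0, and durations are assumed to be positive.

```python
import numpy as np

def window_counts(starts, durations, times, widths):
    """Count events per window [t, t + w) for every t in times, w in widths.

    Returns three integer arrays of shape (len(widths), len(times)):
    events active in the window, events starting in it, events ending in it.
    Assumes widths > 0 and durations > 0.
    """
    starts = np.asarray(starts, dtype=float)
    ends = np.sort(starts + np.asarray(durations, dtype=float))
    starts = np.sort(starts)

    lo = np.asarray(times, dtype=float)[np.newaxis, :]        # window start
    hi = lo + np.asarray(widths, dtype=float)[:, np.newaxis]  # window end
    lo = np.broadcast_to(lo, hi.shape)

    started_before_hi = np.searchsorted(starts, hi, side="left")
    starting = started_before_hi - np.searchsorted(starts, lo, side="left")
    ending = (np.searchsorted(ends, hi, side="left")
              - np.searchsorted(ends, lo, side="left"))
    # Active = started before the window closes and not finished by its start.
    active = started_before_hi - np.searchsorted(ends, lo, side="right")
    return active, starting, ending

def to_rgb(active, starting, ending):
    """Log-scale each count and normalise it to [0, 1] as an RGB channel."""
    logged = np.log1p(np.stack([active, starting, ending], axis=-1).astype(float))
    peak = logged.max(axis=(0, 1), keepdims=True)
    return logged / np.where(peak > 0, peak, 1.0)

# Three made-up events: (start, duration) = (0, 3), (1, 10), (5, 2)
active, starting, ending = window_counts(
    starts=[0, 1, 5], durations=[3, 10, 2], times=[0, 10], widths=[4]
)
image = to_rgb(active, starting, ending)  # rows: window sizes, columns: times
```

Because timestamps are kept as floats, the same function works for windows of a minute or of a few milliseconds; only `times` and `widths` change.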

To show more detail in a large data set, it could be a good idea to make the diagram zoomable (like in the demo above) and to let the user display just one channel/magnitude at a time.

dsign

This can be done in R using ggplot2. Based on this question, it could be done with the code below, where limits is the date range of the plot.

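library(ggplot2)   # ggplot, geom_line, scale_x_datetime
library(reshape2)  # melt
library(scales)    # date_breaks

# One row per task, with start and end timestamps (millisecond precision)
tasks <- c("Task1", "Task2")
dfr <- data.frame(
  name       = factor(tasks, levels = tasks),
  start.date = c("2014-08-07 09:03:25.815", "2014-08-07 09:03:25.956"),
  end.date   = c("2014-08-07 09:03:28.300", "2014-08-07 09:03:30.409"),
  stringsAsFactors = FALSE
)

# Long format: one row per (task, start/end) pair
mdfr <- melt(dfr, measure.vars = c("start.date", "end.date"))
mdfr$time <- as.POSIXct(mdfr$value)

# One thick horizontal line per task, from its start to its end
ggplot(mdfr, aes(time, name)) +
  geom_line(size = 6) +
  xlab("") + ylab("") +
  theme_bw() +
  scale_x_datetime(
    breaks = date_breaks("2 sec"),
    # Points outside the limits are dropped, so the range must cover every timestamp
    limits = as.POSIXct(c("2014-08-07 09:03:24", "2014-08-07 09:03:31"))
  )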


germcd
  • Unfortunately this type of plot doesn't work because it doesn't scale to show how many actions are taking place at the same time. I would estimate that at any given time there could be 100 actions taking place, and over the whole data set there are millions of actions. – Jeff King Jul 25 '14 at 18:32
  • You can change the thickness of the lines, e.g. geom_line(size = 2) – germcd Jul 25 '14 at 18:56