
I'm very interested in priority queues (e.g., see my answers on: Does there exist a priority queue with $O(1)$ extracts?), and I was wondering whether there is a priority queue or similar data structure that lets you sort by multiple values.

For example, suppose I want to order the same records by numval and, separately, by strval, and be able to get the highest of each (using a Gödel numbering for the string) in $\mathcal{O}(1)$.

#include <string>

struct Node {
    int numval;          // numeric priority
    std::string strval;  // string priority
};

The obvious approach is to maintain two priority queues, but that would require twice the memory.

Is there a better way?

A T
  • Wait, maybe I'm missing something; how would defining an ordering based on these two fields not solve your problem? i.e., compare first by numval, then lexicographically by strval, or vice-versa? Orders need not be based only on one value. Sorry if this is missing the point entirely. – Patrick87 Jul 05 '12 at 16:14
  • Just trying to get $\mathcal{O}(1)$ extract for the highest numval AND $\mathcal{O}(1)$ extract for the highest strval (with $\mathcal{O}(\log n)$ to populate with the next top). – A T Jul 05 '12 at 17:02
  • I think you mean that your first order is numval, and then among equal numvals you are looking for the highest strval, am I right? –  Jul 05 '12 at 17:08
  • Nope, think of them as separate (but physically part of the same record). – A T Jul 05 '12 at 17:14
  • Each queue can work with object references rather than the data itself. So while you have double the memory for the priority queue, the memory for the objects is shared. – edA-qa mort-ora-y Jul 05 '12 at 17:28
  • Judging from your pseudocode, you might know C++ and the Boost libraries. There's a Multi-Index container in Boost, which supports what you describe. As far as I know, the implementation does some small space optimizations, but it still needs $O(n)$ space. However, you might find something interesting in the documentation. – Juho Jul 05 '12 at 17:49
  • Can you explain why you think it will take twice more memory? Twice more memory than what? – Artium Jul 05 '12 at 20:20
  • Twice the amount of memory that would be required to sort just by one element. Still linear though, but e.g.: for 2 elements it will take memory $2n = \mathcal{O}(n)$. – A T Jul 06 '12 at 16:32

1 Answer


Simple solution: use two queues
If you want to keep track of multiple priorities that are unrelated, then you'll have to use two priority queues.
You don't have to duplicate all the data, because you can just put a reference (pointer) to the data in each queue.
That way the object resides in only one location, with two pointers to it, one in each priority queue.
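
A minimal sketch of this idea, under some loud assumptions: the names `DualQueue`, `ByNum`, and `ByStr` are hypothetical, and the two "queues" are `std::set` containers (balanced trees) rather than binary heaps. The swap is deliberate: `std::priority_queue` cannot remove an arbitrary element, whereas here an element extracted through one ordering must also be removed from the other. Reading the maximum of either ordering is constant time, and removal is $O(\log n)$.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <utility>

struct Node {
    int numval;
    std::string strval;
};

// Order by numval; break ties by address so distinct nodes never compare equal.
struct ByNum {
    bool operator()(const Node* a, const Node* b) const {
        if (a->numval != b->numval) return a->numval < b->numval;
        return std::less<const Node*>()(a, b);
    }
};

// Order by strval (lexicographic); same address tie-break.
struct ByStr {
    bool operator()(const Node* a, const Node* b) const {
        if (a->strval != b->strval) return a->strval < b->strval;
        return std::less<const Node*>()(a, b);
    }
};

// Hypothetical container: each Node is stored once, indexed by two pointer sets.
class DualQueue {
public:
    DualQueue() = default;
    DualQueue(const DualQueue&) = delete;
    DualQueue& operator=(const DualQueue&) = delete;
    ~DualQueue() { for (const Node* n : by_num_) delete n; }

    void push(int numval, std::string strval) {
        const Node* n = new Node{numval, std::move(strval)};
        by_num_.insert(n);
        by_str_.insert(n);
    }

    bool empty() const { return by_num_.empty(); }
    std::size_t size() const { return by_num_.size(); }

    // Constant time: the maximum of an ordered set is its last element.
    const Node& top_num() const { assert(!empty()); return **by_num_.rbegin(); }
    const Node& top_str() const { assert(!empty()); return **by_str_.rbegin(); }

    // O(log n): the node is removed from both indexes, then freed.
    Node pop_num() { assert(!empty()); return remove(*by_num_.rbegin()); }
    Node pop_str() { assert(!empty()); return remove(*by_str_.rbegin()); }

private:
    Node remove(const Node* n) {
        by_num_.erase(n);
        by_str_.erase(n);
        Node copy = *n;
        delete n;
        return copy;
    }

    std::set<const Node*, ByNum> by_num_;
    std::set<const Node*, ByStr> by_str_;
};
```

The per-element overhead is two tree entries holding one pointer each, while `numval` and `strval` exist only once. Whether tree nodes or heap slots are cheaper in practice depends on the standard library implementation.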

So the memory load would be $O(nk+2n)$, where $k$ is the size of one object and $n$ is the number of objects.
As long as $k$ is large, the $2n$ term (the two pointers per object) will be insignificant.
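
To put numbers on this (illustrative values only, counting memory in machine words and assuming one word per pointer): with $n = 10^6$ objects of $k = 64$ words each,

$$nk + 2n = 64 \cdot 10^6 + 2 \cdot 10^6 = 66 \cdot 10^6, \qquad \frac{2n}{nk} = \frac{2}{k} = \frac{2}{64} \approx 3.1\%.$$

For a tiny record with $k = 2$, the same ratio is $2/2 = 100\%$, which is the doubling the question is worried about.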

Complex solution: intertwine the queues and use COW semantics
If the priorities are related, then you can lessen the memory load by implementing the priority queue as a linked structure where shared items link to the same node.
You'll have to use copy-on-write semantics on the shared nodes for this to work.

For an example, see B(+)-trees in databases
Something very similar happens in databases, where the tables are represented as B-trees.
When a change is made, a reference to the tree is copied and copy-on-write is applied to all nodes that are changed.
When the change is committed, the pointers for the changed tree and the original tree are exchanged.
While the transaction is in progress, the old tree and the new tree are intertwined.

Beware the running time of COW
You can do the same to your priority queue, but this only makes sense if most of the data is the same for both queues or if writes are rare. Otherwise the copy-on-write semantics will kill your running time.

Johan