More efficient algorithm for determining if one list is a sublist of another list

Question

I'm trying to build an algorithm that takes two lists of natural numbers and determines whether every element of the first list appears at least once in the second list.

What if the lists are sorted?

One algorithm that does this compares every element of the first list with every element of the second list. I think there is an algorithm with better complexity. Can anyone give me an idea?
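For reference, the brute-force approach described in the question might look like this in Python. This is a minimal sketch; the function name `contains_all_naive` is hypothetical. In the worst case it performs about len(list1) × len(list2) comparisons.

```python
def contains_all_naive(list1, list2):
    """Return True if every element of list1 occurs at least once in list2.

    Brute force: compare each element of list1 with the elements of list2,
    for O(len(list1) * len(list2)) comparisons in the worst case.
    """
    for x in list1:
        found = False
        for y in list2:
            if x == y:
                found = True
                break  # no need to look further for this x
        if not found:
            return False
    return True


print(contains_all_naive([3, 1, 3], [1, 2, 3]))  # True
print(contains_all_naive([1, 4], [1, 2, 3]))     # False
```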

Why presume that more complexity is better? In all cases one that works is better than one that does not. Every other choice is a tradeoff. — , Nov 25 '13 at 02:34

Gilles 'SO- stop being evil' · Accepted Answer · 2013-11-26T21:23:38.760

Let's start with the case when the lists are sorted. In that case, you can apply a simple modification of the basic merge algorithm on sorted lists: discard the elements instead of constructing a merged list, and only keep track of whether an element from list 1 was missing from list 2.

In the pseudo-code below, head(list) is the first element of list, and advance(list) means to discard the head of list. In other words, head(cons(h,t)) = h and advance(list) when list = cons(h,t) means list := t.

while list1 is not empty:
    if list2 is empty or head(list1) < head(list2):
        return false
    else if head(list1) == head(list2):
        let x = head(list1)
        while list1 is not empty and head(list1) == x: advance(list1)
        while list2 is not empty and head(list2) == x: advance(list2)
    else: (head(list1) > head(list2))
        advance(list2)
return true
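Here is a minimal Python translation of the pseudo-code, assuming both inputs are Python lists already sorted in non-decreasing order. Instead of destructively removing heads, it walks two indices `i` and `j`, so the inputs are left unchanged. The name `contains_all_sorted` is only for illustration.

```python
def contains_all_sorted(list1, list2):
    """Return True if every element of sorted list1 occurs in sorted list2.

    Both lists must be sorted in non-decreasing order. Indices i and j play
    the role of head(list1) and head(list2); "advance" is i += 1 or j += 1.
    """
    i, j = 0, 0
    n1, n2 = len(list1), len(list2)
    while i < n1:
        if j == n2 or list1[i] < list2[j]:
            # list1[i] cannot appear later in list2, since list2 is sorted
            return False
        elif list1[i] == list2[j]:
            x = list1[i]
            # skip every copy of x in both lists
            while i < n1 and list1[i] == x:
                i += 1
            while j < n2 and list2[j] == x:
                j += 1
        else:  # list1[i] > list2[j]: list2[j] is not needed, skip it
            j += 1
    return True


print(contains_all_sorted([1, 3, 3, 5], [1, 2, 3, 4, 5]))  # True
print(contains_all_sorted([1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                          [1, 2, 3, 4, 6, 7, 8, 9, 10]))   # False
```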

Exercise: prove that this algorithm returns true when all the elements of list1 occur in list2 and false otherwise.

Let $n_1 = \mathrm{length}(\mathtt{list1})$, $n_2 = \mathrm{length}(\mathtt{list2})$ and $n = \max(n_1, n_2)$. Every iteration of the outer loop either returns or removes at least one element of list2, and in the worst case the algorithm removes all the elements of both lists. Therefore the outer loop executes at most $n_2 + 1$ times, and the algorithm performs at most $n_1 + n_2$ removals of the head of a list in total: its complexity is $O(n_1 + n_2) = O(n)$.

Now suppose that the lists are not sorted. An obvious solution is to sort them, then apply the algorithm above. Since sorting is $O(n \log n)$ and the subsequent scan is $O(n)$, this algorithm is $O(n \log n)$.
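A minimal Python sketch of "sort, then scan" follows. It uses the built-in `sorted()` and then a compact variant of the merge-style scan above: `j` only ever moves forward through the sorted copy of list2. The name `contains_all` is hypothetical.

```python
def contains_all(list1, list2):
    """Return True if every element of list1 occurs in list2 (any order).

    Sorting both lists costs O(n log n); the merge-style scan afterwards
    is O(n), so the total is O(n log n) with n = max(len(list1), len(list2)).
    """
    a, b = sorted(list1), sorted(list2)  # sorted() returns new lists
    j = 0
    for x in a:
        # discard elements of b that are smaller than x
        while j < len(b) and b[j] < x:
            j += 1
        if j == len(b) or b[j] != x:
            return False  # x is missing from list2
        # leave j where it is: duplicates of x in a match the same b[j]
    return True


print(contains_all([5, 1, 3], [3, 2, 1, 5, 4]))  # True
print(contains_all([2, 7], [7, 1]))              # False
```

Because `j` never moves backwards, the scan does at most len(list1) + len(list2) steps after sorting, matching the bound stated above.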

Exercise: make sure you understand why my statements about complexity are true.

Exercise (difficult!): can you do better than $O(n \log n)$ in the general case?

One can do better than $O(n\log n)$ in expectation using hashing. — Louis, Nov 26 '13 at 09:24
@Louis Only with some dubious models, or with assumptions about the data set. See (When) is hash table lookup O(1)? — Gilles 'SO- stop being evil', Nov 26 '13 at 09:51
Shuresh's answer... which I apparently commented on already has enough to do what we need here. — Louis, Nov 26 '13 at 09:56
With "head" do you mean the first element of the list or the last element of the list or something else? — Student, Nov 26 '13 at 20:03
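Louis's comment suggests hashing. Here is a minimal sketch of that idea using Python's built-in `set`, assuming the elements are hashable (natural numbers are). As Gilles points out, the O(1) cost per lookup is an expected bound that rests on assumptions about the hash function and the data. It is not a worst-case guarantee. The name `contains_all_hashed` is hypothetical.

```python
def contains_all_hashed(list1, list2):
    """Return True if every element of list1 occurs in list2.

    Builds a hash set of list2, then checks each element of list1.
    Expected O(n1 + n2) time only if set insertion and lookup take
    expected O(1) time, which depends on the hash function and the input;
    the worst case is slower.
    """
    present = set(list2)
    return all(x in present for x in list1)


print(contains_all_hashed([3, 1, 3], [1, 2, 3]))  # True
print(contains_all_hashed([1, 4], [1, 2, 3]))     # False
```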

score 3 · Answer 2 · answered Nov 24 '13 at 15:48

If you know the lists are sorted, you don't need to scan the whole second list for each element. For example, let's say the lists are these:

l1 = [1,2,3,4,5,6,7,8,9,10]
l2 = [1,2,3,4,6,7,8,9,10]

When you are looking for the value 5 in the second list, you can stop looking once you reach the element with the value 6, since you know all of the remaining values are greater than 5.
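The early-stopping search described here could be sketched in Python as follows, assuming the second list is sorted in non-decreasing order. The helper name `occurs_in_sorted` is hypothetical. Each search still starts from the beginning of the list, so the worst case remains proportional to len(l1) × len(l2). Resuming each search where the previous one stopped turns this into the linear-time merge scan of the accepted answer.

```python
def occurs_in_sorted(x, sorted_list):
    """Linear search for x in a sorted list, stopping early.

    As soon as an element greater than x is seen, every later element
    is also greater than x, so x cannot occur in the rest of the list.
    """
    for y in sorted_list:
        if y == x:
            return True
        if y > x:
            return False  # stop early: remaining values are all > x
    return False


l1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
l2 = [1, 2, 3, 4, 6, 7, 8, 9, 10]
print(occurs_in_sorted(5, l2))                   # False (stops at 6)
print(all(occurs_in_sorted(x, l2) for x in l1))  # False
```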
