24

I have noticed that most functional languages employ a singly-linked list (a "cons" list) as their most fundamental list type. Examples include Common Lisp, Haskell and F#. This differs from mainstream languages, where the native list types are arrays.

Why is that?

For Common Lisp (being dynamically typed) I get the idea that the cons cell is general enough to also serve as the basis of lists, trees, etc. This might be a tiny reason.

For statically typed languages, though, I can't find a good reason; in fact, I can find counter-arguments:

  • Functional style encourages immutability, so the linked list's ease of insertion is less of an advantage,
  • Functional style encourages immutability, and therefore data sharing; an array is easier to share "partially" than a linked list,
  • You could do pattern matching on a regular array just as well, and even better (for example, you could easily fold from right to left),
  • On top of that you get random access for free,
  • And (a practical advantage) if the language is statically typed, you can employ a regular memory layout and get a speed boost from the cache.

So why prefer linked lists?

coredump
  • 5,945
Kos
  • 1,434
  • 5
    Coming from the comments on @sepp2k's answer, I think "an array is easier to share 'partially' than a linked list" needs clarification as to what you mean. Due to their recursive nature, the opposite is true as I understand it - you can partially share a linked list more easily by passing along any node in it, while an array would need to spend time making a new copy. Or in terms of data sharing, two linked lists can point to the same suffix, which just plain isn't possible with arrays. – Izkata Jan 29 '12 at 15:19
  • If an array defines itself as an (offset, length, buffer) triple, then you could share an array by making a new one with (offset+1, length-1, buffer). Or have a special type of array as the subarray (see the sketch after these comments). – Dobes Vandermeer Oct 17 '12 at 07:01
  • @Izkata When talking about arrays, we rarely just mean a buffer such as a pointer to the start of contiguous memory in C. We usually mean some sort of structure which stores the length and a pointer to the start of a wrapped buffer. Under such a system, a slicing operation can return a subarray, whose buffer pointer points mid-way into the buffer (at the first element of the subarray), and whose count is such that start + count gives you the last element. Such slicing operations are O(1) in time and space. – Alexander Sep 04 '17 at 16:50
  • For the topic of functional data structures in general, you may be interested in Phil Bagwell's data structures. These are the basis for the vector, set, and map used in Clojure, Scala, Haskell, etc.: https://stackoverflow.com/questions/16270598/what-is-the-data-structure-behind-clojures-sets.

    Chris Okasaki also wrote a book called Purely Functional Data Structures, but this predates Bagwell's tries.

    – Riley Jun 21 '20 at 16:05
  • Here is a gentle introduction to the inner workings of Clojure's vector: https://hypirion.com/musings/understanding-persistent-vector-pt-1 – Riley Jun 21 '20 at 16:08
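
As a hedged illustration of the (offset, length, buffer) slicing described in the comments above by Dobes Vandermeer and Alexander (the names Slice, subSlice and index are mine, chosen for this sketch only):

import Data.Array (Array, (!), listArray)

-- A slice is a view onto a shared immutable buffer: no element is copied.
data Slice = Slice { buf :: Array Int Int, off :: Int, len :: Int }

fromList :: [Int] -> Slice
fromList xs = Slice (listArray (0, length xs - 1) xs) 0 (length xs)

-- O(1) sub-slice: reuse the buffer, adjust only offset and length.
subSlice :: Int -> Int -> Slice -> Slice
subSlice i n (Slice b o _) = Slice b (o + i) n

index :: Slice -> Int -> Int
index (Slice b o _) i = b ! (o + i)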

5 Answers

23

The most important factor is that you can prepend to an immutable singly linked list in O(1) time, which allows you to recursively build up n-element lists in O(n) time like this:

-- Build a list containing the numbers 1 to n (here from n down to 1):
foo :: Int -> [Int]
foo 0 = []
foo n = n : foo (n - 1)

If you did this using immutable arrays, each cons operation would need to copy the whole array, leading to a quadratic running time.
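
To make this concrete, here is a minimal Haskell sketch (my illustration, not part of the original answer; consArr, emptyArr and fooArr are hypothetical names) using the standard immutable Data.Array:

import Data.Array (Array, listArray, bounds, elems)

-- Prepending to an immutable array forces a full copy of its elements.
consArr :: Int -> Array Int Int -> Array Int Int
consArr x arr = listArray (0, hi + 1) (x : elems arr)  -- O(n) copy
  where (_, hi) = bounds arr

emptyArr :: Array Int Int
emptyArr = listArray (0, -1) []

-- The array analogue of foo: step k pays an O(k) copy,
-- so the total cost is 1 + 2 + ... + n = O(n^2).
fooArr :: Int -> Array Int Int
fooArr 0 = emptyArr
fooArr n = consArr n (fooArr (n - 1))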

Functional style encourages immutability, and therefore data sharing; an array is easier to share "partially" than a linked list

I assume by "partially" sharing you mean that you can take a subarray from an array in O(1) time, whereas with linked lists you can only take the tail in O(1) time and everything else needs O(n). That is true.

However, taking the tail is enough in many cases. And you have to take into account that being able to cheaply create subarrays doesn't help you if you have no way of cheaply creating arrays in the first place. And (without clever compiler optimizations) there is no way to cheaply build up an array step by step.
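
As a small illustration of that tail sharing (my sketch, not from the original answer):

-- Two lists sharing the same suffix: only two new cons cells are
-- allocated, and the cells of `shared` are reused by both lists.
shared :: [Int]
shared = [3, 4, 5]

xs, ys :: [Int]
xs = 1 : shared  -- [1, 3, 4, 5]
ys = 2 : shared  -- [2, 3, 4, 5]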

sepp2k
  • 4,339
  • That's not true at all. You can append to arrays in amortized O(1). – DeadMG Jan 29 '12 at 02:28
  • 10
    @DeadMG Yes, but not to immutable arrays. – sepp2k Jan 29 '12 at 02:30
  • "partially sharing" - I think both that two cons lists can point to the same suffix list (dunno why you'd want this), and that you can pass a midpoint instead of the beginning of the list to another function without having to copy it (I've done this many times) – Izkata Jan 29 '12 at 05:00
  • @Izkata The OP was talking about partially sharing arrays though, not lists. Also I've never heard what you're describing referred to as partial sharing. That's just sharing. – sepp2k Jan 29 '12 at 05:19
  • @sepp2k The OP is (incorrectly) using Array and List interchangeably, calling a "cons list" a "functional style" array. See his second bullet point, the specific part I was referring to. And it's partial sharing because, for example, the function doesn't even know the rest of the list exists - it only sees a part of the list. – Izkata Jan 29 '12 at 07:11
  • 1
    @Izkata The OP uses the term "array" exactly three times. Once to say that FP languages use linked lists where other languages use arrays. Once to say that arrays are better at partial sharing than linked lists and once to say that arrays can be pattern matched just as well (as linked lists). In all cases he's contrasting arrays and linked lists (to make the point that arrays would be more useful as a primary data structure than linked lists, leading to his question why linked lists are preferred in FP), so I don't see how he could be using the terms interchangeably. – sepp2k Jan 29 '12 at 07:30
  • @sepp2k thanks for the clarification, that's what I meant. – Kos Jan 29 '12 at 12:54
  • I believe @DeadMG has a point; in many cases we know upfront how many elements we're going to build up during a recursive call, and even if not, then amortized time is still possible I think... Please explain, why wouldn't it be? – Kos Jan 29 '12 at 12:56
  • @Kos How would you build up an array of n elements recursively without mutating it in O(n) time? Even if you know n beforehand? Any operation that took an array arr of n-1 elements and an element x and would return an array of n elements, containing the elements of arr as well as x, without changing the value of arr would have to take O(n) time. If you leave out the "without changing the value of arr" part, it's possible in (amortized) O(1) time of course, but then we're no longer talking about immutable data structures. – sepp2k Jan 29 '12 at 19:22
  • If you know the number of elements, you could also create an n-element array recursively by saying something like f(n) = g(n, createNewArray(n)) and then g(0, arr) = arr; g(n, arr) = { arr2 = modifyArray(arr, n-1, n); g(n-1, arr2) }, where modifyArray(arr, i, x) returns a new array where the ith element of arr is set to x. However that will still have quadratic runtime because modifyArray would be an O(n) operation. – sepp2k Jan 29 '12 at 19:28
  • Of course a smart compiler could replace the immutable versions of append or modifyArray with mutable versions if it detects that the original array won't be used anymore after the call to append or modifyArray (which is what I meant before with "clever compiler optimizations"). That would make many useful cases of building up an array O(n). However that's a non-trivial optimization that makes the language more complex to implement and isn't necessary with linked lists. It also makes it much harder to reason about the performance of programs. – sepp2k Jan 29 '12 at 19:32
  • I don't really see the merit of this advantage. Sure prepending is O(1), but appending is O(n). You win some, you lose some. And in many cases, adding to the end is preferable, which can force you into having to write ugly workarounds that use an accumulator http://people.sju.edu/~jhodgson/ai/accums.html – Alexander Sep 04 '17 at 16:47
  • @Alexander Using an immutable array, both would be O(n), so it's not a win and a loss - it's a win and a tie. The advantage is that it's possible to build up an n-element list recursively in O(n) time. Sure it may require using reverse or an accumulator in some cases, but that's still better than not being able to do it all. – sepp2k Sep 04 '17 at 17:25
  • @sepp2k You can have an immutable data structure with internal mutability. There's no rule that says that the internal workings of a ds must be immutable. In this case, you can have a typical ArrayList type ds, with amortized O(1) appends, whose inner mutability is contained and not observable from the outside world (apart from memory debuggers, I suppose). – Alexander Sep 04 '17 at 17:56
  • @Alexander You're talking about copy-on-write, right? Otherwise the mutability would be externally visible. Copy-on-write works, but it makes it harder to reason about when appending is O(1) and when it is O(n). – sepp2k Sep 04 '17 at 18:02
  • @sepp2k No, I don't think I'm referring to CoW. The copy doesn't occur on write, but only when the append occurs on an already full buffer. If the new size is some multiple of the old size (e.g. 1.5x, 2x), then this copying is amortized to O(1). – Alexander Sep 04 '17 at 18:06
  • @sepp2k When one variable stores a reference to a list, it's actually an ArrayList structure storing a base pointer to the memory buffer and a count. When an append is done, a new ArrayList structure is created, whose base pointer is either shared (in the case that there's capacity for more elements in the existing buffer) or a new buffer (in the case that reallocation was necessary), with an incremented count. The original list remains unaffected in either case. – Alexander Sep 04 '17 at 18:08
  • @Alexander Without CoW, I don't see how something like arr1 = original_array + 23; arr2 = original_array + 42; could work correctly. If both arr1 and arr2 try to reuse the memory from original_array, they'll just overwrite each other. – sepp2k Sep 04 '17 at 18:09
4

I think it comes down to lists being rather easily implemented in functional code.

Scheme:

(define (cons x y) (lambda (m) (m x y)))

Haskell:

data  [a]  =  [] | a : [a]

Arrays are harder to implement and not nearly as pretty. If you want them to be extremely fast, they have to be written at a low level.

Additionally, recursion works much better on lists than arrays. Consider the number of times you've recursively consumed/generated a list vs indexed an array.
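
For instance (a small sketch of mine, not part of the original answer), structural recursion on a cons list mirrors the type's definition directly:

-- Consuming a list case by case, following the shape of
-- data [a] = [] | a : [a]
total :: [Int] -> Int
total []       = 0             -- the [] case
total (x : xs) = x + total xs  -- the (:) case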

Pubby
  • 3,380
  • I wouldn't say it's accurate to call your scheme version an implementation of linked lists. You won't be able to use it to store anything but functions. Also arrays are harder (impossible actually) to implement in any language that doesn't have built-in support for them (or memory chunks), while linked lists only require something like structs, classes, records or algebraic data types to implement. That's not specific to functional programming languages. – sepp2k Jan 29 '12 at 02:29
  • @sepp2k What do you mean "store anything but functions"? – Pubby Jan 29 '12 at 02:54
  • 1
    What I meant was that lists defined that way can't store anything that's not a function. That's not actually true though. Dunno why I thought that. Sorry about that. – sepp2k Jan 29 '12 at 03:00
3

A singly-linked list is the simplest persistent data structure.

Persistent data structures are essential for performant, purely functional programming.
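
As a brief hedged sketch of what persistence means here (setHead is a name of mine, not from the answer): "updating" a list yields a new version while the old one stays intact, and both share structure.

-- The old list is untouched by the update; old and new share the tail [2, 3].
setHead :: a -> [a] -> [a]
setHead _ []       = []
setHead y (_ : xs) = y : xs

old, new :: [Int]
old = [1, 2, 3]
new = setHead 9 old  -- new is [9, 2, 3]; old is still [1, 2, 3]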

ziggystar
  • 139
  • 1
    this seems to merely repeat a point made and explained in the top answer that was posted over 4 years ago – gnat Jul 05 '16 at 13:41
  • 3
    @gnat: The top answer doesn't mention persistent data structures, or that singly linked lists are the simplest persistent data structure, or that they are essential for performant purely functional programming. I can find no overlap with the top answer at all. – Michael Shaw Jul 05 '16 at 18:16
2

You can use cons nodes easily only if you have a garbage-collected language.

Cons nodes fit naturally with the functional programming style of recursive calls and immutable values, so they match the programmer's mental model well.

And don't forget historical reasons. Why are they still called cons nodes, and why, worse, do we still use car and cdr as accessors? People learn them from textbooks and courses and then keep using them.

You are right that in the real world arrays are much easier to use, consume only about half the memory space, and are much more performant because of better cache locality. There is no reason to use linked lists in imperative languages.

Lothar
  • 702
1

Linked lists are important for the following reason:

Once you take a number like 3, convert it to a successor sequence like succ(succ(succ(zero))), and then apply the substitutions {succ = list node with some memory space} and {zero = end of list}, you'll end up with a linked list (of length 3).

The actually important parts are the numbers, the substitution, the memory space, and zero.
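
A small Haskell sketch of that correspondence (the names Nat, List and len are mine, for illustration only):

-- Peano numbers, and a list type of the same shape with a payload per succ.
data Nat    = Zero | Succ Nat            -- 3 = Succ (Succ (Succ Zero))
data List a = Nil  | Node a (List a)     -- substitute Succ -> Node, Zero -> Nil

-- Forgetting the payloads turns a list back into the Nat that is its length.
len :: List a -> Nat
len Nil         = Zero
len (Node _ xs) = Succ (len xs)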

Dynamic
  • 5,746
tp1
  • 1,902