Haskell foldl’ poor performance with (++)

There is much confusion about this issue. The usual reason given is that “repeatedly appending at the end of a list requires repeated traversals of the list and is thus O(n^2)”. But it would only be that simple under strict evaluation. Under lazy evaluation everything is supposed to be delayed, so this raises the question of whether these repeated traversals and appendings actually happen at all. The appending at the end is triggered by consuming at the front, and since we consume at the front the list is getting shorter, so the exact timing of these actions is unclear. The real answer is more subtle, and deals with the specific reduction steps under lazy evaluation.

The immediate culprit is that foldl' only forces its accumulator argument to weak head normal form – i.e. only until the outermost constructor (or a lambda) is exposed. The functions involved here are

(a:b)++c = a:(b++c)    -- does nothing with 'b', only pulls 'a' up
[]++c = c              -- so '++' only forces 1st elt from its left arg

foldl' f z [] = z
foldl' f z (x:xs) = let w=f z x in w `seq` foldl' f w xs

sum xs = sum_ xs 0     -- forces elts from its arg one by one
sum_ [] a = a
sum_ (x:xs) a = sum_ xs (a+x)
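As a quick sanity check of the first comment above, here is a minimal sketch (the name demoHead is mine, not from the original) showing that (++) only needs the first cons cell of its left argument:

```haskell
-- (++) pulls one cons cell at a time from its left argument,
-- so the head of (1 : undefined) ++ [2] is available without
-- ever touching the undefined tail.
demoHead :: Int
demoHead = head ((1 : undefined) ++ [2])

main :: IO ()
main = print demoHead   -- prints 1, no exception
```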

and so the actual reduction sequence is (with g = foldl' f)

sum $ foldl' (\acc x-> acc++[x^2]) []          [a,b,c,d,e]
sum $ g  []                                    [a,b,c,d,e]
      g  [a^2]                                   [b,c,d,e]
      g  (a^2:([]++[b^2]))                         [c,d,e]
      g  (a^2:(([]++[b^2])++[c^2]))                  [d,e]
      g  (a^2:((([]++[b^2])++[c^2])++[d^2]))           [e]
      g  (a^2:(((([]++[b^2])++[c^2])++[d^2])++[e^2]))   []
sum $ (a^2:(((([]++[b^2])++[c^2])++[d^2])++[e^2]))
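The trace above can be reproduced with concrete numbers; a small runnable check (the name squaresApp is mine, not from the original):

```haskell
import Data.List (foldl')

-- The left fold from the trace, with concrete squares instead of a..e.
squaresApp :: [Int] -> [Int]
squaresApp = foldl' (\acc x -> acc ++ [x^2]) []

main :: IO ()
main = do
  print (squaresApp [1..5])        -- [1,4,9,16,25]
  print (sum (squaresApp [1..5]))  -- 55
```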

Notice we’ve only performed O(n) steps so far. a^2 is immediately available here for sum‘s consumption, but b^2 is not. We’re left with the left-nested structure of ++ expressions. The rest is best explained in this answer by Daniel Fischer. The gist of it is that to get b^2 out, O(n-1) steps will have to be performed – and the structure left in the wake of this access will still be left-nested, so the next access will take O(n-2) steps, and so on – the classic O(n^2) behavior. So the real reason is that ++ isn’t forcing or rearranging its arguments enough to be efficient.

This is actually counter-intuitive. We could expect lazy evaluation to magically “do it” for us here. After all, we are only expressing our intent to add [x^2] to the end of the list at some point in the future; we don’t actually do it right away. So the timing here is off, but it could be made right: as we access the list, new elements would be added into it and consumed right away. If c^2 were added into the list just after b^2 (space-wise), say, just before (in time) b^2 were consumed, each traversal/access would always be O(1).

This is achieved with the so-called “difference list” technique:

newlist_dl lst = foldl' (\z x-> (z . (x^2 :)) ) id lst

which, if you think about it for a moment, looks exactly the same as your ++[x^2] version. It expresses the same intent, and leaves a left-nested structure too.
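One thing the snippet leaves implicit: the fold builds a function, so to get the actual list you apply it to []. A runnable sketch (the name squaresDL is mine, not from the original):

```haskell
import Data.List (foldl')

-- A difference list represents a list as a function [a] -> [a];
-- appending becomes function composition.
squaresDL :: [Int] -> ([Int] -> [Int])
squaresDL = foldl' (\z x -> z . ((x^2) :)) id

main :: IO ()
main = print (squaresDL [1..5] [])   -- [1,4,9,16,25]
```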

The difference, as explained in that same answer by Daniel Fischer, is that a (.) chain, when first forced, rearranges itself into a right-nested ($) structure¹ in O(n) steps, after which each access is O(1) and the timing of the appendings is optimal exactly as described in the paragraph above, so we’re left with overall O(n) behaviour.
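The rearrangement can be seen in a small hand-reduction of a two-element chain (the comments are my own sketch of the steps, not from the original):

```haskell
-- ((id . (1:)) . (2:)) [] reduces as:
--   ((id . (1:)) . (2:)) []
-- = (id . (1:)) ((2:) [])     -- the outermost (.) steps aside first
-- = (id . (1:)) (2:[])
-- = id ((1:) (2:[]))
-- = 1 : 2 : []
-- Each element is prepended onto an already-built tail,
-- so every cons cell is constructed exactly once.
chain :: [Int]
chain = ((id . (1:)) . (2:)) []

main :: IO ()
main = print chain   -- [1,2]
```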


¹ which is kind of magical, but it does happen. 🙂
