Convert list into a dictionary [duplicate]

If you are still thinking what the! You would not be alone, its actually not that complicated really, let me explain.

How to turn a list into a dictionary using built-in functions only

We want to turn the following list into a dictionary using the odd entries (counting from 1) as keys mapped to their consecutive even entries.

l = ["a", "b", "c", "d", "e"]

dict()

To create a dictionary we can use the built in dict function for Mapping Types as per the manual the following methods are supported.

dict(one=1, two=2)
dict({'one': 1, 'two': 2})
dict(zip(('one', 'two'), (1, 2)))
dict([['two', 2], ['one', 1]])

The last option suggests that we supply a list of lists with 2 values or (key, value) tuples, so we want to turn our sequential list into:

l = [["a", "b"], ["c", "d"], ["e",]]

We are also introduced to the zip function, one of the built-in functions which the manual explains:

returns a list of tuples, where the i-th tuple contains the i-th element from each of the arguments

In other words if we can turn our list into two lists a, c, e and b, d then zip will do the rest.

slice notation

Slicings which we see used with Strings and also further on in the List section which mainly uses the range or short slice notation but this is what the long slice notation looks like and what we can accomplish with step:

>>> l[::2]
['a', 'c', 'e']

>>> l[1::2]
['b', 'd']

>>> zip(['a', 'c', 'e'], ['b', 'd'])
[('a', 'b'), ('c', 'd')]

>>> dict(zip(l[::2], l[1::2]))
{'a': 'b', 'c': 'd'}

Even though this is the simplest way to understand the mechanics involved there is a downside because slices are new list objects each time, as can be seen with this cloning example:

>>> a = [1, 2, 3]
>>> b = a
>>> b
[1, 2, 3]

>>> b is a
True

>>> b = a[:]
>>> b
[1, 2, 3]

>>> b is a
False

Even though b looks like a they are two separate objects now and this is why we prefer to use the grouper recipe instead.

grouper recipe

Although the grouper is explained as part of the itertools module it works perfectly fine with the basic functions too.

Some serious voodoo right? =) But actually nothing more than a bit of syntax sugar for spice, the grouper recipe is accomplished by the following expression.

*[iter(l)]*2

Which more or less translates to two arguments of the same iterator wrapped in a list, if that makes any sense. Lets break it down to help shed some light.

zip for shortest

>>> l*2
['a', 'b', 'c', 'd', 'e', 'a', 'b', 'c', 'd', 'e']

>>> [l]*2
[['a', 'b', 'c', 'd', 'e'], ['a', 'b', 'c', 'd', 'e']]

>>> [iter(l)]*2
[<listiterator object at 0x100486450>, <listiterator object at 0x100486450>]

>>> zip([iter(l)]*2)
[(<listiterator object at 0x1004865d0>,),(<listiterator object at 0x1004865d0>,)]

>>> zip(*[iter(l)]*2)
[('a', 'b'), ('c', 'd')]

>>> dict(zip(*[iter(l)]*2))
{'a': 'b', 'c': 'd'}

As you can see the addresses for the two iterators remain the same so we are working with the same iterator which zip then first gets a key from and then a value and a key and a value every time stepping the same iterator to accomplish what we did with the slices much more productively.

You would accomplish very much the same with the following which carries a smaller What the? factor perhaps.

>>> it = iter(l)     
>>> dict(zip(it, it))
{'a': 'b', 'c': 'd'}

What about the empty key e if you’ve noticed it has been missing from all the examples which is because zip picks the shortest of the two arguments, so what are we to do.

Well one solution might be adding an empty value to odd length lists, you may choose to use append and an if statement which would do the trick, albeit slightly boring, right?

>>> if len(l) % 2:
...     l.append("")

>>> l
['a', 'b', 'c', 'd', 'e', '']

>>> dict(zip(*[iter(l)]*2))
{'a': 'b', 'c': 'd', 'e': ''}

Now before you shrug away to go type from itertools import izip_longest you may be surprised to know it is not required, we can accomplish the same, even better IMHO, with the built in functions alone.

map for longest

I prefer to use the map() function instead of izip_longest() which not only uses shorter syntax doesn’t require an import but it can assign an actual None empty value when required, automagically.

>>> l = ["a", "b", "c", "d", "e"]
>>> l
['a', 'b', 'c', 'd', 'e']

>>> dict(map(None, *[iter(l)]*2))
{'a': 'b', 'c': 'd', 'e': None} 

Comparing performance of the two methods, as pointed out by KursedMetal, it is clear that the itertools module far outperforms the map function on large volumes, as a benchmark against 10 million records show.

$ time python -c 'dict(map(None, *[iter(range(10000000))]*2))'
real    0m3.755s
user    0m2.815s
sys     0m0.869s
$ time python -c 'from itertools import izip_longest; dict(izip_longest(*[iter(range(10000000))]*2, fillvalue=None))'
real    0m2.102s
user    0m1.451s
sys     0m0.539s

However the cost of importing the module has its toll on smaller datasets with map returning much quicker up to around 100 thousand records when they start arriving head to head.

$ time python -c 'dict(map(None, *[iter(range(100))]*2))'
real    0m0.046s
user    0m0.029s
sys     0m0.015s
$ time python -c 'from itertools import izip_longest; dict(izip_longest(*[iter(range(100))]*2, fillvalue=None))'
real    0m0.067s
user    0m0.042s
sys     0m0.021s

$ time python -c 'dict(map(None, *[iter(range(100000))]*2))'
real    0m0.074s
user    0m0.050s
sys     0m0.022s
$ time python -c 'from itertools import izip_longest; dict(izip_longest(*[iter(range(100000))]*2, fillvalue=None))'
real    0m0.075s
user    0m0.047s
sys     0m0.024s

See nothing to it! =)

nJoy!

Leave a Comment