Chp 7: Dictionaries and Sets

What's a Dictionary?

In the next chapter, we'll turn again to logic. But for now we continue on with data structures, i.e. with ways that we can store away and then retrieve data. We've spent time with strings, files, lists and tuples. We now turn to dictionaries and sets. Dictionaries first.

Like strings, files, lists and tuples, dictionaries are collections. But unlike these other collections types, dictionaries are unordered; we don't have a first, and then a second, etc. We have many but those many have no order. (Of course when we create them, we have to give their elements in some order. But if we were to give those same elements in a different order, it'd be the same dictionary.)

Dictionaries also differ from strings, files, lists and tuples in a second way. Every entry in a dictionary (and we'll call its elements "entries") is really two objects. We call these key and value. Keys are like indices. We use them to extract their associated values, just as with lists we use indices to extract their associated elements. But since dictionary keys can be (most) anything at all, dictionaries are a generalization of lists.

But that's enough abstract talk about dictionaries. Let's have concrete examples.

How to Make a Dictionary

We begin and end a dictionary with a curly brace. We associate a key and a value with the colon. Key-value pairs are separated by commas. Like this:

dct = {'lol': 'laugh out loud', 'brb': 'be right back', 'gg': 'good game'}

The keys here are 'lol', 'brb', 'gg'. The associated values are 'laugh out loud', 'be right back' and 'good game' respectively.

Note that key-value pairs in a dictionary are unordered. Of course if we print a dictionary, those pairs will be printed in a particular orders. But that order is irrelevant.

>>> a_dct = {1: 'a', 2: 'b', 3: 'c'}

>>> another_dct = {2: 'b', 1: 'a', 3: 'c'}

>>> a_dct == another_dct

True

Note also that every key has a unique associated value; that is, we cannot have a key associated with multiple values. If you try to give a key multiple values, only the last key-value pair is retained.

>>> a_dct = {1:'a', 1:'b'}

>>> a_dct

{1:'b'}

Can we have a value associated with multiple keys? Let's check:

>>> a_dct = {1: 'a', 2: 'a'}

>>> a_dct

{1: 'a', 2: 'a'}

Yep! We can!

Extract a Value

Of course dictionaries are of no use if we cannot extract the values that we've stored away in them. How do we extract? By key! The syntax is the same as what we use to extract elements of a list by index.

>>> dct = {'lol': 'laugh out loud', 'brb': 'be right back', 'gg': 'good game'}

>>> dct['lol']

'laugh out loud'

>>> dct['gg']

'good game'

As I said, dictionaries are a generalization of lists. With lists, we have to use integers to extract - a_lst[0], a_lst[1], etc.   But with dictionaries we create our own keys. (A restriction does exist here. We can't make a mutable data object a key. So we can't make a list a key. Or a dictionary. I won't explain why this is. Google if you're curious.)

Dictionaries are Functions

No, I don't mean in the Pythonic sense. I mean in the mathematical. Mathematical functions always give a unique output for any input (at least, that is, if they give any output at all). If we think of key as input and value as output (and that's a fine way to think of them), then dictionaries are mathematical  functions. For every key, there is one and only one value.

Add a Key-Value Pair, Delete a Key-Value Pair

Dictionaries, like lists, are mutable. So let's learn how to mutate then.

To add a key-value pair to a dictionary, we use the a_dct[new_key] = value syntax.

>>> dct = {'lol': 'laugh out loud', 'brb': 'be right back', 'gg': 'good game'}

>>> dct['afk'] = 'away from keyboard' 

>>> dct

{'gg': 'good game', 'brb': 'be right back', 'lol': 'laugh out loud', 'afk': 'away from keyboard'}

To delete a key-value pair from a dictionary, we use the del a_dct[a_key].

>>> dct = {'lol': 'laugh out loud', 'brb': 'be right back', 'gg': 'good game'}

>>> del dct['brb']

>>> dct

{'gg': 'good game', 'lol': 'laugh out loud'}

Addition of a key already present in the dictionary replaces the old key-value pair.

>>> dct = {'lol': 'laugh out loud', 'brb': 'be right back', 'gg': 'good game'}

>>> dct['gg'] = 'ghastly game'

>>> dct

{'gg': 'ghastly game', 'brb': 'be right back', 'lol': 'laugh out loud'}

Subtraction is dangerous. If the specified key isn't present, del will throw an error.

>>> dct = {'lol': 'laugh out loud', 'brb': 'be right back', 'gg': 'good game'}

>>> del dct['gr']

Traceback (most recent call last):

  File "<pyshell#16>", line 1, in <module>

    del dct['gr']

KeyError: 'gr'

len and in

The length of a dictionary is the number of key-value pairs that it contains. The len function returns the length of a dictionary.

>>> dct = {'lol': 'laugh out loud', 'brb': 'be right back', 'gg': 'good game'}

>>> len(dct)

3

We can test whether a dictionary contains a certain key. We do so with the in function.

>>> dct = {'lol': 'laugh out loud', 'brb': 'be right back', 'gg': 'good game'}

>>> 'lol' in dct

True

Note that in does not tell us whether a value is present in a dictionary.

>>> dct = {'lol': 'laugh out loud', 'brb': 'be right back', 'gg': 'good game'}

>>> 'good game' in dct

False

An Example

Let's have a few examples of the uses of dictionaries. First we'll use a dictionary to translate a sequence of digits into a sequence of names of those digits. For instance '501' should become 'five zero one'.  (I'll assume that the input string is indeed a sequence of digits.)

def digit_trans(digit_str):

    digit_names = {'0':'zero', '1':'one', '2':'two', '3':'three', '4':'four', '5': 'five',

                   '6':'six', '7':'seven', '8':'eight', '9':'nine'}

    name_str = ''

    for digit in digit_str:

        name_str = name_str + digit_names[digit] + ' '

    return name_str.rstrip()

The rstrip was necessary at the end since space was always added after a digit name.

A few tests:

>>> digit_trans('501')

'five zero one'

>>> digit_trans('1033217')

'one zero three three two one seven'

A Second Example

Let's say we wish to count the number of each vowel (ignore case) in a string - the numbers of a's, the number of e's, etc. A dictionary is the natural data type for the task; the keys will be the vowels and the values will be the counts.  So let's write a function that takes a string and returns a dictionary and gives the number of occurrences of each vowel in the string. We'll call it vowel_count.

Before we begin the count, we should initialize the dictionary that stores the counts. The keys should be the vowels; initially the values should all be 0.

def vowel_count(a_str):

    a_str = a_str.lower()

    vowels= 'aeiou'

    count_dct = {}  # The empty dictionary

    for vowel in vowels:

        count_dct[vowel] = 0  # Add an entry to count_dct

The initial value of count_dict will be {'a': 0, 'e': 0, 'i': 0, 'o': 0, 'u': 0}.

Next we'll iterate through the string and count the vowels.

    for char in a_str:

        if char in vowels:

            count_dct[char] += 1

    return count_dct

 A few trial runs:

>>> vowel_count('spam and eggs')

{'a': 2, 'e': 1, 'i': 0, 'o': 0, 'u': 0}

>>> vowel_count('Your mother was a hamster and your father smelt of elderberries.')

{'a': 5, 'e': 8, 'i': 1, 'o': 4, 'u': 2}

Dictionary Iteration

We can most certainly iterate through a dictionary. Don't expect to iterate through in some particular order, for remember that dictionaries are unordered.

How do we iterate through a dictionary? By key. Like this:

dct = {1: 'a', 2: 'b', 3: 'c'}

for key in dct:

    print(key, dct[key])

The output:

1 a

2 b

3 c

This gives us a way to determine whether a  dictionary contains a given value. How? Iterate through the keys, and ask if any of their values is that given value.

def has_value(D, v):

    # Return True if dictionary D contains value d,

    # False otherwise.

    for key in D:

        if D[key] == v:

            return True

    return False

Dictionaries and Selection

Think back to  the letter_grade function I had you write in the Selection chapter. Here were the instructions:

Write a function named letter_grade which takes a numerical grade and returns the associated letter grade. Use the usual ranges - an 'A' for any grade from 90 to 100 inclusive, a 'B' for any grade greater than or equal to 80 and less than 90, etc. Return 'NaG' for any value outside the range 0 - 100.

Don't just stack if's. Use the full if-elif-else structure.

The function you wrote no doubt had a longish selection block. An if followed by an elif, followed by an elif, etc. That can be compressed down by use of a dictionary. I suggest we make the keys two-value tuples that represent the upper and lower values in a grade range - (90, 100) for 'A', (80, 90) for 'B', etc. The values for these keys will be the associated letter grade.  We then iterate through the keys, find the one in which the input numerical grades lies, and output the associated letter grade. Here we go:

def letter_grade(n):

    # return letter grade associated with numerical grade n

    grade_dict = {(90,100):'A', (80,90):'B', (70,80):'C', (60,70):'D', (0,60):'F'}

    for grade_range in grade_dict:

        if n >= grade_range[0] and n <= grade_range[1]:

            return grade_dict[grade_range]

    return 'NaG'

As always, when we iterate through a dictionary, the loop variable - grade_range in this case - takes as its values the keys from the dictionary. These are tuples, and so we search until we find the tuple for the grade range in which the input grade lies; and when we find it, we return the value associated with the key, which is the letter grade.

The code is compact. The code is readable. The code is easily updated. Hurray for dictionaries!

Dictionaries are Mutable

I'll end with a word of caution. Like lists, dictionaries are mutable. We can add a key-value pair. We can subtract a key-value pair. We can associate a key already present with a new value.

This is dangerous. Consider this evil function:

def evil_func(a_dict):

    a_dict['spam'] = 'eggs'

Watch the evil in action:

>>> b_dict = {'unladen':'sparrow'}

>>> evil_func(b_dict)

>>> b_dict

{'unladen':'sparrow', 'spam':'eggs'}

Imagine the plight of a programmer who didn't recall (or never knew) that the local variable a_dict was an alias of the dictionary named by b_dict and so that changes made with a_dict would carry over to b_dict. (I don't have to imagine. I've been that programmer.)

Now, you might be tempted to draw the conclusion that you shouldn't alias dictionaries. But that's too extreme. We inevitably alias a dictionary when we send it to a function. But you should be aware of the danger when you alias a dictionary. A risk realized is a risk avoided.

Sets

Sets are unordered collections of unique elements. They're like dictionaries in one way but unlike in another. They're like in that both are unordered. They're unlike in that dictionaries but not sets are composed of key-value pairs.

We create  non-empty sets with curly braces. Here's one:

>>> a_set = {1, 2, 3}

>>> type(a_set)

<class 'set'>

As I said, sets are unordered, and sets cannot contain an element twice.

>>> a_set = {1, 2, 3, 3}

>>> a_set

{1, 2, 3}

>>> b_set = {2, 3, 1}

>>> a_set == b_set

True

We tried to put a 3 in twice. We failed. We asked if a set that contains 1, 2 and 3 is the same as a set that contains 2, 3  and 1. They are.

Note that we cannot create an empty set with curly braces. Watch what happens if we try:

>>> empty_set = {}

>>> type(empty_set)

<class 'dict'>

Failure! We made a dictionary.  Here's how we make an empty set:

>>> empty_set = set()

>>> type(empty_set)

<class 'set'>

Traverse a Set

We iterate through set with for

>>> primes = {11, 2, 7, 5, 3}

>>> for prime in primes:

...  print(prime)

...

2

3

5

7

11

Note that the order of printed primes isn't the order given in primes = {11, 2, 7, 5, 3}.  This should come as no surprise. Sets are unordered collections. You should not expect that for will  traverse a set in some particular order.

add, remove, len, in

add, remove, len and in work in the expected way.

>>> primes = {2, 3, 5, 7, 11}

>>> len(primes)

5

>>> len(set())

0

>>> 2 in primes

True

>>> 13 in primes

False

>>> 2 not in primes

False

>>> 13 not in primes

True

>>> primes.add(13)

>>> primes

{2, 3, 5, 7, 11, 13}

>>> primes.remove(3)

>>> primes

{2, 5, 7, 11, 13}

Both add and remove are mutators that return None.