As I said at the start of the last chapter, we've turned our attention from logic (functions, iteration and selection) to data. The time has now come to dig deep into the list data type. Lists are without a doubt Python's most powerful and versatile way to store multiple items of data.
A list is an ordered sequence. Of what? Of anything! Ints, floats, strings, Booleans, functions. (Yes, I said functions. You'll see examples in a later chapter.) Other lists. That can themselves contain lists. Lists truly are the general-purpose sequence type.
I don't mean that one list can contain, say, Booleans and another contain ints, though that's surely true. I mean more! One list can contain Booleans and ints. And floats. And lists. And whatever you want. In many other languages, all the elements of an array must be of the same type. Python lists have no such restriction.
I've had students ask me if Python has a way to automate variable creation. They needed to store away multiple items of data, and they thought they needed a sequence of variables, like perhaps sqrt1, sqrt2, sqrt3, etc. I replied that what they really needed was a list. Put all those values in a list (perhaps called sqrt_lst) and then extract elements as necessary.
Also know here at the start that lists are mutable, both in their contents and in their length. We can change what's in a list, and we can lengthen and shorten a list. Expect more on this later.
How do we create a list? One way is square brackets. Put objects between them. Separate those objects by commas.
>>> a_list = [1, 1.0, '1', True, [0, False]]
This list has five elements: the int 1, the float 1.0, the string '1', the Boolean True and the list [0, False].
A list can contain one or more objects (which we often call its elements). A list can also contain no elements. A list that has no elements is the empty list. How do we make it? Don't place anything inside the square brackets:
>>> empty_list = []
Often we create an empty list when later we wish to perform a test (Is this int a perfect square? Does this string contain only digits?) and then add the objects that pass to the list. We can't add to a list that doesn't exist!
Every list has a length, and we get that length by means of the built-in len function.
>>> L = [2, 3, 5, 7, 11]
>>> len(L)
5
>>> M = []
>>> len(M)
0
Note that the length of the empty list is 0.
Python also provides us with a handy Boolean-valued operator that answers the question, "Does this list contain this object?" The keyword here is in. The syntax matches English: we write the object, then in, then the list. To ask the opposite question, we write not in.
>>> prf_sqrs = [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
>>> 16 in prf_sqrs
True
>>> 91 in prf_sqrs
False
>>> 16 not in prf_sqrs
False
>>> 91 not in prf_sqrs
True
Of course lists are of little use if we can't extract elements from them. We extract by index of element, just as with strings we extract by index of character; and as with strings, indices begin at 0. To extract an element of a list by its index, we place that index in square brackets after an expression whose value is a list. Like this:
>>> a_list = [1, 1.0, '1', True, [0, False]]
>>> a_list[0]
1
>>> a_list[1]
1.0
>>> a_list[2]
'1'
>>> a_list[3]
True
>>> a_list[4]
[0, False]
>>> a_list[5]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
Be careful! If no element exists at that index, you'll get the dreaded "index out of range" error.
In our study of strings, we noted a perhaps unintuitive relation between the length of a string and the index of its final element. The relation was this: the index of the final element is the length minus one; or equivalently, the length of a string is the index of its final element plus one. (Why is this? Why isn't the length equal to the index of the final element? The answer is that in Python, as in most programming languages, the index of the first element in a sequence is 0.)
The same is true of lists. The index of the final element is the length of the list minus one.
>>> L = [2, 3, 5, 7, 11]
>>> len(L)
5
>>> L[len(L)-1]
11
In the list named "L" immediately above, the indices of its elements are 0, 1, 2, 3 and 4. The element at index 0 is 2, the element at index 1 is 3, etc.
Note that range(len(L)) thus gives us precisely the indices of L's elements. In this case range(len(L)) is equivalent to range(5), which produces the integers 0, 1, 2, 3 and 4. (In Python 3 a range is not itself a list, though list(range(5)) is the list [0, 1, 2, 3, 4].)
We'll use such ranges later when we iterate through the elements of a list.
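A quick check of that claim, as a minimal sketch. The list(...) call is there only to make the range visible as a list:

```python
L = [2, 3, 5, 7, 11]

# range(len(L)) produces exactly the valid indices of L.
indices = list(range(len(L)))
print(indices)         # [0, 1, 2, 3, 4]

# The final valid index is len(L) - 1.
print(L[len(L) - 1])   # 11
```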
We can also extract with negative indices. The last element of a list has index -1, the element immediately before has index -2, etc.
>>> a_list = [1, 1.0, '1', True, [0, False]]
>>> a_list[-1]
[0, False]
>>> a_list[-2]
True
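The two numbering schemes line up: index -k refers to the same position as index len(a_list) - k. Here's a minimal sketch that prints the pairs side by side:

```python
a_list = [1, 1.0, '1', True, [0, False]]
n = len(a_list)

# Each line shows a negative index, the matching nonnegative index,
# and the element found at that position.
for k in range(1, n + 1):
    print(-k, n - k, a_list[-k])

print(a_list[-n])   # 1, the first element
```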
Let's say we had a list of lists. Like this one:
>>> L = [[1], [2, 3], [4, 5, 6]]
This list has three elements, and each of those elements is itself a list. The first has one element, the second two and the third three.
How could we dig out the integers? Let's say in particular that we wanted to extract the 5. Consider:
>>> L[2]
[4, 5, 6]
So, we have the list that contains the 5; and the 5 is the element at index 1 in that list. So we should be able to get the 5 with another index operator after the first.
>>> L[2][1]
5
We took the element at index 1 in the element at index 2 in L.
This shouldn't come as any surprise. Our list L was two-dimensional: L has a length, and each of its elements has a length. Multi-dimensional lists require stacked index operators to dig out the elements in the innermost lists. (I'll have much more to say about 2D lists in the Matrices project.)
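To reach every int in a two-dimensional list, we can use one loop per dimension. Here's a minimal sketch that sums all the ints in the L above (total is just an illustrative name):

```python
L = [[1], [2, 3], [4, 5, 6]]

total = 0
# The outer loop walks the sublists; the inner loop walks the ints in each one.
for i in range(len(L)):
    for j in range(len(L[i])):
        total += L[i][j]   # two stacked index operators

print(total)   # 21
```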
When we extract by index, we get the object at that index. When we slice a list, we get a sublist. Which is a list.
How do we slice? Just as we did with strings. After an expression whose value is a list, we place square brackets; and in those brackets we place one or more integers separated by colons. The syntax here is list_name[start:stop:step]. The sublist returned begins at the element with index start. The index of each element thereafter is step above the one before. For a positive step, the sublist ends with the last such element whose index is less than stop.
>>> prf_sqrs = [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
>>> prf_sqrs[2:9:3]
[9, 36, 81]
Start was 2, stop was 9 and step was 3. So we began at index 2, went to index 5 and stopped at index 8.
If we provide only two integers (separated of course by a colon), these are start and stop. Step defaults to 1. So below we have a start of 2, a stop of 9 and a default step of 1.
>>> prf_sqrs[2:9]
[9, 16, 25, 36, 49, 64, 81]
If we provide only the integer after the colon, that's stop, and start and step default to 0 and 1 respectively. So below we have a slice that begins at index 0 and ends at the element with index 8.
>>> prf_sqrs[:9]
[1, 4, 9, 16, 25, 36, 49, 64, 81]
Finally, if we provide only the integer before the colon, that's start. Step of course defaults to 1. But to what does stop default? The index of the last element plus one. So we slice right through to the end of the list.
>>> prf_sqrs[3:]
[16, 25, 36, 49, 64, 81, 100]
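To confirm these defaults, you can compare each short slice with its fully spelled-out version. Here's a minimal sketch. The last line adds one small fact about slices: unlike an index, a slice's stop may run past the end of the list, and Python simply clips it to the length.

```python
prf_sqrs = [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
n = len(prf_sqrs)

# With a positive step, omitted slice parts take these defaults:
# start -> 0, stop -> len(prf_sqrs), step -> 1.
print(prf_sqrs[2:9] == prf_sqrs[2:9:1])   # True
print(prf_sqrs[:9] == prf_sqrs[0:9:1])    # True
print(prf_sqrs[3:] == prf_sqrs[3:n:1])    # True

# A stop beyond the end is allowed in a slice; it is clipped to the length.
print(prf_sqrs[3:100])                    # [16, 25, 36, 49, 64, 81, 100]
```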
Let's consider a few useful but perhaps mysterious slices. The first is [:]. Let's puzzle out what's meant by prf_sqrs[:]. Well, since we have a colon, it's a slice. What's the start value? Since we have nothing before the colon, start defaults to 0. What's stop? Again we have no value given. So it defaults. To what? The index of the final element plus one. So prf_sqrs[:] is equivalent to prf_sqrs[0:len(prf_sqrs)]. That gives us the very same elements again!
>>> prf_sqrs = [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
>>> prf_sqrs[:]
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
>>> prf_sqrs[0:len(prf_sqrs)]
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
(We've answered the question "What does [:] do?" But we don't know why we'd want to do that. It seems pointless. We already have the list. Why make it again? That's a very good question. As we'll find in a later section, [:] does have a purpose. A most important purpose. You'll see.)
The second mysterious slice is [::-1] - colon, colon, negative one. Step always comes after the second colon, so we have a step of -1. What are start and stop? Well, we traverse the whole list. But since step is -1, we traverse from right to left. That's the whole list but backwards! So [::-1] reverses a list.
>>> prf_sqrs[::-1]
[100, 81, 64, 49, 36, 25, 16, 9, 4, 1]
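Here's a small use of [::-1], as a sketch with a hypothetical helper named is_palindrome. A list reads the same in both directions exactly when it equals its own reversal.

```python
def is_palindrome(lst):
    # Hypothetical helper: compare the list with its reversed copy.
    # The slice builds a new list, so lst itself is not changed.
    return lst == lst[::-1]

print(is_palindrome([1, 2, 3, 2, 1]))  # True
print(is_palindrome([1, 2, 3]))        # False
```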
The reason we'd use [:] is far from clear. The reason we'd use [::-1] is obvious.
Lists are sequences. Sequences are iterable. Thus lists are iterable. How do we iterate through them? The for ... in ... construction of course. Let's first iterate by element.
for elem in a_lst:
    print(elem)
Here we iterate once for each element of a_lst; and each element is printed to the screen on its own line.
We can also iterate by index if we choose.
for i in range(len(a_lst)):
    print(a_lst[i])
Here we see the oh-so-popular choice of loop variable i. You do understand, don't you, why it's chosen? "i" is short for "index". As we saw above, range(len(a_lst)) is precisely the indices of the elements of a_lst. So once again each element of a_lst is printed to the screen.
Of the two code snippets above, the second is clearly more complex. Why iterate by index then? Answer: it gives us greater control of the iteration. If we iterate by element, the loop variable dutifully takes on each element from the list. None are skipped. But if we iterate by index, we can skip if we want. Like this:
prf_sqrs = [1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144]
for i in range(1, len(prf_sqrs), 2):
    print(prf_sqrs[i])
Here we print only those perfect squares in the list that have an odd index. Since indices begin at 0, these are the squares of the even numbers.
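Iterating by index with a step picks out the same elements as a slice with the same start and step. Here's a minimal sketch (evens_squared is an illustrative name):

```python
prf_sqrs = [1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144]

# Index iteration with start 1 and step 2 visits indices 1, 3, 5, ...
for i in range(1, len(prf_sqrs), 2):
    print(prf_sqrs[i])        # prints 4, 16, 36, 64, 100, 144, one per line

# A slice with the same start and step selects the same elements.
evens_squared = prf_sqrs[1::2]
print(evens_squared)          # [4, 16, 36, 64, 100, 144]
```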
We've learned enough about lists that we can now put them to work for us. Below are some simple tasks solved by means of lists. Please read them carefully.
Our first function takes a list of numbers and returns the greatest. (The function could of course be easily modified to find the least.) Here's a puzzle: Why did I choose L[0] (that is, the value at index 0 in L) as the initial value of max_so_far? Why not pick some value, like, say, 0?
def find_max(L):
    # Find the greatest element of the nonempty list L.
    # (The name max_so_far avoids hiding Python's built-in max function.)
    max_so_far = L[0]
    for elem in L:
        if elem > max_so_far:
            max_so_far = elem
    return max_so_far
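If you'd like to check your answer to the puzzle, here's a minimal sketch. It compares find_max with a hypothetical variant, find_max_from_zero, that starts at 0, and tries both on a list of negative numbers. Python's built-in max is included for comparison.

```python
def find_max(L):
    # Start from the first element, as in the text.
    max_so_far = L[0]
    for elem in L:
        if elem > max_so_far:
            max_so_far = elem
    return max_so_far

def find_max_from_zero(L):
    # Hypothetical variant that starts from 0 instead of L[0].
    max_so_far = 0
    for elem in L:
        if elem > max_so_far:
            max_so_far = elem
    return max_so_far

negatives = [-7, -3, -12]
print(find_max(negatives))            # -3
print(find_max_from_zero(negatives))  # 0, which isn't even in the list
print(max(negatives))                 # -3, the built-in agrees with find_max
```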
Our second function takes a list of numbers and computes the average. Here the built-in len function will be quite useful. Without it, we'd have to count the number of elements in L ourselves.
def find_avg(L):
    # Find the average of the numbers in the nonempty list L.
    # (The name total avoids hiding Python's built-in sum function.)
    total = 0
    for elem in L:
        total += elem
    return total / len(L)
Our third function takes a list of lists of numbers, like perhaps [[1, 3, 12], [7, -2, 15, -21], [-5, 13]], and returns the index of the sublist with the greatest average. We'll make use of the find_avg function written immediately above. Note that in this case, since we wish to return an index, we should iterate through the input list by index. That means we use a range of indices: range(1, len(L)) below, since the sublist at index 0 supplies the starting value. Note too that we need two variables: one to keep track of the greatest-so-far average, and one to keep track of the index of the sublist with the greatest-so-far average.
def find_max_avg(L):
    # L is a list of lists of numbers.
    # Return the index of the sublist with the greatest average.
    # The initial value of max_avg is the average of the first sublist of L.
    max_avg = find_avg(L[0])
    index_max = 0
    for i in range(1, len(L)):
        curr_avg = find_avg(L[i])
        if curr_avg > max_avg:
            index_max = i
            max_avg = curr_avg
    return index_max
The L[i] in curr_avg = find_avg(L[i]) is itself a list, since L is a list of lists. Indeed it is the sublist of L at index i.
(Here's a question for you: What if the list sent to find_max_avg had two sublists with the same average, and that average was greater than any other? The function would return the index of the first sublist with that greatest average. Why? Consider the test if curr_avg > max_avg. It's true only when we have a new average greater than any we've seen before. How might we make a simple change to find_max_avg so that it would return the index of the last sublist with the greatest average?)
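Here's a test run on the list of lists mentioned earlier, plus a tie, as a minimal sketch. The two functions are repeated so that the block runs on its own.

```python
def find_avg(L):
    total = 0
    for elem in L:
        total += elem
    return total / len(L)

def find_max_avg(L):
    max_avg = find_avg(L[0])
    index_max = 0
    for i in range(1, len(L)):
        curr_avg = find_avg(L[i])
        if curr_avg > max_avg:
            index_max = i
            max_avg = curr_avg
    return index_max

data = [[1, 3, 12], [7, -2, 15, -21], [-5, 13]]
# Averages: 16/3 (about 5.33), -1/4 = -0.25, 8/2 = 4.0
print(find_max_avg(data))   # 0

ties = [[2, 4], [1, 5], [0]]
# The first two sublists both average 3.0; the strict > keeps the first.
print(find_max_avg(ties))   # 0
```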
We often traverse lists so that we might process their elements in some way. Let's have more examples of that. Let's traverse a list of positive integers and return the number of primes within it.
As always, we should think carefully about the task and break it up into its natural sub-tasks. It seems here that we have two sub-tasks: determine whether a given integer is prime, and count the number of elements in a list that have a certain property. So let's have two functions. The first will take an integer and return True if that integer is prime, False otherwise. The second will take a list and count the elements for which the first function returns True.
import math

def is_prime(n):
    # Return True if n is prime, False otherwise.
    if n <= 1:
        return False
    # Any factorization of n has a factor <= sqrt(n), so test divisors
    # 2 through the integer square root of n inclusive (math.isqrt: Python 3.8+).
    for f in range(2, math.isqrt(n) + 1):
        if n % f == 0:
            return False
    return True

def count_instances(L):
    # Return the number of elements of list L for which is_prime returns True.
    count = 0
    for elem in L:
        if is_prime(elem):
            count += 1
    return count
This is a pretty little piece of abstraction. Note that we could easily modify count_instances so that it counted the members of L that had some property other than being prime. All we'd have to do is write a function that tested whether an object had that property, and then modify count_instances so that it called that new function. (In a later chapter, we'll learn how to actually pass the function that does the test to count_instances as one of its arguments. That'll mean that we won't have to modify the body of count_instances!)
Let's call count_instances a few times:
>>> ints = [1, 6, 7, 3, 4, 11]
>>> count_instances(ints)
3
>>> count_instances([])
0
[1, 6, 7, 3, 4, 11] contains 3 primes. The empty list contains none.
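Why does is_prime use math.isqrt(n) + 1 as the stop rather than math.ceil(math.sqrt(n))? With the ceil bound, range stops one short whenever n is a perfect square, so squares of primes slip through as "primes". Here's a minimal sketch comparing the two. It assumes Python 3.8 or later for math.isqrt, and is_prime_ceil is a hypothetical name for the ceil-bounded variant.

```python
import math

def is_prime(n):
    # Trial division up to and including the integer square root of n.
    if n <= 1:
        return False
    for f in range(2, math.isqrt(n) + 1):
        if n % f == 0:
            return False
    return True

def is_prime_ceil(n):
    # Hypothetical variant with the stop math.ceil(math.sqrt(n)), for comparison.
    if n <= 1:
        return False
    for f in range(2, math.ceil(math.sqrt(n))):
        if n % f == 0:
            return False
    return True

# Each line shows n, then is_prime's answer (False), then is_prime_ceil's (True).
for n in [4, 9, 25, 49]:
    print(n, is_prime(n), is_prime_ceil(n))
```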
We can grow lists; that is, we can add elements to them. Python gives us two ways to do this. The first way is with the + operator. A second is with the append method. Let's look at + first.
>>> a_lst = [1, 2, 3]
>>> a_lst + [4, 5]
[1, 2, 3, 4, 5]
Note that when we extend a list with +, the right operand ([4, 5] above) must be a list. Watch what happens if we forget that (as I have done many times):
>>> a_lst + 4
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only concatenate list (not "int") to list
+ concatenates list to list!
Note too that the line of code a_lst + [4, 5] didn't change the value of a_lst. Watch:
>>> a_lst = [1, 2, 3]
>>> a_lst + [4, 5]
[1, 2, 3, 4, 5]
>>> a_lst
[1, 2, 3]
So what happened is that the concatenation a_lst + [4, 5] created a new list. It didn't alter the [1, 2, 3] list. We say in this case that no list was mutated.
What if we wanted to update a_lst so that it did contain all of 1 through 5? We'd do this:
>>> a_lst = a_lst + [4, 5]
>>> a_lst
[1, 2, 3, 4, 5]
The syntax here is the same as when we increment a number variable. If n has the value 1, then after n = n + 1, its value is 2.
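Reassignment with + and mutation differ in a way that matters once a list has more than one name. Here's a minimal sketch (other_name is an illustrative second name for the same list):

```python
a_lst = [1, 2, 3]
other_name = a_lst           # a second name for the same list

a_lst = a_lst + [4, 5]       # + builds a new list, and a_lst now names it
print(a_lst)                 # [1, 2, 3, 4, 5]
print(other_name)            # [1, 2, 3]  (the original list is unchanged)
```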
Now let's consider the append method. It's our second way to grow a list. It, unlike list concatenation, does indeed mutate.
>>> b_lst = [1, 2, 3]
>>> b_lst.append(4)
>>> b_lst
[1, 2, 3, 4]
Note that b_lst is longer than it was to begin. It has been mutated! (More on this in a moment.)
Note as well the syntax here - list name dot append, and then the object to append in parentheses. As I've noted before, a function called in this way is called a "method"; and the object whose name appears before the dot is the first argument of the method. So the append method takes two arguments - a list and an object - and adds the object to the end of the list.
As I said, +, when used to grow a list, doesn't mutate any list, but append does mutate. Let's make sure we understand. I'll begin with a few more examples.
>>> a_lst = [1, 2, 3]
>>> b_lst = [4]
>>> a_lst + b_lst
[1, 2, 3, 4]
>>> a_lst
[1, 2, 3]
>>> b_lst
[4]
>>> c_lst = [1, 2, 3]
>>> c_lst.append(4)
>>> c_lst
[1, 2, 3, 4]
Remember that the console prints any return value from a function call. This means that + returns a value but append does not! Look at the line immediately after a_lst + b_lst. There's no prompt. Instead it's the return value of the call to +. Look now at the line after c_lst.append(4). There is a prompt, so append did not return a value. (Well, actually this isn't quite the truth. Functions that don't specify a return value, like append, do in fact have a default return value. We'll come back to this in a moment.)
Note another difference. The + operator returned a new list but did not change either of its operands; a_lst and b_lst were still [1, 2, 3] and [4] respectively after the concatenation. The append method did not return a value but it did change the list on which it was called; after the call to append, c_lst had the value [1, 2, 3, 4].
We say that a function is fruitful when it returns a value, fruitless when it does not.
We say that a function is a mutator when it changes one of the objects sent to it, a non-mutator when it does not.
+ is thus a fruitful non-mutator, and append is a fruitless mutator.
(Could there be a function that is a fruitful mutator? Absolutely. We'll encounter one here in a bit.)
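We can check fruitfulness directly by storing the return value of each call. Here's a minimal sketch. It shows that a fruitless function hands back None:

```python
c_lst = [1, 2, 3]
result = c_lst.append(4)     # append mutates c_lst ...
print(result)                # None  (... and returns nothing useful)
print(c_lst)                 # [1, 2, 3, 4]

d_lst = [1, 2, 3]
new_lst = d_lst + [4]        # + returns a new list ...
print(new_lst)               # [1, 2, 3, 4]
print(d_lst)                 # [1, 2, 3]  (... and leaves d_lst alone)
```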
So lists are mutable. A list can remain the list it was (and all names of that list still refer to that list) even though we change its contents.
What are the ways we can mutate a list? Here are a few common ways. (The list is not complete. Consult the Python documentation for the rest.)
As we've seen, we can extend a list by means of the append method. Syntax: a_list.append(an_object).
>>> a_list = [1, 2, 3]
>>> a_list.append(4)
>>> a_list
[1, 2, 3, 4]
Syntax: a_list.insert(index, object).
The insert method takes two arguments - an index and an object - and places the object at that index. Example:
>>> b_list = ['a', 'b', 'c', 'e', 'f']
Oopsie! Forgot the 'd'. Let's insert it.
>>> b_list.insert(3, 'd')
>>> b_list
['a', 'b', 'c', 'd', 'e', 'f']
Notice that when we inserted the 'd' at index 3, the elements from index 3 on got pushed one to the right.
We can insert at the end of a list if we wish. Indeed Python is quite kind. If the index we provide exceeds the index of the final element, we'll add to the end of the list.
>>> c_list = ['w', 'x', 'y']
>>> c_list.insert(12, 'z')
>>> c_list
['w', 'x', 'y', 'z']
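Here's a minimal sketch of insert at the front, at the end, and past the end (letters is an illustrative name):

```python
letters = ['b', 'c']
letters.insert(0, 'a')               # index 0: the new first element
letters.insert(len(letters), 'd')    # index len(letters): same effect as append
letters.insert(100, 'e')             # index past the end: also goes to the end
print(letters)                       # ['a', 'b', 'c', 'd', 'e']
```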
Syntax: a_list.remove(element)
The remove method takes a single object as its argument and removes from the list the first element equal to that object. If more than one such element is present, only the first is removed. If none is present, remove raises a ValueError with the message list.remove(x): x not in list.
Examples:
>>> d_list = [1, 3, 1, 5, 6]
>>> d_list.remove(1)
>>> d_list
[3, 1, 5, 6]
>>> d_list.remove(1)
>>> d_list
[3, 5, 6]
>>> d_list.remove(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: list.remove(x): x not in list
Syntax: del a_list[index]
The del statement deletes the element at a given index from a given list. (del is a statement, not a function, which is why no parentheses follow it.)
>>> e_list = ['s', 'p', 'a', 'm']
>>> del e_list[1]
>>> e_list
['s', 'a', 'm']
Syntax: a_list.pop() or a_list.pop(index). Let's have a few examples to begin. We'll discuss them after.
>>> f_list = [1, 2, 3, 4, 5, 6]
>>> f_list.pop()
6
>>> f_list
[1, 2, 3, 4, 5]
>>> f_list.pop(2)
3
>>> f_list
[1, 2, 4, 5]
f_list.pop() returned a value (the 6) and it also mutated the list (it was [1, 2, 3, 4, 5] after). pop is thus a fruitful mutator. Think of it as a hybrid. It returns a value as + did, and it mutates a list as append did. (Note that pop returns the removed element, not a list.)
If we place an integer i inside the parentheses after pop, Python removes the element with index i from the list and returns that element. With empty parentheses, pop removes and returns the last element.
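To keep remove, del and pop straight, here's a minimal side-by-side sketch on one list (h_list is an illustrative name). remove works by value, del and pop work by index, and only pop hands back what it removed.

```python
h_list = [10, 20, 30, 20]

h_list.remove(20)      # by value: removes the first 20 only, returns None
print(h_list)          # [10, 30, 20]

del h_list[0]          # by index: a statement, so nothing is returned
print(h_list)          # [30, 20]

last = h_list.pop()    # by index (default: the last); returns the removed element
print(last, h_list)    # 20 [30]
```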
We can also mutate a list by placing square brackets after its name on the left side of an assignment, with either an index or a slice inside them. I expect you'll be able to make sense of the examples below without my help.
>>> g_list = [1, 2, 3, 4, 5, 6]
>>> g_list[0] = 7
>>> g_list
[7, 2, 3, 4, 5, 6]
>>> g_list[2:5] = [8, 9, 10]
>>> g_list
[7, 2, 8, 9, 10, 6]
>>> g_list[6] = 11
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list assignment index out of range
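Notice that when we use square brackets in this way to mutate a list, we write over elements that were already present. Index assignment can't create a new position: g_list has no index 6, so assigning to g_list[6] fails.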
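Slice assignment is more flexible than the example above suggests. The replacement list needn't have the same length as the slice it replaces, so a slice assignment can grow or shrink a list. Here's a minimal sketch:

```python
g_list = [7, 2, 8, 9, 10, 6]

# The whole slice is replaced, so the lengths need not match.
g_list[1:4] = ['x']          # three elements out, one in
print(g_list)                # [7, 'x', 10, 6]

g_list[1:1] = [0, 0]         # an empty slice: pure insertion at index 1
print(g_list)                # [7, 0, 0, 'x', 10, 6]
```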
Here's a typical example of the use of lists to build a sequence of all objects with a given property. Let's compile the list of all numbers between 1 and some given positive integer n that are divisible by all integers in a given list L. Of course we could wrap this up in a single function. But remember that we wish to make our functions as general and as simple as possible. General functions are reusable. Simple functions are easier to get right.
I suggest we split the task into two sub-tasks, each handled by its own function. The first will take a positive integer n and a list of positive integers L, and will return True if n is divisible by every integer in L, False otherwise. The second function will take a positive integer n and a list of positive integers L, and will return the list of all positive integers less than or equal to n for which the first function returns True.
Think of the first function as the tester function. Think of the second as the compiler function. Compiler takes an int and asks tester if that int passes the test. If so, compiler adds it to the list.
def is_divisible(n, L):
    # Take a positive integer n and list of positive integers L,
    # return True if n is divisible by each element of L, False otherwise.
    for divisor in L:
        if n % divisor != 0:
            return False
    return True

def compile_list(n, L):
    # Take a positive integer n and a list of positive integers L,
    # return the list of all ints from 1 to n for which is_divisible returns True.
    passed = []
    for curr_int in range(1, n + 1):
        if is_divisible(curr_int, L):
            passed.append(curr_int)
    return passed
A few test runs:
>>> compile_list(100, [2, 3, 5])
[30, 60, 90]
>>> compile_list(300, [2, 3, 5])
[30, 60, 90, 120, 150, 180, 210, 240, 270, 300]
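Here are two more test runs, as a minimal sketch that repeats the text's functions unchanged so that it runs on its own. The second run shows an edge case worth knowing: with an empty list of divisors, the loop in is_divisible never runs, so every int passes.

```python
def is_divisible(n, L):
    for divisor in L:
        if n % divisor != 0:
            return False
    return True

def compile_list(n, L):
    passed = []
    for curr_int in range(1, n + 1):
        if is_divisible(curr_int, L):
            passed.append(curr_int)
    return passed

print(compile_list(50, [4, 6]))   # [12, 24, 36, 48]
print(compile_list(5, []))        # [1, 2, 3, 4, 5]
```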
Mutation is powerful! In some other languages, we cannot modify a list once it's been created; or if we can modify it, we cannot grow or shrink it but can only change the elements found at one or more positions. Python isn't like that! We can modify both list length and list occupants. Hurray for Python!
But with great power comes great responsibility. There are numerous pitfalls hereabouts, and if one doesn't understand what they are, one can write buggy code and have literally no idea why it's buggy. Below I discuss a number of the pitfalls.
So, some list methods bear no fruit but mutate a list instead. append, insert and remove are examples. Others both mutate a list and return a value. pop is our example of this.
But you must understand that those functions which do not specify a return value still return (the rather mysterious object) None. Indeed any function, even those you write, will return None if no return value is specified.
This can lead to mysterious bugs. Watch this:
>>> letters = ['a', 'b', 'c']
>>> letters = letters.append('d')
>>> letters
>>>
I copied this out of my console after I typed letters and then hit enter. Python didn't print letters! What's up? The problem is on the previous line: letters = letters.append('d'). append is a fruitless mutator, and as such its return value is None. Thus None was assigned to letters; and when an expression evaluates to None, Python's REPL prints nothing to the screen. (If you want to see the None, you'd need to type print(letters). Try it. You'll see the None.)
What's the fix? The second line should be simply letters.append('d'). That mutates the list so that it now has a 'd' at the end.
We alias an object when we give it a second name. Below, the int 12 has been aliased.
>>> twelve = 12
>>> also_twelve = twelve
>>> twelve
12
>>> also_twelve
12
Why would we want to do that? Typically we wouldn't. Not in that way. But we do inevitably alias when we call a function. Here's a simple function:
def is_odd(n):
    return n % 2 == 1
If we call that function, n becomes a name of the object passed to the function; and if that object had a name before it was passed, it is now aliased.
>>> twelve = 12
>>> is_odd(twelve)
False
Here the object 12 has the name twelve and (once the function is_odd has been called) also the name n.
What's the danger in this? None when an immutable object - an integer perhaps - is aliased. But when a mutable object is aliased ... that indeed can be dangerous! Here's a little function that will seem quite innocuous. It takes a list of integers and counts how many are thodd. (A thodd is an integer that leaves a remainder of 1 when divided by 3. 4 is thodd, for example.)
def num_thodds(L):
    count = 0
    while len(L) > 0:
        last = L.pop()
        if last % 3 == 1:
            count += 1
    return count
The idea seems sound. pop off the final element of L (pop, as we know, removes the last element of the list and returns it) and add 1 to count if that final element is thodd.
>>> some_ints = [1, 2, 3, 4, 5, 6, 7]
>>> num_thodds(some_ints)
3
The 3 is right. 1, 4 and 7 are thodds.
The danger here becomes clear if we add a second function that also runs through the list. Let's add a function that counts thodders. (A thodder is a positive integer that leaves a remainder of 2 when divided by 3.)
def num_thodders(L):
    count = 0
    while len(L) > 0:
        last = L.pop()
        if last % 3 == 2:
            count += 1
    return count
(Of course this isn't pretty code. Instead of writing two such similar functions, we should abstract out their similarity into a function that counts the number of instances of some arbitrary property. But let's not worry about that here.)
num_thodders does seem to work.
>>> more_ints = [10, 11, 12, 13, 14, 15]
>>> num_thodders(more_ints)
2
Yep, 2 is right. 11 and 14 are thodders.
Now let's fall into a dark pit of despair. (That's hyperbolic no doubt, but it does capture how I've felt more than once in the past.)
>>> bigger_ints = [100, 101, 102, 103, 104, 105]
>>> num_thodds(bigger_ints)
2
>>> num_thodders(bigger_ints)
0
0! That's not right! 101 and 104 are thodders, so the count should be 2. What happened to num_thodders? It worked before, but now it doesn't. How in the world did it break?
The answer is that it didn't break. It did return the number of thodders in the list it was sent. But the list it was sent was empty!
>>> bigger_ints = [100, 101, 102, 103, 104, 105]
>>> num_thodds(bigger_ints)
2
>>> bigger_ints
[]
>>> num_thodders(bigger_ints)
0
How did that happen? How did a list that was populated become empty? num_thodds is the culprit here. We sent it the list named bigger_ints. num_thodds then gave that list the alias L; and when it mutated L with pop, that mutation was then a mutation in the list named bigger_ints. L and bigger_ints are names of the same list, and when that list is changed, they're now both names of a changed list.
The danger here is not simply that an object was aliased. Instead it was that a mutable object was aliased. When we alias a mutable object and then use one of its names to mutate the object, all names of the object will then name a mutated object.
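Here's the same effect stripped of popping and thodds, as a minimal sketch of a list that has two names (add_zero is a hypothetical helper). If we mutate the list through either name, or through a function's parameter, every name sees the change.

```python
nums = [1, 2, 3]
also_nums = nums            # alias: two names, one list

also_nums.append(4)         # mutate through one name ...
print(nums)                 # [1, 2, 3, 4]  (... and the other name sees it)

def add_zero(L):
    # Hypothetical helper: L becomes one more name for the caller's list.
    L.append(0)

add_zero(nums)
print(nums)                 # [1, 2, 3, 4, 0]
print(also_nums)            # [1, 2, 3, 4, 0]
```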
I'm tempted to tell you to never alias a mutable data type. But that's extreme. Python is built to alias mutable data types. This always happens when we call a function and send a list.
If, however, you wish to avoid the automatic creation of an alias when you send a list to a function, you can send a copy of the list instead. Remember the easy way to copy a list: list_name[:]. Watch this (I assume that num_thodds and num_thodders are defined):
>>> bigger_ints = [100, 101, 102, 103, 104, 105]
>>> num_thodds(bigger_ints[:])
2
>>> num_thodders(bigger_ints[:])
2
>>> bigger_ints
[100, 101, 102, 103, 104, 105]
We didn't send bigger_ints to num_thodds or num_thodders. Instead we sent a second, distinct list. A copy list. Thus when we popped, we didn't modify the original list.
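One caution about [:] goes a step beyond the examples above: the copy it makes is shallow. The outer list is new, but any lists inside it are shared with the original. Here's a minimal sketch (grid and grid_copy are illustrative names):

```python
grid = [[1, 2], [3, 4]]
grid_copy = grid[:]          # a new outer list ...

grid_copy.append([5, 6])     # ... so growing the copy leaves grid alone
print(grid)                  # [[1, 2], [3, 4]]

grid_copy[0].append(99)      # but the inner lists are shared, not copied
print(grid)                  # [[1, 2, 99], [3, 4]]
```

If you need the inner lists copied too, the standard library's copy.deepcopy is one option.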
Let discuss one final pitfall of mutation. It involves iteration, specifically iteration in which we mutate the list through which we iterate.
Let's write a function that takes a list of words (which we will assume all consist only of letters) and mutates that list so that every word that does not begin with a capital letter is popped out. So, for instance, if we send it the list ['Franklin', 'curtis', 'Mason'], the 'curtis' should be removed.
Of course we'll need to iterate through the list. Let's try to do that with a for loop, as below:
def remove_lower(L):
    # Take a list of strings of letters and remove
    # those whose first letter is not capitalized.
    for i in range(len(L)):
        if not L[i][0].isupper():
            L.pop(i)
(Indeed, Python strings do have an isupper() method. Called on a single letter, it returns True if that letter is uppercase, False if not.)
This does seem like the right idea: iterate through the list by index, and if a certain word does not begin with a capital letter, pop it out. However, the function will crash if it pops out a word that isn't at the end of L. Look at this:
>>> L = ['Franklin', 'curtis', 'Mason']
>>> remove_lower(L)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in remove_lower
IndexError: list index out of range
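What happened? It might help if we slipped a print statement into the body of the loop, just before the if, and had it print both the value of i and the length of L. (The failed call had already popped 'curtis' from L, so we rebuild L first.) Here's the output:
>>> L = ['Franklin', 'curtis', 'Mason']
>>> remove_lower(L)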
i = 0 , len(L) = 3
i = 1 , len(L) = 3
i = 2 , len(L) = 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 6, in remove_lower
IndexError: list index out of range
Ah! Now we see! The length of L was altered, but Python had already locked in a loop that would take us up to a value of 2 for i. The moral seems clear: don't set up a for loop in which the number of iterations depends on the length of a list and then change the length of that list as you loop.
What's the solution? Loop with while instead of for. In a for loop, you're locked into a certain number of iterations when you begin. But with a while loop, you can repeatedly perform a test to determine whether you should loop again. Here's the new version:
def remove_lower(L):
    # Take a list of strings of letters and remove
    # those whose first letter is not capitalized.
    i = 0
    while i < len(L):
        if not L[i][0].isupper():
            L.pop(i)
        else:
            i += 1
This one works like a charm!
>>> L = ['Franklin', 'curtis', 'Mason']
>>> remove_lower(L)
>>> L
['Franklin', 'Mason']
Here the length of L was found each time before the body of the loop was executed, and thus if L's length changed, we could respond to that change. Of course we had to handle the initialization and update of i ourselves. But that's a small price to pay for non-buggy code.
Do make sure you understand why the increment of i (that is, the i += 1) is found in the else clause. If we did have to pop the element at index i, we don't want i to increment, because after the pop, what was at index i+1 is now at index i.
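The while loop is one fix. Another is a for loop that walks the indices from last to first. This is offered as an alternative, not as the text's method, and remove_lower_backwards is a hypothetical name. Popping the element at index i shifts only the elements after it, and we've already visited those, so every index the loop has yet to use stays valid. Here's a minimal sketch:

```python
def remove_lower_backwards(L):
    # Hypothetical alternative: visit indices len(L) - 1, ..., 1, 0.
    # A pop at index i moves only elements at indices above i,
    # which this loop has already examined.
    for i in range(len(L) - 1, -1, -1):
        if not L[i][0].isupper():
            L.pop(i)

names = ['Franklin', 'curtis', 'Mason', 'alice', 'bob']
remove_lower_backwards(names)
print(names)   # ['Franklin', 'Mason']
```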
My introduction to lists is now complete. But before we're done, let's take a quick look at a list-like data type, the tuple.
Both lists and tuples are ordered sequences, and we may extract elements from each by use of indices within square brackets. But they are different in two regards:
A tuple is (typically) enclosed in parentheses. So for instance ('a', 'b', 'c') is a tuple that consists of three letters. We may if we wish also create this tuple with 'a', 'b', 'c'. So for instance the line of code return num_sqrs, num_cubes actually returns a tuple whose two elements are the values of num_sqrs and num_cubes.
Tuples are immutable. Once created, nothing can be added, nothing can be taken away, nothing can be replaced, nothing can be inserted.
Here's a little code to read.
>>> empty_tuple = ()
>>> empty_tuple
()
>>> one_element_tuple = 1,
>>> one_element_tuple
(1,)
>>> many_elements = (1, 2, 3)
>>> many_elements
(1, 2, 3)
>>> no_parens = 4, 5, 6
>>> no_parens
(4, 5, 6)
>>> type(no_parens)
<class 'tuple'>
>>> no_parens[1]
5
>>> len(no_parens)
3
>>> no_parens[0] = 7
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
Note:
If we wish to create a one element tuple, we must use a comma. 1, is a one element tuple. So too is (1,).
A comma after an expression or between expressions creates a tuple. 1, is a tuple (as we saw above). So too is 1, 2.
The attempt to mutate a tuple (no_parens[0] = 7, for example) will throw an error. Tuples are immutable.
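The text mentioned that return num_sqrs, num_cubes returns a tuple. Here's a minimal sketch of the same idea with a hypothetical helper, min_and_max, that returns two values at once:

```python
def min_and_max(L):
    # Hypothetical helper: the comma packs the two results into one tuple.
    return min(L), max(L)

result = min_and_max([3, 1, 4, 1, 5])
print(result)          # (1, 5)
print(type(result))    # <class 'tuple'>
print(result[0])       # 1
```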
Why use tuples instead of lists? After all, lists can do nearly everything that tuples can do. And more! I use tuples when I know that I'll never need to change the tuple once created. Why use a data type with more power, and more danger, than I need?