Python Tips: Who Ate My Iterator?

I recently wrote a very simple script to read in a CSV file and print out the contents of the different columns.  Well, at least I thought it was going to be simple.  Instead of just sticking to what I needed the script to do, I did something foolish that had me going in circles.  I decided to add an extra line to preview the contents of the CSV file before I separated out the different columns.  This amounted to printing out the contents of an iterator which, I didn't realize at the time, would change the behavior of the script.  But I'm getting ahead of myself; first I need to explain what an iterator is and why it matters.

What the heck is an iterator anyway?

An iterator is an object that contains a countable number of values.  An iterator is an object that can be iterated upon, meaning that you can traverse through all the values.

Python Iterators
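
To make that definition concrete, here is a minimal sketch using Python's built-in iter() and next() functions.  The list name colors is made up for the example.

```python
colors = ['red', 'green', 'blue']

# iter() asks the list for a fresh iterator over its items.
color_iter = iter(colors)

# Each call to next() hands back the following item.
first = next(color_iter)   # 'red'
second = next(color_iter)  # 'green'
third = next(color_iter)   # 'blue'

print(first, second, third)
```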

Why do you need iterators?

In scripting you regularly find yourself looping through items in some data structure (lists, tuples, dictionaries, etc.).  You can traverse items in a data structure by using indices, but that can become cumbersome and is prone to errors if you get the index math wrong.  Iterators solve this problem by implementing a __next__() method; this method keeps track of the element that was returned in the previous call and returns only the next element that hasn’t been accessed yet.  The benefit is that the developer no longer has to keep track of the current and next positions in the data structure.  In other words, it makes looping easier because it takes less code.  And, in programming, less is better.
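
As a rough comparison, the sketch below walks the same list twice, once with manual index bookkeeping and once with a for loop that relies on the iterator.  The variable names are illustrative only.

```python
users = ['Bubba', 'Forest', 'Jenny']

# Index-based traversal: you manage the position yourself.
by_index = []
i = 0
while i < len(users):
    by_index.append(users[i])
    i += 1

# Iterator-based traversal: the for loop calls iter() once and then
# __next__() repeatedly behind the scenes, so there is no index to get wrong.
by_iterator = []
for user in users:
    by_iterator.append(user)

print(by_index == by_iterator)  # True
```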

What do you mean they are consumed?

When you use an iterator to traverse through all the items in an iterable object, you “consume” it.  That is to say, an iterator can only be used once.  If you want to loop through the items in the same object again, you have to create a new iterator.
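
Here is a small sketch of what consuming looks like in practice, assuming a plain list as the underlying iterable.  The second pass over the same iterator comes back empty, while a fresh iterator starts from the beginning again.

```python
numbers = [1, 2, 3]
number_iter = iter(numbers)

first_pass = list(number_iter)   # walks the iterator to the end
second_pass = list(number_iter)  # nothing left to hand out

print(first_pass)   # [1, 2, 3]
print(second_pass)  # []

# The underlying list is untouched; a new iterator starts over.
fresh_pass = list(iter(numbers))
print(fresh_pass)   # [1, 2, 3]
```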

Why does it matter how many times you can consume an iterator?

If you don’t keep this in mind, you may accidentally consume the iterator and then try to reuse it (like I did).  That will have you scratching your head because your code looks solid but you can’t figure out why the iterator is not giving you the results you expect.  For example, here is the script I wrote originally:

  1 #!/usr/bin/env python3
  2 
  3 import csv
  4 
  5 file_handle = open('users.csv', mode='r', encoding='utf-8-sig')
  6 
  7 contents = csv.DictReader(file_handle)
  8 print(list(contents))
  9 
 10 users = []
 11 email_addresses = []
 12 
 13 for row in contents:
 14     print(row)
 15     users.append(row['User'])
 16     email_addresses.append(row['Email'])
 17 
 18 print('Users:', users)
 19 print('Email addresses:', email_addresses)
 20 
 21 file_handle.close()

Note that on line 8 I have a command to print out the contents of the iterator object.  I hadn't realized that list(contents) would consume the iterator.  Consequently, when I tried to use the iterator in the for loop, the loop exited immediately because the iterator had no more items left to traverse.  This is what the output looked like:

% ./read_csv.py 
[{'User': 'Bubba', 'Email': 'bubba@bubbagump.com'}, {'User': 'Forest', 'Email': 'forest@bubbagump.com'}, {'User': 'Jenny', 'Email': 'jenny@bubbagump.com'}]
Users: []
Email addresses: []

While I certainly did get the contents of the iterator, note that the users and email_addresses lists are empty.  That's because the body of the for loop never ran.  That had me scratching my head until I did some debugging and discovered that the iterator was raising the StopIteration exception on the very first request for an item, which a for loop quietly treats as the signal to stop.  That was a prime indicator that the iterator had been consumed.
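
One way to see this for yourself is to call next() on the reader after it has been consumed.  The sketch below is not the original script: it uses an in-memory io.StringIO stand-in for users.csv (an assumption) so that it runs on its own.

```python
import csv
import io

# In-memory stand-in for users.csv so the example is self-contained (assumption).
sample = io.StringIO("User,Email\nBubba,bubba@bubbagump.com\n")
contents = csv.DictReader(sample)

preview = list(contents)  # consumes every row, just like line 8 of the script

try:
    next(contents)        # ask the spent iterator for one more row
    exhausted = False
except StopIteration:
    exhausted = True      # the iterator has nothing left to give

print(exhausted)  # True
```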

What’s the fix?

I simply removed line 8 from the script and re-ran it.  Here is the output:

% ./read_csv.py  
{'User': 'Bubba', 'Email': 'bubba@bubbagump.com'}
{'User': 'Forest', 'Email': 'forest@bubbagump.com'}
{'User': 'Jenny', 'Email': 'jenny@bubbagump.com'}
Users: ['Bubba', 'Forest', 'Jenny']
Email addresses: ['bubba@bubbagump.com', 'forest@bubbagump.com', 'jenny@bubbagump.com']

Instead of printing out all the contents before the for loop, I just print out each row one at a time inside the loop.  The reality is I didn't need that extra print command on line 8; I was just being a bit paranoid.  Unfortunately, that paranoia cost me some time and headache.  I'm sharing this with you in the hopes that you won't make the same mistake.
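
If you do want a preview up front, another option (not what I did) is to consume the reader exactly once into a list and work from that list afterwards.  The sketch below assumes the file is small enough to hold in memory and again uses an in-memory io.StringIO stand-in for users.csv.

```python
import csv
import io

# In-memory stand-in for users.csv (assumption, mirrors the data shown above).
sample = io.StringIO(
    "User,Email\n"
    "Bubba,bubba@bubbagump.com\n"
    "Forest,forest@bubbagump.com\n"
    "Jenny,jenny@bubbagump.com\n"
)

# Consume the reader exactly once and keep the rows in a list.
rows = list(csv.DictReader(sample))
print(rows)  # the preview; rows is a list, so it can be looped over again

users = [row['User'] for row in rows]
email_addresses = [row['Email'] for row in rows]

print('Users:', users)
print('Email addresses:', email_addresses)
```

The trade-off is memory: a list holds every row at once, whereas the reader hands rows out one at a time.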

Conclusion

Only consume an iterator in the way you really need to.  In other words, don’t consume the iterator before the for loop that actually needs it.  And do yourself a favor and take the time to really understand a language feature before you use it ;-).  Happy scripting.