[personal profile] flowerhack
I spent a little while digging around in CPython recently, and thought I’d share my adventure here. It’s a bit of a riff on Allison Kaptur’s excellent guide to getting started with Python internals. I thought it might be neat to show how my own explorations went, step by step, so that other curious Pythonistas might follow along.

1. Notice something weird happening.

Initially, I was just setting up Nose to run tests on some Python 3 code that I’d written. When I ran the tests, I got a mysterious "TypeError: bad argument type for built-in operation" message, which I hadn't seen in this program before.

The cause of the error ended up being a little obvious—I'd mistakenly left a PDB breakpoint (`import pdb; pdb.set_trace()`) in the program. When I removed that, the tests ran fine.

But I've used Nose to run tests on Python 2 repos before, and in those cases, leaving in breakpoints by mistake didn't cause Nose to crash. Instead, the program would appear to "hang." The program wasn't really hanging; it just wasn't displaying stdout (standard output), so the PDB prompt never showed up. Nose does this on purpose, and it makes sense: if I'm running a test suite, I probably just want to see the results of the tests, and not a bunch of print statements from the program itself. If you hit "c" in this scenario, Nose simply continues past the breakpoint as usual.

Normally, I might've shrugged, removed the breakpoint, and continued with what I’d been working on. But! Since I'm at Hacker School and have time to dig into whatever captures my fancy, I decided I'd use this as an excuse to look at Python internals.

2. Make the simplest possible test case.

Turns out this issue was kind of tricky to dig into—I wasn't sure if the problem was in Nose, or in PDB, or in the CPython source itself. And, of course, I couldn't use any breakpoints, as those would just cause my program to crash.

Eventually, after testing some hypotheses, it seemed like the call to `input()` that PDB uses was where things were breaking. So: did something change in how input itself was implemented between Python 2 and Python 3, or was something else going on?

I was pair-debugging with Jesse when we finally noticed that Nose handles standard output in an interesting way:
self._buf = StringIO()
sys.stdout = self._buf
It turns out `sys.stdout` is the file object Python writes standard output to; anything `print` sends to your terminal screen goes through it. But! Since we can access `sys.stdout` just like any other Python variable, we can change it. Here, Nose is setting `sys.stdout` to a `StringIO()`, which is an in-memory, file-like text buffer rather than a real terminal stream.

When you do this, `print` no longer shows anything on screen; the text goes into the buffer instead!
>>> import sys, io
>>> sys.stdout = io.StringIO()
>>> print("Hello")
>>> # Oh no, nothing printed to the terminal!
We wondered if that line might be the problem, so we set up a simple test case:
>>> import sys, io
>>> sys.stdout = io.StringIO()
>>> print("Hello!") # Nothing will appear
>>> input("Input: ") # Raises a TypeError
Running this in Python 3 gives you the "bad argument type for built-in operation" error we saw. So now we know where to look! When `sys.stdout` is replaced like this, the builtin function `input()` breaks in some strange way.
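
As an aside, the printed text isn't lost when `sys.stdout` is swapped out; it lands in the buffer. Here is a minimal sketch of that, assuming nothing about Nose's capture code beyond the two lines quoted above (`capture_print` is a made-up helper name, not something Nose provides):

```python
import io
import sys


def capture_print(message):
    """Print `message` while sys.stdout is a StringIO; return what was captured."""
    original_stdout = sys.stdout      # remember the real stream
    buffer = io.StringIO()            # in-memory, file-like text buffer
    sys.stdout = buffer               # same kind of swap as in the Nose snippet
    try:
        print(message)                # written to the buffer, not the terminal
    finally:
        sys.stdout = original_stdout  # always put the real stream back
    return buffer.getvalue()


captured = capture_print("Hello!")
print(repr(captured))  # 'Hello!\n'
```

Restoring the stream in a `finally` block matters here: if the swap is left in place, everything printed afterwards silently goes into the buffer too.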

3. Learn you some CPython!

So, we’d like to look at how `input` is implemented. Python has a cool module called `inspect` that lets you examine source code like so:
>>> from collections import namedtuple
>>> import inspect; print(inspect.getsource(namedtuple))
def namedtuple(typename, field_names, verbose=False, rename=False):
      """Returns a new subclass of tuple with named fields.
      .....
If you try calling `inspect.getsource` on `input`, however, the result is “TypeError: <built-in function input> is not a module, class, method, function, traceback, frame, or code object.” This means that our function is not implemented in Python. It’s implemented in C, and thus the `inspect` module isn’t able to display its source code for us.
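
If you'd rather check for this up front than read the traceback, something like the following sketch might help. `describe_source` is a hypothetical helper name, and the exact wording of the TypeError message varies between Python versions, so the sketch only relies on the exception type:

```python
import inspect


def describe_source(obj):
    """Return obj's Python source, or a short note if there is none to show."""
    try:
        return inspect.getsource(obj)
    except TypeError:
        # Raised for objects that have no Python source, such as C builtins.
        return "no Python source (implemented in C?)"
    except OSError:
        # Raised when the source file can't be located.
        return "source file not found"


print(inspect.isbuiltin(input))   # True: input is a built-in function
print(describe_source(input))
```

`inspect.isbuiltin` is the quicker test; the try/except version is just there to show which exception `getsource` raises.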

...but, with the magic of the cinspect module [1], we can look at C source code!
>>> import cinspect; print(cinspect.getsource(input))
static PyObject *
builtin_input(PyObject *self, PyObject *args)
{
     PyObject *line;
     char *str;
.....
Awesome. Now we know that the function we want is called `builtin_input`. At this point, we’re going to start looking through C code, rather than just Python things, and we’ll be debugging at the terminal rather than at the Python interpreter. You don’t have to be a C expert to get a general idea of what’s going on—I’m mostly proceeding by making educated guesses based on function names :)

So, let’s grep through the CPython source code, and we’ll discover that `builtin_input` is a wrapper around `builtin_input_impl`, which is a function in bltinmodule.c. Let’s try loading Python into the lldb C debugger and setting a breakpoint at the beginning of that function [2]:
flowerhack$ lldb -- /Users/flowerhack/cpython/python.exe
(lldb) breakpoint set --file bltinmodule.c --line 2337
While stepping through the source code (the process is similar to what you might do in PDB—just keep hitting “n” to continue to the next line), we discover the bit of code where problems first appear:
stdout_encoding_str = _PyUnicode_AsString(stdout_encoding);
stdout_errors_str = _PyUnicode_AsString(stdout_errors);
if (!stdout_encoding_str || !stdout_errors_str)
     goto _readline_errors; // "throws" an exception
The third line tripped me up: “if the encoding string is null OR if the errors string is null, we have an error.” But wait, wouldn’t a null errors string imply NO errors were found?

For this, I dug into the definition of _PyUnicode_AsString (another C function):
#define _PyUnicode_AsString PyUnicode_AsUTF8
That’s just a macro which says “hey, when we’re calling _PyUnicode_AsString, call PyUnicode_AsUTF8 instead.” So, what we really want is the definition of PyUnicode_AsUTF8:
char*
PyUnicode_AsUTF8(PyObject *unicode)
{
     return PyUnicode_AsUTF8AndSize(unicode, NULL);
}
...and it seems all that is doing, is calling PyUnicode_AsUTF8AndSize, which is really what we want to read.

There are several error cases in the PyUnicode_AsUTF8AndSize function, each of which returns NULL. It seems odd to me that we’re returning NULL in the case of an error, instead of an error code like -1. Maybe there’s some convention here that I’m unfamiliar with?

Anyway, to figure out which error case I was hitting, I did “printf debugging”—I just added a printf statement before each possible error case, and ran the program—and was able to discover that we’re failing something called a PyUnicode_Check.

So, is that check something that wasn’t in Python 2 but now exists in Python 3? Well, we can compare the source code of the two versions to find out. And it turns out the Python 2 source makes no such encoding check, while the Python 3 source does. So if `sys.stdout` is replaced with something whose encoding isn’t a string (a `StringIO` reports its `encoding` as `None`), it’ll fail in 3 but not 2. Whew!
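
To see this from the Python side, here is a small sketch. `has_str_encoding` is a made-up helper, and `isinstance(..., str)` is only a rough Python-level stand-in for the C-level PyUnicode_Check, not the actual code path inside `input()`:

```python
import io


def has_str_encoding(stream):
    """Rough stand-in for the C check: are encoding and errors both str?"""
    encoding = getattr(stream, "encoding", None)
    errors = getattr(stream, "errors", None)
    return isinstance(encoding, str) and isinstance(errors, str)


string_buffer = io.StringIO()
print(string_buffer.encoding, string_buffer.errors)   # None None

# A text wrapper around a bytes buffer does carry both attributes.
wrapped = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
print(wrapped.encoding, wrapped.errors)                # utf-8 strict

print(has_str_encoding(string_buffer), has_str_encoding(wrapped))  # False True
```

The `TextIOWrapper` is included only for contrast, as an example of a replacement stream that does report string values for both attributes.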

4. Profit!

So this might look like a lot of work just to find out the why behind a pretty trivially fixable bug. And maybe it is, but! We learned some cool stuff along the way. I found out a lot about how standard input and output are handled by Python while I was testing hypotheses. I learned more about reading large, macro-heavy C projects. I learned that GOTO is still alive and well, which surprised me, but made sense in context: it seems like it’d be tricky to do something like an exception in C without GOTO. Also, it was really cool to read through the changes to bltinmodule.c’s input functions between Python 2 and 3. Seriously, check that out; it’s neat to see how they refactored and cleaned things up.

I also stumbled on some super-interesting trivia about reference counting in Python, but I’m saving that for another post :)

(Also, many thanks to Leta who helped me edit a draft of this post!)

[1] Disclaimer: cinspect is a little tricky to set up. The instructions in the project's README should work, but note that the "indexing your sources" step takes a long time.
[2] If you've used gdb before, then you just need to know that lldb is very similar. If you've never used either before, they're a bit like PDB, but for debugging C code rather than Python code.