I spent a little while digging around in CPython recently, and thought I’d share my adventure here. It’s a bit of a riff on Allison Kaptur’s excellent guide to getting started with Python internals. I thought it might be neat to show how my own explorations went, step by step, so that other curious Pythonistas can follow along.

I was reading Allison's blog post on how to start exploring Python internals, and one of the suggestions was: try implementing a Python library function without looking at it! I thought this sounded like splendid fun; also, one of the suggestions was namedtuple and I actually REALLY LIKE namedtuple but don't have occasion to use it often enough. So I dove in! Stuff I learned so far doing this:
  • Metaclasses! I already knew about these in a vague "it's like a thing that creates classes or something" sort of way, and since it seems like namedtuple creates class-like objects, I thought it'd be a good place to start. Probably the most interesting thing I discovered: the plain old type built-in, which I've always used just to check the types of objects, can also be used to dynamically create new classes when you call it with three arguments! This seems like a super-odd and unintuitive dual functionality, and I found a throwaway comment that claimed this was due to historic/backwards compatibility reasons, but I wasn't able to determine what these reasons were. (Let me know if you know!)

  • With type() alone, you can create a pretty decent named tuple, which I coded up like so. Granted, it's (a) not a tuple at all, and (b) does some slightly frownyface manhandling of class properties, and (c) doesn't implement all the functionality of namedtuple... BUT, it does handle my most common use case for namedtuple, which tends to be: "Hey, I want a kind-of-throwaway class that'll be used only in a small section of the code—but that throwaway class will make what I'm doing SO MUCH MORE READABLE." Thus, tada! Instant objects with sensible properties!

  • But for some reason I got to wondering: could you make a function that, say, knows to simply create a Foo when you call namedtuple('Foo', 'my properties'), rather than having to do Foo = namedtuple('Foo', 'my properties')? It turns out the answer is YES, but you have to do evil things to make it happen. Essentially, Python maintains dictionaries of variables for you—try typing globals() or locals() into your Python interpreter to see!

    In order to auto-generate our Foo class, then, we want to add Foo to the local variable dictionary of the caller. (Meaning: if we're calling namedtuple('Foo', 'my properties') within our main method, we want Foo to be created in that main method, not just within the namedtuple call.) Turns out there's a _getframe function you can use to get, say, the current frame, or the parent frame... and then just tack Foo onto the parent frame and you're good to go!

    But that's all a terrible idea and you shouldn't do it. It's not good for you. It's not good for the planet. Don't be like me.
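The ideas in the list above can be sketched roughly like so. This is not the code from the original gist: the names make_record and declare_record are made up for illustration, and the "evil" variant writes to the caller's module globals rather than its locals, on the assumption that writing into a function's local variable dictionary through a frame is not reliable in CPython.

```python
import sys


def make_record(name, field_names):
    """Build a small class with the given fields via three-argument type().

    Hypothetical helper for illustration; not the real namedtuple.
    """
    fields = tuple(field_names.split())

    def __init__(self, *values):
        if len(values) != len(fields):
            raise TypeError(
                "%s expects %d values, got %d" % (name, len(fields), len(values))
            )
        for field, value in zip(fields, values):
            setattr(self, field, value)

    def __repr__(self):
        pairs = ", ".join("%s=%r" % (f, getattr(self, f)) for f in fields)
        return "%s(%s)" % (name, pairs)

    # type(name, bases, namespace) returns a brand-new class object.
    namespace = {"__init__": __init__, "__repr__": __repr__, "_fields": fields}
    return type(name, (object,), namespace)


def declare_record(name, field_names):
    """The evil variant: bind the new class to `name` in the caller's globals."""
    caller = sys._getframe(1)  # the frame of whoever called declare_record
    caller.f_globals[name] = make_record(name, field_names)


Point = make_record("Point", "x y")
print(Point(1, 2))  # Point(x=1, y=2)
```

Calling declare_record('Foo', 'a b') at module level makes Foo appear without any assignment, which is exactly the kind of surprise that makes it a bad idea.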
I've got an actual, good-for-the-planet implementation of namedtuple underway, so hopefully I can share a real gist of that with you all soon!

Edit: Ned pointed out that the super(self.__class__, self).__init__() call I had in my init functions for the janky and trolly tuples wasn't quite right—calling super on our hand-rolled class gets us a NoneType, so it doesn't really make sense to call it. I updated the code to be more correct now. Thanks, Ned!
Crypto challenge update: I can now decrypt repeating-key XOR and detect ECB encryption, woohoo! Now that I'm done with the first "set" of challenges, though, I think I'll take a bit of a break—they're super fun, and I'll come back to them later, but I want to start pairing more and explore some other things, too.

Tonight there was a round of presentations from other Hacker Schoolers and goodness they were awesome. Highlights included: Allison poking around to see how the recursion limit is implemented in Python and discovering amusing details therein, Eunsong's JavaScript-based molecular dynamics simulator, and Tanoy demonstrating both his live coding skills and his excellent taste in music by making a Jekyll blog and dropping it on DigitalOcean in less than the amount of time it takes to listen to one rap song.

To wind down this evening, I wanted to dust off my old Heroku account and deploy a Flask app there (I've been trying to move some things off my Linode, and this seemed like an easy one to handle), and ran into a bunch of annoyances with key management. The first key I tried to give Heroku was rejected because "that's already being used by another Heroku account," which suggests I've got yet another account on the internet I've forgotten about, oops. The second key I used authenticated fine, but I couldn't push to git—since my git is configured with a different key—so I had to edit a file in .ssh/config, but the change didn't seem to be helping, and eventually I figured out that I had both an id_rsa and an id_dsa key, and I was referencing the wrong one. Sigh, key management. Hopefully I won't forget about the existence of this Heroku account too, heh.
Alas, this post is late—I left my computer at Hacker School last night and thus couldn't post until I got back this morning. But I'm talking about what I did during Day 3 so this still counts as blogging every day, right?

Anyway! I made some real headway in the crypto challenges, which was satisfying, though, as one might expect, it turns out twiddling bits in Python is rather annoying compared to something like C. Python tries very, very hard not to let you operate on raw bits, so you end up doing a lot of awkward conversions. Like, for the task of "this hex string has been XOR'd against a single character; figure out what character that is," I wound up with some code that looked like this...
[chr(ord(byte) ^ key) for byte in hex_str.decode("hex")]
...which is (1) decoding the hex string, (2) reading that one byte at a time, (3) XOR'ing the value of the byte against the key, and (4) converting that back to a character representation. I wound up fumbling a bit getting those conversions nested correctly... I'm hoping to think of a more "systematic" way of handling these soon, maybe like a unicode sandwich for bit-twiddling. Or I could just convert everything to bitarrays and handle the problems that way; we'll see.
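The one-liner above is Python 2 (str.decode("hex") is gone in Python 3). For comparison, here is a minimal Python 3 sketch of the same four steps; the function name and sample values are made up for illustration. Iterating over a bytes object already yields integers, so the ord() and chr() round trip disappears.

```python
def xor_single_byte(hex_str, key):
    """XOR every byte of a hex-encoded string against one integer key (0-255)."""
    raw = bytes.fromhex(hex_str)  # (1) decode the hex string
    # (2)-(4): iterating over bytes yields ints, so XOR each one and repack.
    return bytes(b ^ key for b in raw)


# XOR is its own inverse, so applying the same key twice round-trips.
scrambled = xor_single_byte(b"hello".hex(), 0x2A)
print(xor_single_byte(scrambled.hex(), 0x2A))  # b'hello'
```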

I also spent the afternoon reacquainting myself with my faltering early attempt at implementing Raft in Python, which was last updated, uh, seven months ago. I hadn't realized I'd left it abandoned for so long! Definitely hoping to wrap that project up (or maybe just start over from scratch) before I leave New York...
First, a follow-up on yesterday's lulz with the eBird data: I lied a bit when I said it was a tar file that was being troublesome. The initial download was a tar file, which decompressed to a few README-ish files and a gz file, but the actual trouble came about when I tried to decompress the gz file, which contains the actual data.

I decided to see what gzip thought the size of the file should be when uncompressed, and, uh...

dhcp-0059526637-5b-99:ebd_relAug-2014 flowerhack$ gzip -l ebd_relAug-2014.txt.gz
         compressed        uncompressed  ratio uncompressed_name
         7232458369          2856865220 -153.2% ebd_relAug-2014.txt


Apparently gzip thinks my massive text file should be smaller once it's uncompressed??? (And definitely not >60GB like it tried to do?)
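One likely explanation, offered as an assumption rather than something the output above proves: the gzip format (RFC 1952) stores the uncompressed size in a 4-byte trailer field called ISIZE, which holds the size modulo 2**32. If gzip -l simply reports that field, any file of 4 GiB or more shows a wrapped-around number, and the real size could be 2856865220 plus some multiple of 2**32. A small sketch that reads the field directly (both helper names are hypothetical):

```python
import struct


def read_isize(path):
    """Return the 32-bit ISIZE field stored in the last 4 bytes of a gzip file."""
    with open(path, "rb") as f:
        f.seek(-4, 2)  # whence=2 means "relative to the end of the file"
        (isize,) = struct.unpack("<I", f.read(4))  # little-endian unsigned 32-bit
    return isize


def candidate_sizes(isize, count=3):
    """Uncompressed sizes consistent with a given ISIZE: isize + k * 2**32."""
    return [isize + k * 2**32 for k in range(count)]
```

Note that for a file made of several concatenated gzip members, the trailer only describes the last member, so this is a hint at best.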

I decided I'd like to try and blog every day while I'm at Hacker School. This will make my blog updates a bit spammier than I normally like, but it also seems like a fun way for me to track my own progress and share what I'm up to with various interested parties, so!

What I'll be working on! )

What I worked on today! )