Oct. 7th, 2014

First, a follow-up on yesterday's lulz with the eBird data: I lied a bit when I said it was a tar file that was being troublesome; the initial download was a tar file, which decompressed to a few README-ish files and a gz file, but the actual trouble came about when I tried to decompress the gz file—which contains the actual data, and was causing the trouble.

I decided to see what gzip thought the size of the file should be when uncompressed, and, uh...

dhcp-0059526637-5b-99:ebd_relAug-2014 flowerhack$ gzip -l ebd_relAug-2014.txt.gz
         compressed        uncompressed  ratio uncompressed_name
         7232458369          2856865220 -153.2% ebd_relAug-2014.txt

Apparently gzip thinks my massive text file should be smaller once it's uncompressed??? (And definitely not >60GB like it tried to do?)

Read more... )