Uncategorized

zoomascii – it’s binascii with more zoom!

April 28, 2017

Did a presentation on my new Python library zoomascii at the San Diego Python Meetup. Check out the slides.

Python Meetup Presentation – Disk Spooling!

March 24, 2017

I gave a lightning talk at the SD Python Users Group meetup last night about adding disk spooling to a Django query using generators and multiprocessing. Check out the slides here:

Disk-based Spooling – with generators and multiprocessing!
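The slides have the real details. As a rough illustration of the generator half of the idea only, here is a minimal sketch of disk spooling in plain Python: drain an iterable (a Django queryset iterator, say) into a temporary file, then yield the rows back one at a time. `spool_to_disk` is a hypothetical name, not code from the talk, and the multiprocessing side is left out entirely.

```python
import pickle
import tempfile
from typing import Any, Iterable, Iterator


def spool_to_disk(rows: Iterable[Any]) -> Iterator[Any]:
    """Drain `rows` into a temporary file, then yield them back one by one.

    Only one row is in memory at a time on either side of the spool, so the
    source (for example a database cursor) is fully read before the slow
    per-row work starts.
    """
    with tempfile.TemporaryFile() as spool:
        # Phase 1: write every row to disk as its own pickle record.
        for row in rows:
            pickle.dump(row, spool)
        spool.seek(0)
        # Phase 2: read records back lazily until the file runs out.
        while True:
            try:
                yield pickle.load(spool)
            except EOFError:
                return
```

Under these assumptions, `for row in spool_to_disk(qs.iterator()): ...` would finish reading from the database before the first row is processed.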

Remote Fix for a Busted Keyboard

March 15, 2016

When I lived in New York, I was a volunteer for Big Brothers Big Sisters. One of the ways I helped out my little brother was by helping him keep his computer running – a Windows 7 PC that I put together for him. This has gotten harder now that I live on the west coast, but I still want to help him if I can. Typically when he has a problem I remote in with TeamViewer and fix it.

A week ago he wrote to tell me his keyboard was broken. I figured he’d spilled something on it, so I advised him to try another keyboard; I knew he had a spare. He told me that one had the same problem, and went into more detail: neither keyboard was completely broken. The Windows key and media keys worked, but he couldn’t type any letters or numbers.

After several long sessions of debugging via TeamViewer I had the following symptoms:

Unable to type letters or numbers, but the keyboard otherwise worked.
Drivers were fine, devices appeared correct in Device Manager.
Switching to a PS/2 keyboard didn’t help.
The problem persisted in Safe Mode.
The visual keyboard worked and I could type when connected through TeamViewer.

I was about ready to give up when I thought to press him a little about what he was doing when the keyboard stopped working. It turns out he was trying to hack an online game; he hadn’t mentioned it out of embarrassment, I imagine. Now I had a good idea what had happened: he’d run a downloaded hack that contained malicious code. I ran a few malware scanners, but they didn’t find anything.

I did, however, have the hack itself, so out of complete desperation I opened it up in Emacs hexl-mode to take a look. It was a compiled Windows binary but there it was, hidden in among the compiled code:

System\CurrentControlSet\Control\Keyboard Layout

That looked like a registry key and sure enough it was! I loaded up regedit, found that key and deleted it, rebooted and he was typing again!
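For anyone in the same spot without a hex editor handy, a rough Python stand-in for eyeballing a binary in hexl-mode is to pull out runs of printable text, much like the Unix `strings` tool, and filter them for registry-looking paths. This is a minimal sketch, not the method used above; the 6-character minimum and the extra UTF-16LE pass (Windows programs often store strings that way) are assumptions.

```python
import re
from typing import List

# Runs of 6+ printable ASCII bytes: roughly what the Unix `strings` tool reports.
ASCII_RUN = re.compile(rb"[\x20-\x7e]{6,}")
# The same idea for UTF-16LE text (each printable byte followed by a zero byte).
UTF16_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){6,}")


def find_strings(data: bytes, needle: str) -> List[str]:
    """Return embedded text runs in `data` that contain `needle` (case-insensitive)."""
    found = [m.group().decode("ascii") for m in ASCII_RUN.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in UTF16_RUN.finditer(data)]
    return [s for s in found if needle.lower() in s.lower()]
```

Calling `find_strings(data, "CurrentControlSet")` on the bytes of a suspicious executable would list any embedded registry paths mentioning that key.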

I’m writing this blog post for a couple reasons – 1) I’m super proud of figuring this out and 2) when someone else has a similar problem maybe Google will serve up this post and they’ll be saved a lot of trouble. I searched a lot and never saw any mention of this registry key!

The Upload System of My Dreams

December 31, 2010

Sometimes I have a great job, a job where I get to do exactly what I want in exactly the way I want to do it. And what is it, you might ask, that I want to do? I want to build the perfect data upload system. Why? A few reasons:

It’s not an easy problem. Data upload is complicated by the fact that the most common format we support (CSV) isn’t even close to standardized. Line endings, character sets and quoting all vary from file to file. Since we can’t enforce much uniformity on the data we accept, our system has to be very flexible. Add the requirement to handle large volumes of data and there are plenty of challenges.
Doing it badly is acutely painful for our clients. And when our clients feel pain they naturally pass it along to us. I’d say at least 10% of our support requests have been related in some way to our upload system.
It’s a great project for test-driven development (TDD), which is my favorite way to work. Since data uploads are so deterministic, it’s very easy to work on the problem in a straightforward TDD manner.
It requires a high level of parallelism to run quickly. Parallel programming is a fun challenge in itself and the payoff is great when it works.
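To make the CSV flexibility problem concrete, here is a minimal sketch using only the standard library: guess the encoding, let `csv.Sniffer` guess the delimiter, and let the `csv` module cope with mixed line endings. This is not ActionKit’s upload code; the encoding fallback order (UTF-8, then cp1252, then latin-1) and the candidate delimiters are assumptions.

```python
import csv
import io
from typing import List


def decode_upload(raw: bytes) -> str:
    """Decode uploaded bytes, trying a few likely encodings in order."""
    for encoding in ("utf-8-sig", "cp1252"):  # assumed order; utf-8-sig also strips a BOM
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return raw.decode("latin-1")  # maps every byte, so it never fails


def read_flexible_csv(raw: bytes) -> List[List[str]]:
    """Parse CSV whose encoding, delimiter and line endings are unknown."""
    text = decode_upload(raw)
    # newline="" leaves \r\n, \r and \n untouched for the csv module to handle.
    buf = io.StringIO(text, newline="")
    dialect = csv.Sniffer().sniff(text[:4096], delimiters=",;\t")
    return list(csv.reader(buf, dialect))
```

`csv.Sniffer` is a heuristic and raises `csv.Error` when it can’t decide, so a real upload system would still want a way for users to override its guess.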

I’m pretty happy with the way the project turned out. I made a screencast showing off some of the new front-end features:

ActionKit Upload Improvements on Vimeo.

It’s also my first attempt at screencasting. Eesh, my voice.

The front-end uses jQuery with jQote to get updates from the running upload job and update the status display. The progress bar is canvas-based and uses RGraph.

The backend code uses Celery to queue upload jobs from our Django front-end. The jobs themselves use a multiprocessing-based job pool system which we first developed for our mail sender. My co-worker Randall Farmer has since abstracted it out as a reusable component; it has some unique features and could be worth releasing on PyPI at some point. Stopping an upload early works by sending a message with Carrot, the underlying AMQP client Celery uses to talk to RabbitMQ.

I tried a new approach this time to how the workers handle errors and status. Instead of reporting info up to the parent, each worker writes status and errors directly to the database, and the parent queries the database to get updates on the workers. This will hopefully avoid some of the deadlocking problems inherent in systems that rely on bi-directional communication between parents and workers. It also made building the front-end easier: the status reported by the workers was easy to turn into JSON and send up to the client for display.
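A minimal sketch of that reporting pattern, with SQLite standing in for the real database and plain function calls standing in for the multiprocessing pool: each worker writes its own row of progress and errors, and the parent only ever reads the table and turns it into JSON. The `upload_status` table and the missing-email check are hypothetical.

```python
import json
import os
import sqlite3
import tempfile
from typing import Dict, List


def init_db() -> str:
    """Create a throwaway SQLite file holding a hypothetical status table."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE upload_status (worker INTEGER PRIMARY KEY, rows_done INTEGER, errors TEXT)")
    conn.commit()
    conn.close()
    return path


def run_worker(db_path: str, worker_id: int, rows: List[Dict[str, str]]) -> None:
    """Process rows, reporting progress and errors straight to the database.

    Nothing is returned or sent to the parent; the table is the only channel.
    """
    conn = sqlite3.connect(db_path)  # each worker opens its own connection
    done, errors = 0, []
    for line_no, row in enumerate(rows, start=1):
        if row.get("email"):
            done += 1
        else:
            errors.append("line %d: missing email" % line_no)  # hypothetical validation rule
        with conn:  # commits after every row so pollers see live progress
            conn.execute(
                "INSERT OR REPLACE INTO upload_status VALUES (?, ?, ?)",
                (worker_id, done, json.dumps(errors)),
            )
    conn.close()


def status_json(db_path: str) -> str:
    """What the parent (or a web view) does: read the table and emit JSON."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT worker, rows_done, errors FROM upload_status ORDER BY worker").fetchall()
    conn.close()
    return json.dumps([{"worker": w, "rows_done": d, "errors": json.loads(e)} for w, d, e in rows])
```

Because the parent never waits on a pipe or queue from the workers, a stuck worker shows up as a stalled row rather than a blocked parent.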

All in all, a fun project. I hope it works as well in practice as it has during development.

What’s the best data upload system you’ve used? Written? Lived through?

Python Makes Me Say God Damn

May 19, 2010

I’ve been coding in Python now for almost a year. Mostly it’s been great fun, but a few things continue to annoy me. Most of them are just a matter of taste, some related to my background as a Perl coder no doubt. In any case, I’m hoping writing about them will help me be mindful and might bring up some useful workarounds in response. And ranting is really its own reward.

NOTE: before you tell me I’m an idiot and Python is awesome, allow me to remind you that I like coding in Python and I’m quite sure I could produce a list like this for any language I’ve worked in. So, yes, Python is awesome and Python sucks. I’m sorry if I just blew your mind.

In no particular order:

The range() function is non-inclusive of the second term. If I say to you, “give me the numbers from 1 to 5,” are you going to say “1, 2, 3, 4”? Of course not. Even worse, Perl has trained me to expect inclusive ranges with the .. operator, so I constantly stub my toe on Python’s range(). It can lead to nasty bugs: sometimes counting one less than expected is obvious, and sometimes it isn’t!
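One workaround is to hide the off-by-one inside a tiny helper so the intent is visible at the call site. `irange` is a hypothetical name, a sketch of Perl-style inclusive ranges rather than anything in the standard library:

```python
def irange(first: int, last: int) -> range:
    """Inclusive range, like Perl's first..last: both ends are included."""
    return range(first, last + 1)
```

Like Perl’s `5..1`, `irange(5, 1)` is simply empty.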

I never know where to look for a method. Let’s say I’m having a hard time figuring out how to use some.awesome.Module’s foobar() method. I’ve checked the doc-string and it’s not helping, I need to use the source. At this point I’ve pretty much stopped trying to guess where it could be – I go straight to ack and start searching the tree under some/. It could be in some.py, some/__init__.py, some/awesome.py, some/awesome/__init__.py, etc. And if it’s a method with a common name – save(), for example – it can be very hard to figure out which save() is actually the one I’m looking for.
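One partial workaround is to let Python say where something is defined instead of guessing at the package layout. The standard library’s `inspect` module can report the source file and starting line of a function, method or class; the small `where_is` wrapper below is a hypothetical helper, and it only works when the source is available on disk.

```python
import inspect
from typing import Any, Tuple


def where_is(obj: Any) -> Tuple[str, int]:
    """Return (source file, first line) where a function, method or class is defined.

    Raises TypeError for built-ins and OSError if the source isn't on disk.
    """
    target = inspect.unwrap(obj)  # see through functools.wraps-style decorators
    filename = inspect.getsourcefile(target)
    _, first_line = inspect.getsourcelines(target)
    return filename, first_line
```

For the example above, `where_is(some.awesome.Module.foobar)` would point straight at the file and line of the right `foobar()`, even when plenty of other modules define a method with the same name.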

Python is happy to do nothing. Consider this code:

[sourcecode language="python"]
foo = list(range(1, 100))
while len(foo) > 10:
    foo.pop
[/sourcecode]

It’s supposed to reduce foo until it’s got 10 elements (there are easier ways to do this, of course – not the point). Instead it loops forever, because foo.pop is just a reference to the pop method. To call it you must include parens:

[sourcecode language="python"]
foo = list(range(1, 100))
while len(foo) > 10:
    foo.pop()
[/sourcecode]

It makes a lot of sense to me that this is the way it is. But couldn’t Python emit a warning when I do this? The code does absolutely nothing useful – the method reference is returned and immediately discarded. Perl does handle this case, emitting a warning about the use of a scalar in void context. I never thought I’d miss that warning, but now I do!
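Python itself stays quiet here, but the check is easy to approximate with the standard `ast` module: flag any statement that is just a bare name or attribute lookup. This is a rough heuristic sketch (a property access can have side effects, so it can misfire); linters such as pylint ship a similar pointless-statement warning.

```python
import ast
from typing import List


def find_pointless_statements(source: str) -> List[int]:
    """Line numbers of statements that are only a bare name or attribute,
    such as `foo.pop`, which look something up and throw it away."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Expr) and isinstance(node.value, (ast.Name, ast.Attribute)):
            hits.append(node.lineno)
    return sorted(hits)
```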

Strings are sequence types. At first glance this probably seems pretty harmless – you can iterate over the characters in a string. Probably useful, right? Well, not for me! I honestly can’t remember the last time I intentionally iterated over a string character by character in a language other than C. If I want to search a string I’ll use a regular expression or a call to index()/find() – faster and easier. So why does it irritate me? It leads to bugs, because it’s all too easy to accidentally put a string where a list should go and Python will then happily iterate over the string. For example, check out this bug (simplified a bit from the actual usage):

[sourcecode language="python"]
for name, value in request.GET.items():
    params[name] = value[0]
[/sourcecode]

The bug here is that value is a string, not a list of values. This bug resulted in each GET param getting truncated to a single character. It completely escaped my attention because I tested it with parameters that were single-digit numbers (0, 1 and 5), so it got into live code that then failed when “2010-01-01” became “2”. So lame. I’d much rather Python said something useful like “You can’t index into a string, dummy!”
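Two defensive options, sketched here as assumptions rather than the original fix. With Django, `request.GET.lists()` yields `(name, list_of_values)` pairs and `request.GET.getlist(name)` returns a list, so `[0]` really would pick the first value. More generally, a small guard can refuse bare strings where a list is expected; `first_value` is a hypothetical helper:

```python
from typing import Sequence


def first_value(values: Sequence[str]) -> str:
    """Return the first of several values, refusing a bare string.

    A str is itself a sequence, so without this check values[0] would
    quietly return its first character instead of failing.
    """
    if isinstance(values, (str, bytes)):
        raise TypeError("expected a list of values, got a string: %r" % (values,))
    return values[0]
```

Used as `params[name] = first_value(values)` in a loop over `request.GET.lists()`, the original mistake would fail loudly on the first request instead of silently truncating dates.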

Unicode support is a mess. This one really surprised me. I always thought Unicode in Perl was exceptionally bad, likely because it was always a second-class citizen. I thought for some reason that Python would have a better, saner system. Well, sadly no. In fact, it has pretty much all the same problems that Perl has. Consider this very common code:

[sourcecode language="python"]
body = """
Your mailing template for mailing %s has a syntax error:

%s
""" % (mailing.id, exception)
[/sourcecode]

That’s the buggy version. It fails when the exception contains a non-ASCII character. There’s no way you can predict when this will be the case – if you could predict bugs that result in exceptions they wouldn’t be nearly as much fun, now would they? Here’s the fixed version:

[sourcecode language="python"]
body = u"""
Your mailing template for mailing %s has a syntax error:

%s
""" % (mailing.id, exception)
[/sourcecode]

Did you spot the difference between code that works great and code that will make you hate yourself when it fails to show you a simple exception? It’s a single ‘u’ in front of the string quotes. You, the Python programmer, are supposed to remember to put that character in front of strings that could someday contain an object with non-ASCII data in it (straight-up unicode strings work by magic; see here for details). You’re supposed to figure it out and tell Python because for some reason upgrading the string at runtime would be the wrong thing to do. Please, shoot me now.

Actually, just wait, you can shoot me the next time I have to debug a random Unicode failure in our code-base. They pop up regularly when some object that nobody thought could have non-ASCII in it happens to get a non-ASCII character. Obviously better testing would help, but some things, like the contents of exceptions, are pretty hard to predict!

So that’s my list – got one of your own? There’s nothing like a good rant, go for it.

Emacs ups and downs

April 29, 2010

Every month or so I try to learn a new Emacs feature or extension – something beyond the usual buffer juggling and programming-language modes. Of course when you’re trying new things so frequently some of them are going to work better than others. Things I’ve tried that stuck:

Keyboard Macros – probably the first “advanced” Emacs feature I learned and I use it all the time. I don’t save and name my macros as often as I should though.
Tramp – tramp-mode lets me run a fast local Emacs and edit files remotely with no setup required. Just open /server.example.com: and go. Underneath it uses SSH to access the files, and you can set it to use alternate methods (scp, sftp, rsync, etc.). I do wish it were a little faster, though, or multi-threaded so it didn’t block Emacs when saving over a slow link.
Bookmarks – I bind M-b to bookmark-jump and I use it all the time. I have bookmarks for each project I’m working on and I use them with tramp-mode to get me onto the appropriate server.
Yasnippet – a recent addition, this module provides a template system for Emacs. It comes with some useful boilerplate templates for various programming languages and you can easily add more. I use the class and def ones for Python periodically, as well as ones I’ve added to set up a warn() call.
browse-kill-ring – super useful to be able to pull up the full kill-ring and search for what you need. I have my kill-ring set to hold 100,000 entries, so if I’ve killed it in the current session I can be pretty sure I can get it back!
auto-complete – mode-aware auto-completion. I’m still not sure this one is going to last, but it does help a lot sometimes, particularly when I’m coding deep in a Python file and I need to accurately type the name of an imported identifier from the top of the file. And I’m getting more used to hitting C-g when I need to keep what I’ve typed and not accept a completion. I think it’s most likely a keeper.

Of course, not every experiment is a success. Here are a few notable recent ones that I’ve since abandoned:

ido-mode – I wanted to like this one. Sometimes it’s a big time-saver, quickly navigating to files I’m trying to open in just a few keypresses. But just as often I’d find myself fighting with it, particularly when trying to create new files or navigate up a few levels. Ultimately I decided that the benefit of having a file path that’s editable the same way as normal text is just too much to give up.
registers – this still seems like something I should be using. Surely the ability to remember locations and little bits of text and then replay them should come in handy. Alas, not often enough to actually remember the keystrokes on the rare occasions when I think to use them.
rectangular selections – again, potentially very useful, but I don’t need it often enough to remember the bindings. It doesn’t help that the default bindings are so verbose; possibly I could learn to love this feature if I rebound it.
Tags – I’ve set up TAGS file generation for several projects now, and each time I use it for a while and then fall back to grep and ack. I think the way tag searches work just doesn’t match the way I want to search for things – I want to quickly browse through a list of hits, not jump from file to file. Still, being able to jump from the use of a function directly to its definition seems like it should be very useful!

I’m always curious about how other people use Emacs – what features do you use most and what have you tried that didn’t work out?

New release: onlinepayment v1.0.0

March 21, 2010

I’ve finally gotten around to doing my first open-source Python release, v1.0.0 of onlinepayment:

http://pypi.python.org/pypi/onlinepayment/1.0.0

This module provides a wrapper around two payment processors so you can write code that works with both. It’s based on a Perl module which does the same thing – Business::OnlinePayment. It goes further than Business::OnlinePayment – providing error handling and, thanks to my co-worker Aaron Ross, recurring billing too.
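The wrapper idea is a classic adapter: one interface, one exception type, and a backend class per processor that translates that processor’s quirks. The sketch below is not onlinepayment’s actual API; `Processor`, `PaymentError`, `checkout` and both fake backends are hypothetical, with the network calls stubbed out.

```python
from abc import ABC, abstractmethod


class PaymentError(Exception):
    """The one exception callers catch, whichever processor is underneath."""


class Processor(ABC):
    @abstractmethod
    def charge(self, amount_cents: int, card_number: str) -> str:
        """Charge the card; return a transaction id or raise PaymentError."""


class StatusDictProcessor(Processor):
    """Hypothetical backend whose raw API reports failure in a status dict."""

    def _raw_charge(self, amount_cents, card_number):
        # Stub for a network call: declines any card ending in "0000".
        if card_number.endswith("0000"):
            return {"ok": False, "reason": "declined"}
        return {"ok": True, "txn": "D%d" % amount_cents}

    def charge(self, amount_cents, card_number):
        result = self._raw_charge(amount_cents, card_number)
        if not result["ok"]:
            raise PaymentError(result["reason"])
        return result["txn"]


class RaisingProcessor(Processor):
    """Hypothetical backend whose raw API raises its own exception type."""

    class Declined(Exception):
        pass

    def _raw_charge(self, amount_cents, card_number):
        # Same stubbed rule, different failure style.
        if card_number.endswith("0000"):
            raise self.Declined("card declined")
        return "R%d" % amount_cents

    def charge(self, amount_cents, card_number):
        try:
            return self._raw_charge(amount_cents, card_number)
        except self.Declined as exc:
            # Translate the processor-specific error into the shared one.
            raise PaymentError(str(exc)) from exc


def checkout(processor: Processor, amount_cents: int, card_number: str) -> str:
    """Application code: written once, works with either backend."""
    return processor.charge(amount_cents, card_number)
```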

This is also the first open-source code to result from ActionKit. I hope we’ll find other opportunities like this in the future; we’ve got a lot of useful code in the project.

On Python Present and Perl Past

March 6, 2010

I’ve been working in Python almost exclusively for the last 8 or 9 months. It’s been a fun challenge learning a new language, and being able to do it along with the rest of the We Also Walk Dogs crew has made it even better. A dip back into Perl has given me a chance to reflect on my progress with Python.

The past couple weekends I’ve been helping a friend with a small project – a textual analysis problem finding similarities between disparate documents in a large database. I immediately reached for Perl because I’ve done projects like this in Perl before and I knew all the tools I’d need. I indexed phrases from the docs using Digest::MD5, storing the index in MySQL with DBD::mysql. Then I whipped up a quick web-app with CGI::Application and HTML::Template, with a bit of help from HTML::FillInForm, Blueprint CSS and Config::General. With the exception of Blueprint (which I find indispensable these days), this is pretty much the toolkit I learned (and helped build) at Vanguard Media working for Jesse Erlbaum over ten years ago! It all worked great and the app was up and running in just a day and a half.
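The phrase-indexing step can be sketched in Python too. This is an illustration under assumptions, not the Perl code described above: documents are split into overlapping 5-word phrases, each phrase is reduced to an MD5 digest, and a dict stands in for the MySQL index table. Documents that share many digests share many phrases.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Set


def phrase_hashes(text: str, n: int = 5) -> Set[str]:
    """MD5 hex digests of every run of n consecutive words in the text."""
    words = text.lower().split()
    return {
        hashlib.md5(" ".join(words[i:i + n]).encode("utf-8")).hexdigest()
        for i in range(len(words) - n + 1)
    }


def build_index(docs: Dict[str, str], n: int = 5) -> Dict[str, Set[str]]:
    """Map each phrase digest to the ids of the documents containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for digest in phrase_hashes(text, n):
            index[digest].add(doc_id)
    return index


def similar_docs(index: Dict[str, Set[str]], text: str, n: int = 5) -> Dict[str, int]:
    """Count how many phrases `text` shares with each indexed document."""
    counts: Dict[str, int] = defaultdict(int)
    for digest in phrase_hashes(text, n):
        for doc_id in index.get(digest, ()):
            counts[doc_id] += 1
    return dict(counts)
```

Storing fixed-length digests instead of the phrases themselves keeps index rows small and turns each lookup into an exact match on one column.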

This experience makes it obvious to me that I still have a long way to go with Python. Perl syntax is second nature to me – I almost never make an error and everything works the way I expect it to. I’m not looking up basic Python syntax anymore, but I am still making plenty of mistakes. More to the point, the tools I needed for this project are all still completely ingrained in my memory and behave exactly the way I expect them to. The contrast with the Python tools I use daily (Django, Celery and MySQLdb, for example) is really striking. My Python tools often surprise me, and I frequently find myself going back to the docs and, failing that, the source.

It’s also interesting to think about how little has changed in the past 10 years. I can pick up the same tools I used then and construct something that most people would recognize as a modern web app. Mix in just a little jQuery and it would probably pass for Web 2.0. On the other hand, I think I can say I’ve gotten better as I’ve aged – this project would have taken me quite a bit longer 10 years ago, if I could have completed it at all. I probably would have gotten stuck on some completely insane plan like loading all the documents into memory at once. I really didn’t know how to properly use a database back then!

I am looking forward to making my first open-source Python release soon. Who knows, maybe there’s a book about writing Python packages for PyPI in my future!

My Setup

February 13, 2010

I was reading through an interview series called The Setup recently. Nerds are asked to describe their gear, both hardware and software, and then what their dream version would be. Fun stuff. My highlights: RMS, Aaron Swartz (one of our clients for ActionKit!) and Máirín Duffy (one of the few Linux users profiled aside from RMS). It’s all very Apple-heavy, but still interesting to think about how different people set up their kit for fairly similar tasks. It made me want to put in my two cents, but I don’t think I’m famous enough to rate a slot on the site, so…

Who are you, and what do you do?

I’m Sam Tregar and I spend most of my time coding for We Also Walk Dogs, working with our many progressive political and non-profit clients. I work mostly in Python and Perl.

What hardware are you using?

My main work machine is a three-year-old ThinkPad T60p – a 15.4″ widescreen model with a very sharp 1680×1050 resolution, 2GB of RAM and a 2GHz Core 2 Duo CPU. I recently replaced the hard drive with a fast SSD, which was a huge upgrade, equivalent to getting a whole new machine for much less money. I keep the trackpad turned off (my typing style is so right-handed that my palm frequently lands on it) and use the TrackPoint exclusively. I’ve been using ThinkPads for a while now, and it really comes down to the keyboard – all the keys are in the right place, and all the keys are big enough for my sausage-esque fingers.

At the office I used to have a sweet little Shuttle KPC machine with a Celeron 450 and 2GB of RAM. Then, about 3 months past its one-year warranty, it stopped working. So now I’m using my wife’s discarded four-year-old Dell laptop – a 2.2GHz Core 2 Duo with 1GB of RAM. It used to have 2GB before one of the RAM slots mysteriously went bad. I have it hooked up to a 21″ widescreen LCD running at 1680×1050. My keyboard is a Keytronic Lifetime with the all-important classic IBM layout, just like the ThinkPad. I use a Kensington Expert “Mouse”, which is actually a trackball. It’s nice, but I really miss the trackballs Logitech used to make.

I’ve also got a custom-built gaming rig: an AMD Phenom II BE450 with 4GB of RAM and an Nvidia GeForce 216 GPU. Years ago I put out the cash for a fast WD Raptor 10k disk, but I frequently think about replacing it with an SSD. It’s in a fancy low-noise case, the Antec P180.

Rounding out the local network is a Shuttle KPC operating as a storage server and home for occasional personal-use web apps, like the app I use to tell me when to leave the house to make a train. It’s got a Celeron of some type, 2GB of RAM and 1TB of storage. I’m hoping it doesn’t suffer a similar fate to my late office model.

Oh, and I get my ass out of bed with a Chumby, the best Linux-powered alarm clock ever.

And what software?

I’ve been using Linux as a desktop OS since I was 15 years old – Slackware v1.1.2 on my 386DX40. I consider myself extremely lucky to have grown up with Linux and to be able to use it both on my machines and on virtually every server I work on (once in a blue moon I’ll work on a BSD or Sun box). My distro of choice is currently Fedora – I’m running Fedora 11 on most of my machines, but my office laptop has Fedora 12. The only machine I don’t run Linux on is the game machine, which is running Windows 7.

Far and away the most important app for me is GNU Emacs. I’ve been using Emacs for around 13 years now and I’m still learning new things. I have a 400+ line .emacs file with tons of custom functions, many of which I use daily. Every so often I play with a new editor, but I honestly can’t imagine working without Emacs.

The other two programs I use when I code are Chrome (I’m a recent convert from Firefox, so I still flip back now and again) and Gnome Terminal, invariably connected to one or more screen sessions. I read my email with Gmail, and I use Pidgin for IM and IRC.

What would be your dream setup?

I hate synchronizing things. I want to be able to sit down at any machine and have all my customized setup magically available. I’ve got some of this now between Xmarks for Chrome+Firefox and a private Subversion repository for my dot-files, but there’s a lot that’s not accounted for. I’ve standardized all my hardware on a common screen resolution just to make this easier (1680×1050) – same (gigantic) font sizes, identical Gnome layout, etc.

Another frequent wish is that I could have a full-size keyboard, trackball and screen available while I’m working in my living room. I actually achieved this during college with a hilariously dangerous monitor arm attached to a futon. Imagine sitting with a 20″ CRT directly above your lap, the monitor arm and futon groaning audibly under the weight. Someday I may recreate that beautiful setup with a lightweight LCD. Ideally the screen would just float in front of me, held up by fairies. And hey, since we’re headed in that direction, drop the keyboard and trackball – I’ll just control it by telepathy. This is going to be great.

I always want a faster connection – lower latency and more bandwidth. It’s a mark of newfound fiscal responsibility that I haven’t yet ordered Optimum Ultra, the ultra-fast, ultra-expensive data plan from my cable company. If/when they drop the $300 install fee I’ll probably do it.

Beyond All Reason

December 8, 2009

So, I’ve got a blog. The reason is pretty simple – I can’t find anyone who will pay for my tech writing. Every site and magazine that ever paid for my work is essentially out of business. Aside from the terrible economy in general I suspect that what’s really killed the market for my writing is people writing for free on their blogs. So, if you can’t beat ’em, join ’em.

This isn’t really my first blog though – I used to write semi-regularly in my journal at perl.org but that doesn’t seem worth continuing. I no longer program in Perl exclusively – my latest project is all Python and I spend at least 90% of my day-job hours on it. Much like my Perl journal I intend to keep this blog mostly technical – if you want to know about my cats you’ll have to find me on Facebook.

Next up, an exciting article about reverse-engineering precinct maps during the 2008 election!