The Upload System of My Dreams

Sometimes I have a great job, a job where I get to do exactly what I want in exactly the way I want to do it. And what is it, you might ask, that I want to do? I want to build the perfect data upload system. Why? A few reasons:

  • It’s not an easy problem. Data upload is complicated by the fact that the most common format we support (CSV) isn’t even close to standardized. Line endings, character sets and quoting all vary from file to file. Since we can’t enforce much uniformity on the data we accept, our system has to be very flexible. Add the requirement to support large volumes of data and there are plenty of challenges.
  • Doing it badly is acutely painful for our clients. And when our clients feel pain they naturally pass it along to us. I’d say at least 10% of our support requests have been related in some way to our upload system.
  • It’s a great project for test-driven development (TDD), which is my favorite way to work. Since data uploads are so deterministic, it’s very easy to work on the problem in a straightforward TDD manner.
  • It requires a high level of parallelism to run quickly. Parallel programming is a fun challenge in itself and the payoff is great when it works.
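
To make the CSV point concrete, here is a minimal sketch of tolerant parsing using only Python’s standard library. It is not the code from our upload system: the function name read_rows, the list of candidate encodings and the set of candidate delimiters are all assumptions made for illustration.

```python
import csv
import io


def read_rows(raw, encodings=("utf-8-sig", "cp1252")):
    """Decode uploaded bytes and parse them with a sniffed CSV dialect.

    The encodings tuple is a hypothetical default; a real system would
    pick candidates based on the files its clients actually send.
    """
    for encoding in encodings:
        try:
            text = raw.decode(encoding)
            break
        except UnicodeDecodeError:
            continue
    else:
        raise ValueError("none of the candidate encodings could decode the upload")

    # Guess the delimiter and quote character from the start of the file.
    # sniff() raises csv.Error when it cannot decide.
    dialect = csv.Sniffer().sniff(text[:4096], delimiters=",;\t|")

    # newline="" leaves \n, \r\n and \r untouched so the csv module
    # can split the lines itself.
    return list(csv.reader(io.StringIO(text, newline=""), dialect))
```

Calling read_rows() on the raw bytes of an upload returns a list of rows, each a list of strings. Sniffing is only a heuristic, so a production system would likely let the user override the guessed settings.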

I’m pretty happy with the way the project turned out. I made a screencast showing off some of the new front-end features:

ActionKit Upload Improvements on Vimeo.

It’s also my first attempt at screencasting. Eesh, my voice.

The frontend uses jQuery with jQote to get updates from the running upload job and update the status display. The progress bar is canvas-based and uses RGraph.

The backend code uses Celery to queue upload jobs from our Django front-end. The jobs themselves use a multiprocessing-based job pool system which we first developed for our mail sender, and which has since been abstracted out as a reusable component by my co-worker Randall Farmer (it could be worth releasing on PyPI at some point, since it has some unique features). Stopping an upload early works by sending a message with Carrot, the underlying AMQP client used by Celery to talk to RabbitMQ.

I tried a new approach this time with regard to the way errors and status are handled by the workers. Instead of trying to report info up to the parent, each worker writes status and errors directly to the database. The parent can query the database to get updates on the workers. This will hopefully help avoid some of the deadlocking problems inherent in systems that rely on bi-directional communication between parents and workers. It also made building the front-end easier, since the status reported by the workers was easy to turn into JSON and send up to the client for display.
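
The pattern can be sketched in a few lines of Python. This is an illustration under stated assumptions, not our actual code: SQLite stands in for the real database, threads stand in for the multiprocessing workers so the example stays self-contained, and the table layout, the function names and the "contains an @" validation rule are all made up.

```python
import json
import os
import sqlite3
import tempfile
import threading
import time

SCHEMA = """
CREATE TABLE worker_status (
    worker_id INTEGER PRIMARY KEY,
    rows_done INTEGER NOT NULL DEFAULT 0,
    errors    INTEGER NOT NULL DEFAULT 0,
    finished  INTEGER NOT NULL DEFAULT 0
)
"""


def worker(db_path, worker_id, rows):
    """Process rows, writing progress straight to the database."""
    conn = sqlite3.connect(db_path, timeout=30)
    with conn:  # each "with conn" block commits one small transaction
        conn.execute("INSERT INTO worker_status (worker_id) VALUES (?)", (worker_id,))
    for row in rows:
        # Hypothetical validation rule: a row is good if it contains "@".
        column = "rows_done" if "@" in row else "errors"
        with conn:
            conn.execute(
                f"UPDATE worker_status SET {column} = {column} + 1 WHERE worker_id = ?",
                (worker_id,),
            )
    with conn:
        conn.execute("UPDATE worker_status SET finished = 1 WHERE worker_id = ?", (worker_id,))
    conn.close()


def poll_status(db_path):
    """Parent side: summarize worker progress with a single query."""
    conn = sqlite3.connect(db_path, timeout=30)
    try:
        done, errors, workers, finished = conn.execute(
            "SELECT COALESCE(SUM(rows_done), 0), COALESCE(SUM(errors), 0),"
            " COUNT(*), COALESCE(SUM(finished), 0) FROM worker_status"
        ).fetchone()
    finally:
        conn.close()
    return {"rows_done": done, "errors": errors, "workers": workers, "finished": finished}


def run_upload(chunks):
    """Start one worker per chunk, poll until all are done, return JSON."""
    fd, db_path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        conn = sqlite3.connect(db_path)
        conn.execute(SCHEMA)
        conn.commit()
        conn.close()
        threads = [
            threading.Thread(target=worker, args=(db_path, i, chunk))
            for i, chunk in enumerate(chunks)
        ]
        for t in threads:
            t.start()
        # The parent never talks to the workers directly; it only reads
        # the table. A real front-end would send each snapshot to the client.
        while any(t.is_alive() for t in threads):
            poll_status(db_path)
            time.sleep(0.01)
        for t in threads:
            t.join()
        return json.dumps(poll_status(db_path))
    finally:
        os.remove(db_path)
```

Because the workers only ever write and the parent only ever reads, there is no channel between them that can fill up or block, which is the property the approach is after.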

All in all, a fun project. I hope it works as well in practice as it has during development.

What’s the best data upload system you’ve used? Written? Lived through?


Emacs ups and downs

Every month or so I try to learn a new Emacs feature or extension – something beyond the usual buffer juggling and programming-language modes. Of course when you’re trying new things so frequently some of them are going to work better than others. Things I’ve tried that stuck:

  • Keyboard Macros – probably the first “advanced” Emacs feature I learned and I use it all the time. I don’t save and name my macros as often as I should though.
  • Tramp – tramp-mode allows me to run a fast local Emacs and edit files remotely with no setup required. Just open /server.example.com: and go. Underneath it uses SSH to access the files, and you can set it to use alternate methods (scp, sftp, rsync, etc). I do wish it was a little faster though, or multi-threaded so it didn’t block Emacs when saving over a slow link.
  • Bookmarks – I bind M-b to bookmark-jump and I use it all the time. I have bookmarks for each project I’m working on and I use them with tramp-mode to get me onto the appropriate server.
  • Yasnippet – a recent add, this is a module which provides a template system for Emacs. It comes with some useful boilerplate templates for various programming languages and you can easily add more. I use the class and def ones for Python periodically as well as ones I’ve added to set up a warn() call.
  • browse-kill-ring – super useful to be able to pull up the full kill-ring and search for what you need. I have my kill-ring set to hold 100,000 entries, so if I’ve killed it in the current session I can be pretty sure I can get it back!
  • auto-complete – mode-aware auto-completion. I’m still not sure this one is going to last, but it does help a lot sometimes, particularly when I’m coding deep in a Python file and I need to accurately type the name of an imported identifier from the top of the file. And I’m getting more used to hitting C-g when I need to keep what I’ve typed and not accept a completion. I think it’s most likely a keeper.

Of course, not every experiment is a success. Here are a few notable recent ones that I’ve since abandoned:

  • ido-mode – I wanted to like this one. Sometimes it’s a big time-saver, quickly navigating to files I’m trying to open in just a few keypresses. But just as often I’d find myself fighting with it, particularly when trying to create new files or navigate up a few levels. Ultimately I decided that the benefit of having a file path that’s editable the same way as normal text is just too much to give up.
  • registers – this still seems like something I should be using. Surely the ability to remember locations and little bits of text and then replay them should come in handy. Alas, not often enough to actually remember the keystrokes on the rare occasions when I think to use them.
  • rectangular selections – again, potentially very useful but I don’t need it often enough to remember the bindings. It doesn’t help that the default bindings are so verbose. Possibly I could learn to love this feature if I rebound it.
  • Tags – I’ve set up TAGS file generation for several projects now, and each time I use it for a while and then fall back to grep and ack. I think the way tag searches work just doesn’t match the way I want to search for things – I want to quickly browse through a list of hits, not jump from file to file. Still, being able to jump from the use of a function directly to its definition seems like it should be very useful!

I’m always curious about how other people use Emacs – what features do you use most and what have you tried that didn’t work out?
