Python Makes Me Say God Damn

May 19, 2010

I’ve been coding in Python now for almost a year. Mostly it’s been great fun, but a few things continue to annoy me. Most of them are just a matter of taste, some related to my background as a Perl coder no doubt. In any case, I’m hoping writing about them will help me be mindful and might bring up some useful workarounds in response. And ranting is really its own reward.

NOTE: before you tell me I’m an idiot and Python is awesome, allow me to remind you that I like coding in Python and I’m quite sure I could produce a list like this for any language I’ve worked in. So, yes, Python is awesome and Python sucks. I’m sorry if I just blew your mind.

In no particular order:

The range() function is non-inclusive of the second term. If I say to you, give me the numbers from 1 to 5 are you going to say “1, 2, 3, 4”? Of course not. Even worse, Perl has trained me to expect inclusive ranges with the .. operator. So I constantly stub my toe on Python’s range(). It can lead to nasty bugs – sometimes counting one less than expected is obvious, and sometimes it isn’t!
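One workaround is a small inclusive wrapper. What follows is only a minimal sketch, written for Python 3 (where range() returns a lazy object rather than a list); the name inclusive_range is made up for illustration and is not a built-in:

```python
def inclusive_range(start, stop, step=1):
    """Like range(), but stop is included when the step lands on it."""
    if step == 0:
        raise ValueError("step must not be zero")
    # Push the exclusive bound one unit past stop, in the direction of travel.
    bound = stop + 1 if step > 0 else stop - 1
    return range(start, bound, step)


print(list(range(1, 5)))                # the built-in stops short of 5
print(list(inclusive_range(1, 5)))      # the wrapper includes 5
print(list(inclusive_range(5, 1, -1)))  # counting down works too
```

It does nothing about the range() assumptions baked into slicing and the rest of the standard library, so it helps mostly with explicit counting loops.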

I never know where to look for a method. Let’s say I’m having a hard time figuring out how to use some.awesome.Module’s foobar() method. I’ve checked the doc-string and it’s not helping, I need to use the source. At this point I’ve pretty much stopped trying to guess where it could be – I go straight to ack and start searching the tree under some/. It could be in some.py, some/__init__.py, some/awesome.py, some/awesome/__init__.py, etc. And if it’s a method with a common name – save(), for example – it can be very hard to figure out which save() is actually the one I’m looking for.
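For code written in pure Python, the standard inspect module can sometimes do the searching. This is a rough sketch (Python 3 syntax) that uses json.dumps from the standard library as a stand-in for the hypothetical foobar() above; the helper name where_is is my own, and the calls raise an error for functions implemented in C:

```python
import inspect
import json


def where_is(obj):
    """Return (source file, first line number) for a Python-level function or class."""
    path = inspect.getsourcefile(obj)
    _, first_line = inspect.getsourcelines(obj)
    return path, first_line


path, line = where_is(json.dumps)
print("%s:%d" % (path, line))
```

It reports where the object was defined, which is not necessarily the module it was imported from.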

Python is happy to do nothing. Consider this code:

[sourcecode language="python"]
foo = range(1,100)
while len(foo) > 10:
    foo.pop
[/sourcecode]

It’s supposed to reduce foo until it’s got 10 elements (there are easier ways to do this, of course – not the point). Instead it loops forever, because foo.pop is just a reference to the pop method. To call it you must include parens:

[sourcecode language="python"]
foo = range(1,100)
while len(foo) > 10:
    foo.pop()
[/sourcecode]

It makes a lot of sense to me that this is the way it is. But couldn’t Python emit a warning when I do this? The code does absolutely nothing useful – the method reference is returned and immediately discarded. Perl does handle this case, emitting a warning about the use of a scalar in void context. I never thought I’d miss that warning, but now I do!
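A check like this can at least be approximated outside the interpreter. Here is a minimal sketch, assuming Python 3 and the standard ast module, that flags statements consisting of nothing but a name or attribute lookup; the function name find_bare_references is invented for this example, and the check would wrongly flag properties that are accessed for their side effects:

```python
import ast


def find_bare_references(source):
    """Return line numbers of statements that are only a name or attribute lookup."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # An expression statement whose value is a plain lookup does nothing
        # except perform the lookup and throw the result away.
        if isinstance(node, ast.Expr) and isinstance(node.value, (ast.Name, ast.Attribute)):
            hits.append(node.lineno)
    return sorted(hits)


sample = """\
foo = list(range(1, 100))
while len(foo) > 10:
    foo.pop
"""
print(find_bare_references(sample))
```

Calls such as foo.pop() are ast.Call nodes, so the corrected loop passes without complaint.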

Strings are sequence types. At first glance this probably seems pretty harmless – you can iterate over the characters in a string. Probably useful, right? Well, not for me! I honestly can’t remember the last time I intentionally iterated over a string character by character in a language other than C. If I want to search a string I’ll use a regular expression or a call to index()/find() – faster and easier. So why does it irritate me? It leads to bugs, because it’s all too easy to accidentally put a string where a list should go and Python will then happily iterate over the string. For example, check out this bug (simplified a bit from the actual usage):

[sourcecode language="python"]
for name, value in request.GET.items():
    params[name] = value[0]
[/sourcecode]

The bug here is that value is a string, not an array of values. This bug resulted in each GET param getting truncated to a single character. And it completely escaped my attention because I tested it with parameters that were small numbers – 0, 1 and 5. So it got into live code that then failed when “2010-01-01” became “2”. So lame. I’d much rather Python said something useful like “You can’t index into a string, dummy!”
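Short of changing the language, the complaint can be produced by hand. A minimal sketch in Python 3 syntax; the helper first_value is hypothetical, not part of any framework:

```python
def first_value(value):
    """Return the first element of a list or tuple, refusing strings."""
    if isinstance(value, (str, bytes)):
        raise TypeError("expected a list of values, got a string: %r" % (value,))
    return value[0]


print(first_value(["2010-01-01", "2010-02-01"]))

try:
    first_value("2010-01-01")
except TypeError as err:
    print("caught:", err)
```

This turns the silent truncation into a loud failure, at the cost of remembering to call the helper everywhere a list is expected.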

Unicode support is a mess. This one really surprised me. I always thought Unicode in Perl was exceptionally bad, likely because it was always a second-class citizen. I thought for some reason that Python would have a better, saner system. Well, sadly no. In fact, it has pretty much all the same problems that Perl has. Consider this very common code:

[sourcecode language="text"]
body="""
Your mailing template for mailing %s has a syntax error:

%s
""" % ( mailing.id, exception)
[/sourcecode]

That’s the buggy version. It fails when the exception contains a non-ASCII character. There’s no way you can predict when this will be the case – if you could predict bugs that result in exceptions they wouldn’t be nearly as much fun, now would they? Here’s the fixed version:

[sourcecode language="text"]
body=u"""
Your mailing template for mailing %s has a syntax error:

%s
""" % ( mailing.id, exception)
[/sourcecode]

Did you spot the difference between code that works great and code that will make you hate yourself when it fails to show you a simple exception? It’s a single ‘u’ in front of the string quotes. You, the Python programmer, are supposed to remember to put that character in front of strings that could someday contain an object with non-ASCII data in it (straight-up unicode strings work by magic, see here for details). You’re supposed to figure it out and tell Python because for some reason upgrading the string at runtime would be the wrong thing to do. Please, shoot me now.

Actually, just wait, you can shoot me the next time I have to debug a random Unicode failure in our code-base. They pop up regularly when some object that nobody thought could have non-ASCII in it happens to get a non-ASCII character. Obviously better testing would help, but some things, like the contents of exceptions, are pretty hard to predict!
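For comparison, here is a minimal sketch of the same formatting under Python 3, where plain string literals are Unicode text. It assumes Python 3 semantics throughout; the mailing id 42 and the exception message are made up for illustration:

```python
# Python 3 only: str is Unicode text, bytes is raw data.
error = ValueError("unexpected character: \u00f3")
body = "Your mailing template for mailing %s has a syntax error:\n\n%s\n" % (42, error)
print(ascii(body))  # ascii() keeps the output printable on any terminal

raw = b"\xc3\xb3"  # the same character as UTF-8 encoded bytes
try:
    "prefix " + raw  # mixing text and bytes is refused outright
except TypeError as err:
    print("refused:", err)

print(ascii("prefix " + raw.decode("utf-8")))  # decoding has to be explicit
```

The failure is still there when text and bytes meet, but it shows up on every run rather than only when a non-ASCII character happens to appear.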

So that’s my list – got one of your own? There’s nothing like a good rant, go for it.


42 Comments

James Simmons says:

May 19, 2010 at 6:08 pm

Hi Sam,

I’m with you on the Unicode support. Crawling the Web exposes you to a phenomenal number of encoding issues so this has been my #1 issue with the language. Still, Python is my favorite language to code in.
Daminkz says:

May 19, 2010 at 6:30 pm

I’m sorry, but all the ‘bugs’ you are complaining about, are not in fact bugs. They are features, and easy to work with, in my opinion.
Sam Tregar says:

May 19, 2010 at 6:36 pm

@Daminkz Where did I say they were bugs?
James Simmons says:

May 19, 2010 at 6:46 pm

@Daminkz

In the link to Stack Overflow that he provided there was a quote from the Python 3.0 change log that states:

Everything you thought you knew about binary data and Unicode has changed.
[…]
* As a consequence of this change in philosophy, pretty much all code that uses Unicode, encodings or binary data most likely has to change. The change is for the better, as in the 2.x world there were numerous bugs having to do with mixing encoded and unencoded text.
anon says:

May 19, 2010 at 7:10 pm

Moron writes buggy program then complains when it does not work. News at 11.
Daniel says:

May 19, 2010 at 8:07 pm

Yes, the non-inclusive ranges() annoyed me very much… not intuitive at all, at least for me.
John says:

May 19, 2010 at 8:14 pm

range is elegant because range(10) iterates ten times. Being inclusive on both ends would feel intuitively wrong.
Michael says:

May 19, 2010 at 9:31 pm

If these are the worst criticisms of Python, then we ought to give it some kind of award. Especially given that most experienced Py programmers would regard at least half of them as non-problems.
Y.H.Wong says:

May 19, 2010 at 11:16 pm

1.
The python range() function is intended to be used like this:

for i in range(5):
    print i

0
1
2
3
4

If you do a help(range), it’ll tell u if there’s only 1 param, it will default to start at 0 and ends at j-1. The reason for this is the same as in C for(i = 0; i < 10; i++), the 10 tells you how many times to loop, and the index i starts at 0 because everything in computer science starts at 0. I don't see the merit in this complaint at all.

2. You've been programming in Python for a whole year and never heard of help() and dir() in the interpreter?

3. I don't know of any language other than Perl, Ruby and SML where the () is optional in a function call. I'd argue that () makes a function call explicit, which is nice because a function call is not a property. If you want to treat a function as a property, you can use the property() function.

4. I have no idea what you are talking about here. It probably has more to do with the library you are using than with Python's design:

Say you have this:

d = {'a': 'abc', 'b':'def'}
for k, v in d.items():
    print k, v

a abc
b def

I mean, when you are iterating a list of key-value pairs in a dict, and when the value type is a string, what do you expect to get back out when you do value[0]? This is really your problem.

5. This has been a known problem for a long time that is fixed in Py3k. But then again, compared to other common dynamic languages like Perl, PHP and Ruby, at least Python HAS a built-in Unicode type. I'd still use Python as opposed to anything else.

I seriously don't see any merit in any of your complaints.
Sam Tregar says:

May 19, 2010 at 11:37 pm

@Y.H.Wong
1. Whatever. I see tons of use of the two-argument form of range(), so it’s not just me!

2. Neither tells you where to find the source for a method. At best help() will give you a place to start, but often the function will have been imported from elsewhere. Maybe you need to learn more about them?

3. You’ve got me – I’m a native Perl programmer.

4. Yes, this whole post is about MY problems! I imagine I’m not completely unique, but who knows?

5. Python’s unicode support in v2.6.1 is in no way better than Perl’s, in my experience. Perl also has a built-in unicode type, and it sucks just as much to use as Python’s. I hope things really are better in v3, but it will be a while before I find out.
Brett says:

May 20, 2010 at 1:43 am

2 – I can agree on this, it can sometimes be quite difficult to find the source. Impossible if it’s compiled. Although, you *could* find what you’re looking for if you can find an __file__ var somewhere in the module’s hierarchy?
Y.H.Wong says:

May 20, 2010 at 1:56 am

Haha. It’s really interesting to see how different people think coming from different background. Maybe this list of tools will help you:

http://rgruet.free.fr/PQR26/PQR2.6.html
http://rope.sourceforge.net/ropemacs.html
http://virtualenv.openplans.org/

Using virtualenv with ropemacs together is really nice because you can configure rope to anchor the .ropeproject file at the top level and rope will index all the libraries u’ve installed under the virtualenv. Once that’s done, u can do C-c g to jump to the source definition from your reference.

Try them out and see if they make a difference!
Nathan Reynolds says:

May 20, 2010 at 2:21 am

>>> from decimal import Decimal
>>> Decimal('1') > 2
False
>>> Decimal('1') > 2.0
True

Yeah so that’s irritating.
Pieter Witvoet says:

May 20, 2010 at 2:34 am

For #2, foobar.__code__.co_filename and foobar.__code__.co_firstlineno should be helpful.
Vil says:

May 20, 2010 at 2:54 am

Re: #2, if you want to find out where the source for module A comes from, print out A.__file__. Much easier to find out in Python than any other language I’ve used so far!

Personally I think strings being sequence types is one of the best things about them. Iterating over them isn’t that useful (though it does occasionally come in handy); it’s the other things it gives you:
– s[x] gives you the character at position x (well, actually the 1 character substring at x, but you know what I mean).
– s[x:y] gives you a substring.
– s[-x:] gives you the last x characters of the string.
– ‘substr’ in s is a boolean expression which is true if s contains the substr.
– the len() function works for them.
…and so on.
andrew cooke says:

May 20, 2010 at 3:02 am

The range() issue is tricky. I agree the two-arg use is unintuitive, but it’s consistent with the one-arg use. Also, using non-inclusive upper bounds consistently does tend to simplify code, even though it feels odd at first.

Having a warning for the “do nothing” strikes me as a good idea. However, I imagine there may be common cases where good code also gives a warning. One example, off the top of my head, is when you do method chaining (I think this is sometimes called “fluent interface”). I think you’d get the warning then, which would be annoying.

I’ve been bitten by “strings as sequences”, however I do a lot of work with parsers and it does have some advantages in that case. For example, although Lepl (my Python parser library) is normally used with strings, it can also be used with lists of values and binary data. This is possible because they all share the same interface. So again it’s the kind of thing where there are both good and bad points.

As others have said, Unicode is better in 3. There’s no argument that it’s a mess in 2.6.

So I guess I’m saying that I understand all your points, but that (apart from Unicode, which thankfully is pretty much fixed in 3) there are reasons that *may* justify them. Unfortunately language design is a compromise and there’s no perfect solution for everyone. The strings / lists issue, for example, certainly makes life more confusing when you’re starting with the language but can be useful in parser libraries…
VLDR says:

May 20, 2010 at 3:17 am

Unicode and character encodings can be a pain in general; you can’t blame Python for that. Life would be a lot easier if everyone would write 7-bit ASCII, but that’s not the case (anymore).

Because of backwards compatibility, strings can’t be unicode by default (except for Python 3 which doesn’t have to be backwards compatible). It’s something you have to live with. As far as I can tell it has been fixed in the best possible way right now.
Rob says:

May 20, 2010 at 3:31 am

Half-open intervals (i.e. ranges that include the bottom element but exclude the top) are pervasive in computing largely for the reason that they can be composed, while fully-closed and fully-open intervals cannot. Such an approach to iterators is explicitly taken by the C++ STL and Java, and has a culture and history going back farther than C for loops.

Arguing that you find fully-closed intervals more intuitive is not unreasonable, but it’s on par with complaints that indexing should be one-based instead of zero-based on the basis that “array[1] should be the first element of the array. First = 1.” That view wouldn’t make you wrong, but it would make you look a little silly.
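
To make the composition point concrete, a small sketch (Python 3; the helper name split_half_open is made up for illustration):

```python
def split_half_open(start, stop, mid):
    """Split the half-open range [start, stop) at mid into two adjacent pieces."""
    return range(start, mid), range(mid, stop)


left, right = split_half_open(0, 10, 4)

# The pieces share the boundary value 4 as a bound, yet no element is
# duplicated or dropped, and the lengths simply add up.
print(list(left), list(right))
print(list(left) + list(right) == list(range(0, 10)))
print(len(left) + len(right) == 10 - 0)
```

With closed intervals the same split needs a +1 or -1 on one side of the boundary.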
andrew cooke says:

May 20, 2010 at 3:35 am

Sorry, missed one. Finding things in source. I think that’s a good point, but if you use eclipse with pydev, put the cursor on what you want to read the source for, and press F3, it does *try* to work this out for you. It’s not perfect, but it can save a lot of time. Cheers.
Michael Peters says:

May 20, 2010 at 8:54 am

1) I agree, exclusive ranges are definitely problematic. If you want to iterate a certain number of times, then that’s different from a range. And the fact that the range is only exclusive of the end number (which is the only number you specify in the single arg form) is definitely strange.

2) I agree, documentation is great, but in lots of circumstances it’s not enough. I don’t see why it should be hard to find the source file. But most languages support inheritance or traits/mixins/roles and other methods of class composition, so it can get tricky. Would be nice if languages added a “source_file” method or some such that would let you find where that method is. But I guess that really fits better in languages with a meta-object protocol.

3) I get bit by this in Javascript too. But a warning for a common mistake isn’t too much to ask.

4) This is one of the things I kind of wish Perl had. And even if it did it wouldn’t necessarily have the same problems as Python because strings and arrays have different sigils.

5) Yeah, booh to half-baked unicode implementations. At least in Perl though it’s just a warning when you try to use unicode strings without decoding/encoding them. So my program doesn’t die when some unexpected string is utf8, it just produces funny looking characters.
Sam Tregar says:

May 20, 2010 at 9:43 am

@Nathan Reynolds Wow – yeah, I haven’t hit that one yet! Crazy!
novalis says:

May 20, 2010 at 9:58 am

Dijkstra explains why range works the way it does.

For what it’s worth, I frequently iterate through strings — but mostly only on toy problems. The rest, I agree with.
mardiros says:

May 20, 2010 at 11:58 am

Your range problem:
irange = lambda x,y,z=1: range(x,y+1,z)
http://zeta-puppis.com/2008/03/06/inclusive-range-in-python/
For myself, I did not often use range with more than one parameter.

But for myself,
Python is readable, Perl is hard to read…

Perl crazy things *,$, %, #, &, \&, @, @_, :: ( and ; at the end of each line… )

while len(foo) > 10:
    foo.pop()

If you want to redefine foo.pop in Python, it is not semantically complicated;
in Perl, most people would have to open a book or ask Google for that…

Something like “*foo::pop” or something unreadable like that, isn’t it?
Perrin Harkins says:

May 20, 2010 at 12:03 pm

@Vil
The trouble with strings being sequence types comes when you’re using a new API where you’re not sure which one you’re going to be passed. You can easily write code that expects an array, gets passed a string, and still runs without any complaints, but of course produces totally incorrect results.
John Farrell says:

May 20, 2010 at 12:13 pm

@anon

+1
Michael Schlenker says:

May 20, 2010 at 3:24 pm

@Y.H.Wong

at your 1: Maybe use the help() function yourself:
>>> range(1,5)
[1, 2, 3, 4]

range(…)
range([start,] stop[, step]) -> list of integers

All the surprising parts he mentioned warrant a ! in the help, which already shows even the designers thought its surprising.

at your 2: The right tip isn’t help() or dir()…, its more like:
import inspect
print inspect.getfile(obj)

at your 3: Well, Tcl has no () at all for function calls (okay, nearly no contexts…)… and the complaint was that Python did not warn about the useless code that does effectively nothing (ah well, someone could have overridden the attribute via property() and could do it for side effects, but if such a design does not warrant a warning not much does…).

at your 4: Well, indexing into a string is useful, but its about on par with pointer arithmetic in C. You can do it, but there are often better and more explicit ways to say what you want to do.

Sometimes python looks like line noise compared to other languages, for example:
Tcl: string reverse "abc"
Python: "abc"[::-1]

at your 5.: you should compare with things like Java, C# or as dynamic languages maybe Tcl. Far nicer and saner Unicode support than the crappy ways of Python 2.x. Python 3.x does a lot better by simply copying the way Java (and Tcl) does it in most parts.
Michael Schlenker says:

May 20, 2010 at 3:43 pm

Even your u""" """ string example will blow up if your exception contains characters that are not in the ASCII range and you’re not careful:
>>> ex = '\xf3'
>>> u""" %s """ % ex
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf3 in position 0: ordinal not in range(128)

One thing i always found to work is repr(ex) or repr(some_string), in that case you get the values encoded as ascii + escapes. (or you can be explicit specify an error mode for unicode() or some of the codecs functions, but its pretty crappy).

Other annoyances of Python:
Cross-platform support is pretty weak if you try to code for Windows and Unix; if you do, just be prepared to litter your code with silly if sys.platform == "win32" junk.

Threading: Totally useless.
Networking: Without Twisted its a bad joke, with Twisted its a twisted joke, compared to what other languages offer.
Module system: Well, no versioning. No working sandboxing of code. No real module unloading and a weird way of shutting down the module system.
Class system: Well, a bit low powered, mostly a glorified dict. Makes metaprogramming, DSLs and aspect oriented programming much harder than for example XOTcl.
tef says:

May 20, 2010 at 6:04 pm

1. ranges

this is because lists iterate from 0 to length-1. range captures this common case, although really now people should be using enumerate over range.

I can see how it would be confusing, but it is around the common case of list iteration. So, erm, I’m not with you on this one.

2. methods

This is endemic in inheritance based class models. Traits or roles would be a substantial improvement.

3. nothing

A warning would be nice, but in reality, it’s doing getattr(foo, 'pop'). You could easily add some warnings for when you’re doing nothing, but you couldn’t catch them all.

4.string

Worse still, they’re iterable and return lists. It would be like iterating through [1,2,3] and returning [1] [2] and [3] as the items.

They are iterable, but they don’t have real list semantics. This is hard to fix in Python because, when it can’t find an iterator, the for loop looks for foo[x] access and a length.

(Although really strings shouldn’t have array style access as they can’t guarantee the same running time with variable width encoding)

5. unicode

Python 3 has unicode by default. Changing the default in python 2 would break things.

The really big annoyances for me are strings are quasi-lists, share-everything base concurrency, and little things like __main__ and for/else.
Ali says:

May 20, 2010 at 9:11 pm

1. The exclusive upper bound on range isn’t so bad, I think, if you consider that we count from zero and the following cases:

range(10) == range(0, 10)
len(range(2, 5)) == 5 - 2

It corresponds neatly to slice notation:

range(10) == range(10)[2:5]

2. I find that the IPython shell is really handy for looking up help: “foo?” will give you the docstring and associated help (like help(‘foo’)) and “foo??” will display the source for “foo”. IPython has many other useful features:

http://ipython.scipy.org/doc/manual/html/interactive/tutorial.html

3. I think iterable strings are nice and allow one to write string algorithms in python, which is handy for all sorts of cases. If you don’t deal with such cases, I can see why it might be annoying — I guess I was already used to thinking of strings as iterable in a sense before I came to Python.

4. foo.pop actually returns a “reference” to the pop method in foo. This is handy if you ever want to pass functions around, e.g. in a callback or building a lookup table of functions to avoid switch statements. It allows a good deal of dynamic programming.

5. Yeah, the unicode stuff is quite annoying. Hopefully this is better in Python 3. I haven’t really looked into it yet.
Ali says:

May 20, 2010 at 9:12 pm

Ugh. The slice example should have been:

range(2, 5) == range(10)[2:5]
Y.H.Wong says:

May 20, 2010 at 11:11 pm

@Michael Schlenker

1. I still don’t agree that range is a surprise. First of all, you almost never have to give a start index, because most of that need is alleviated by mutable sequence slicing. Second of all, the fact that the range contains exactly end - start numbers sounds pretty intuitive to me. Say you are tokenizing a string and for some reason you are not using slicing but an inclusive range(): you will then have to add or subtract 1 from the start index, starting from the 2nd substring all the way to the Nth. This is really annoying. To “rectify” range(), all the range assumptions in all the methods in all the modules would have to change to use an inclusive range, which, for the above reason, is a pretty bad idea.

2. Judging from the fact that he’s using Python 2.6.1, I suspect Sam is on a Snow Leopard machine, and on my Snow Leopard machine, when I do a help(), I can see the file path prominently displayed on top of the help page. Do people on other platforms not have that? There are many, many ways to find the source file of a specific module. Hell, you can even do a find / -type f -name "module.py". I don’t understand how hard this can be. If this were a valid complaint, I could have complained the same about Perl and Ruby and PHP and TCL and blah blah too. Alright, I’ll bite: some older Macs don’t come with Python’s documentation on the system; in that case you can download the official Python HTML docs and install them in “/System/Library/Frameworks/Python.framework/Resources/English.lproj/Documentation” and export an envvar PYTHONDOCS to point to it.

3. The reason there’s no warning is that functions and methods are objects too. So you can say that sometimes doing foo.pop is also a legitimate property access, in the case where you want to get the function object back and do some manipulation. So if an operation is legitimate most of the time, and the sometimes illegitimate use of an operation cannot be distinguished from its legitimate counterpart, you can’t really output a warning, can you? This complaint is still without merit IMHO, because if you have ever played in the interpreter with data structures and their methods, there’s no way this will be surprising to you.

4. As partly explained in 1), Python’s string is a sequence because it supports slicing. Slicing is good because it’s more succinct than, say, ‘hello’.substring(2,4). Yes, it may take a little getting used to, but it’s relatively minor once you understand how it does the arithmetic inside. Another reason Python’s string is iterable is that Python doesn’t have anything to represent bytes and sequences of bytes besides a string. This is of course a conflation of concepts, which luckily was fixed in Py3k.

5. Yes, I could compare Python’s Unicode support with Java and C#, but I didn’t, because Python will never be as good as the above due to the lack of icu4c on most systems. And since they are not dynamic languages, it wouldn’t be a fair comparison: when you are choosing a language for your project, if you decide you want to use a language with dynamic typing or type inference, that whole class of C-like languages will have to be tossed out anyway.

My best guess is that all these complaints arise because Sam started off programming in Python on a shaky foundation. Judging from the complaints, he obviously hasn’t spent much time prototyping in the interpreter, which IMHO is absolutely the best REPL I have ever used and is a clear advantage of Python over other languages. It’s a shame that Sam hasn’t played around in it much. Judging from complaint 4), I’d guess Sam was probably trying to do things the Perl way in Python. He obviously has no idea what types dict.items() returns or how Python’s for… in… works, and an iterable string is just the final piece of the puzzle that he has no idea how to solve. My suggestion? Play around in the REPL first. Walk before you try to run.

I agree with you tho that there are things in Python that annoy me. The super() function comes to mind immediately, but none of the things on my list coincide with Sam’s.
Sam Tregar says:

May 20, 2010 at 11:22 pm

@Y.H.Wong
Two things: 1) I spend a ton of time in the REPL trying stuff out. It’s great! Yes, I do things the Perl way sometimes. If you switched to Perl after using Python for 10 years you’d do stuff the Python way sometimes. It’s unavoidable. 2) I fully understand how dict.items() works – that’s not the point. For example, consider this data structure:

data = { 'foo': ['1', '2', '3'], 'bar': ['400', '500', '600'] }

If you write a loop that works through the values you’re obviously going to write code like I did. And if you turn out to be wrong and the data structure is really:

data = { 'foo': '1', 'bar': '400' }

Then you’re going to make the mistake I did, but Python won’t notice. It’s happy to treat a string just like a list. Which is a pain. And that’s all I’m trying to say.

Oh, and please, I’m not using Snow Leopard. This is Linux, thank you very much. I suppose we’ll upgrade to a more recent v2 sometime, but we have a lot of custom RPMs that would need to be recompiled, so it’s not as easy for us as it might be for other people.
Y.H.Wong says:

May 21, 2010 at 12:39 am

@Sam Tregar
Again, it’s a problem with the library, I would argue. This confusion results from not really knowing the type inside the dict’s value set. In your example, I can see why you got confused, because GET returns a list of values when there really are multiple values in the parameter, and a string when there’s only 1. I would too, but then if you were the library’s designer, and most of the values were singular except when you submit a list of values from some checkboxes, you’d decide to return a simple string for a singular value too. A function that returns multiple types makes the data flow non-deterministic and is often a source of confusion; unfortunately that is the nature of dynamically-typed languages. I’m sure you are aware of this. So your complaint isn’t really specific to Python but applies to a whole class of languages. If you want a more predictable type system without the heavy typing in Java, try something like Scala, F# or Google’s Go.
Sam Tregar says:

May 21, 2010 at 1:14 am

@Y.H.Wong My complaint isn’t specific to Python? That’s crazy talk. This situation couldn’t happen in Perl, which is also a loosely typed language. As soon as you tried to index into a string in Perl you’d get an error, not the first character of the string! It also couldn’t happen in TCL, unless I’m remembering wrong. As far as I know Python is the only loosely typed scripting language I’ve used that treats strings and lists the same!

Your suggestion that I switch languages to avoid this minor irritation is also crazy. Are there no problems in Scala, F# or Go? Of course not, they’d just be different problems! In particular there’s no way Go is mature enough to support the applications I work on. I don’t know much about Scala or F#, but somehow I doubt I’ll be able to convince the group I work with to switch!
Y.H.Wong says:

May 21, 2010 at 2:40 am

@Sam Tregar

I apologize for any strongly worded arguments but I stand by them every word. There’s no need to get testy. I’m sure we can argue our point like adults can’t we?

I can see why coming from a Perl background Python can be surprising, but then you have been programming in it for a year now. It shouldn’t be THAT surprising. As to indexing into string, I’m pretty sure you can do that with Javascript and Ruby, tho in Ruby you get back the ASCII ordinal, which is mind-boggling IMHO. In C#, you can index into anything with indexers. If you can’t understand indexible string, your head is bound to explode every time you try a C-ish language besides Java. It’s really quite common to be able to index into strings, and immensely useful in Python with slicing, so you better get used to it.

Also, indexable strings have nothing to do with how a language is typed; I’m just saying you might have confused the problem of a function returning different types for different input with the problem of strings being indexable. In your specific example, where you are indexing into a string because a dict may be returning different types of sequences, it’s really just a rare corner case where the library designer chose to return multiple types that are both sequences. It’s generally frowned upon to return different types because of the confusion it may cause, as illustrated by your problem. A string is a sequence of characters/bytes, and a list is a sequence of things. Conceptually, in an OOP sense, they are really similar, and treating them as the same when you are slicing/indexing into them is quite sensible. I suspect the BDFL would answer you similarly. It’s just because of a bad choice made by someone else in a library that you use, and an unfamiliarity with Python’s types, that this confusion has arisen.

I’m not suggesting that you switch to another language, but I do suggest that you try other languages with different paradigms, and try doing things their way. It’s really a mind-opening experience.
Ali says:

May 21, 2010 at 5:25 am

Sam,

Specific to the GET dictionary example you cited, I think Y.H.Wong is right: your library is at fault. Since you’re using GET, you’re extracting values from the query string. The HTTP spec expects name/value pairs in the query string. So, I don’t know why you would expect to get a list. I guess it’s possible someone could send a name twice (/foo?name1=value1&name1=value2), but this is hardly common and I don’t think it corresponds to any HTML form elements, unless they’re repeated (and then the form designer is in error).
Sam Tregar says:

May 21, 2010 at 9:01 am

@Ali You “guess it’s possible” that someone could use the same name twice? It’s not just possible, it’s common! That’s how select elements with multiple set work!
Sam Tregar says:

May 21, 2010 at 9:06 am

@Y.H.Wong Trust me, I’ve tried plenty of other languages, and I’ll keep trying more. I’m not sure how that’s supposed to keep me from stubbing my toe in Python, but I guess it’s possible!
Michael Schlenker says:

May 21, 2010 at 11:10 am

@Y.H.Wong

Complaining that Python cannot have good Unicode support because there is no ICU on the system is pretty lame, isn’t it? Tcl’s Unicode support isn’t perfect either, but if you compare it with Python’s or Perl’s, it’s far nicer AND in a dynamic language.

(Which has some other benefits like a sane threading model with no GIL too.)
Ali says:

May 21, 2010 at 2:41 pm

Sam, you’re right about multiple elements, of course. My mistake.

I would only note that the Python dictionary allows for heterogeneous types as values. The burden is always on the programmer to know/test for the right type when using a dictionary. For example, in a more general case, a dictionary may return an object other than a string or list. In your specific example, your expectations didn’t meet the library designer’s choice. I’m not sure that this case by itself is an argument for having non-iterable strings.
Y.H.Wong says:

May 22, 2010 at 4:17 am

@Michael Schlenker

Yes you are right, I should have reversed the order of my arguments on Unicode in comment #31. Python 2.x and 3.x are trying its best to deal with Unicode, but out of jealousy, I must mention that Python’s documented (too lazy to google for it, but pretty sure it’s buried somewhere in the i18n-SIG mailing list) unlikelihood of requiring ICU as a compile dependency means there’s very little chance Python will come with good collation support by default now and in the future. You just can’t beat Java’s Unicode support. But then again, it’s a moot point because no sane person will reject Python just because it doesn’t come with a module to do collation by default.

From my limited understanding of Tcl, it doesn’t seem to have the historical burdens that Python and Perl have, and due to the fact that everything in Tcl can be treated as a string, its designer probably took extra care to make sure string support is good. With that said, I wouldn’t know under what circumstance one would choose Tcl over anything.

Due to similar historical reasons, both Perl and Python 2.x’s have come up with similar ways to handle Unicode. But, I would still choose Python over Perl for the following reasons(correct me if there are better ways to do them, I never know what’s the best way to do anything in Perl):

1. I could see if a string is a unicode string by checking isinstance(u'abc', unicode), and u'abc' doesn’t have to be UTF-8 encoded.
2. I don’t have to require Encode or use byte when I want to encode a Unicode string. I just do u'blah'.encode("utf8").

These are small idiosyncratic differences, but ones I nonetheless prefer.
Aaron Swartz says:

July 14, 2010 at 10:57 am

Wow, great blog — can’t believe I missed it.

I’m curious what you think of Python 3’s solution to the Unicode problem. Personally, I think the world would be better off if we just used UTF-8, but whenever I suggest that the Japanese get very mad at me.
