some time ago back at the plone performance sprint in copenhagen i was trying to help optimizing zope start-up time a little more after hannosch's impressive pre-sprint effort.
while profiling i found a recursive implementation of a "list flatten" function, which was called way too often. i figured this could be done better and wanted to see what approaches other people had used before. a blog post titled more on python flatten sounded promising, but the proposed best method somehow seemed overly complicated and longish to me. next thing of course the perfectionist in me crept up, and well, we were at a sprint after all, so why not have some fun? :)
while the first version worked for my use-case and decreased plone start-up time by about 0.2 seconds (which i thought was not too bad for a small change like that), it was a bit oversimplified in that it didn't consider some egde-case. that wasn't too hard to fix, though, and eventually i posted what i considered good enough as a comment in that blog post:
def flatten(l, ltypes=(list, tuple)): n = 0 for i in iter(l): if isinstance(i, ltypes): l[n:n+1] = [] n += 1 l[n:n] = list(i) return ljudging by the fact nobody ever replied to it, i guess its workings might have been not too obvious after all. but today, after someone else posted another version i checked back on the post finding that "this one whoops all of the others for speed, nesting level support and elegance!":
def flatten(l, ltypes=(list, tuple)): ltype = type(l) l = list(l) i = 0 while i < len(l): while isinstance(l[i], ltypes): if not l[i]: l.pop(i) i -= 1 break else: l[i:i + 1] = l[i] i += 1 return ltype(l)it was referring back to Mike C. Fletcher's BasicTypes library, hence dubbed "mr. fletcher's version" and even compared to ruby's built-in flatten method (implemented in C), almost living up to it.
i got curious and wanted to see how my version was doing in comparison, and surprisingly it it about 33% faster. and that's after i fixed it to be non-destructive with regard to the passed in list:
$ python flatter.py fletchers_flatten: 9.197 sec my_flatten: 5.934 sec comparison: 64.53 %and without wanting to brag too much, i still think my version is quite a bit slicker, too... ;)
anyway, here's the whole test script in case you want to run it yourself:
# mr. fletcher's version def fletchers_flatten(l, ltypes=(list, tuple)): ltype = type(l) l = list(l) i = 0 while i < len(l): while isinstance(l[i], ltypes): if not l[i]: l.pop(i) i -= 1 break else: l[i:i + 1] = l[i] i += 1 return ltype(l) # my approach... def my_flatten(l, ltypes=(list, tuple)): ltype = type(l) l = list(l) n = 0 for i in iter(l): if isinstance(i, ltypes): l[n:n+1] = [] n += 1 l[n:n] = list(i) return ltype(l) # set up a nested list a = [] for i in xrange(2000): a = [a, i] # make sure it works as advertised backup = a assert my_flatten([[]]) == [] assert my_flatten(a) == list(reversed(xrange(2000))) assert a == backup # benchmark from time import time now = time() for n in xrange(5000): a = fletchers_flatten(a) fletcher = time() - now now = time() for n in xrange(5000): a = my_flatten(a) my = time() - now print 'fletchers_flatten: %.3f sec' % fletcher print 'my_flatten: %.3f sec' % my print 'comparison: %.2f %%' % (my / fletcher * 100)
update:
MJ was right, of course. the fixed version reads like this...
def my_flatten(l, ltypes=(list, tuple)): ltype = type(l) l = list(l) n = 0 for i in iter(l): if isinstance(i, ltypes): l[n:n+1] = [] l[n+1:n+1] = list(i) n += 1 return ltype(l)