Monday, January 26, 2009

flattened...

some time ago back at the plone performance sprint in copenhagen i was trying to help optimizing zope start-up time a little more after hannosch's impressive pre-sprint effort.

while profiling i found a recursive implementation of a "list flatten" function, which was called way too often. i figured this could be done better and wanted to see what approaches other people had used before. a blog post titled more on python flatten sounded promising, but the proposed best method somehow seemed overly complicated and longish to me. next thing of course the perfectionist in me crept up, and well, we were at a sprint after all, so why not have some fun? :)

while the first version worked for my use-case and decreased plone start-up time by about 0.2 seconds (which i thought was not too bad for a small change like that), it was a bit oversimplified in that it didn't consider some egde-case. that wasn't too hard to fix, though, and eventually i posted what i considered good enough as a comment in that blog post:

def flatten(l, ltypes=(list, tuple)):
    n = 0
    for i in iter(l):
        if isinstance(i, ltypes):
            l[n:n+1] = []
            n += 1
            l[n:n] = list(i)
    return l
judging by the fact nobody ever replied to it, i guess its workings might have been not too obvious after all. but today, after someone else posted another version i checked back on the post finding that "this one whoops all of the others for speed, nesting level support and elegance!":
def flatten(l, ltypes=(list, tuple)):
    ltype = type(l)
    l = list(l)
    i = 0
    while i < len(l):
        while isinstance(l[i], ltypes):
            if not l[i]:
                l.pop(i)
                i -= 1
                break
            else:
                l[i:i + 1] = l[i]
        i += 1
    return ltype(l)
it was referring back to Mike C. Fletcher's BasicTypes library, hence dubbed "mr. fletcher's version" and even compared to ruby's built-in flatten method (implemented in C), almost living up to it.

i got curious and wanted to see how my version was doing in comparison, and surprisingly it it about 33% faster. and that's after i fixed it to be non-destructive with regard to the passed in list:

$ python flatter.py
fletchers_flatten: 9.197 sec
my_flatten: 5.934 sec
comparison: 64.53 %
and without wanting to brag too much, i still think my version is quite a bit slicker, too... ;)

anyway, here's the whole test script in case you want to run it yourself:

# mr. fletcher's version
def fletchers_flatten(l, ltypes=(list, tuple)):
    ltype = type(l)
    l = list(l)
    i = 0
    while i < len(l):
        while isinstance(l[i], ltypes):
            if not l[i]:
                l.pop(i)
                i -= 1
                break
            else:
                l[i:i + 1] = l[i]
        i += 1
    return ltype(l)

# my approach...
def my_flatten(l, ltypes=(list, tuple)):
    ltype = type(l)
    l = list(l)
    n = 0
    for i in iter(l):
        if isinstance(i, ltypes):
            l[n:n+1] = []
            n += 1
            l[n:n] = list(i)
    return ltype(l)

# set up a nested list
a = []
for i in xrange(2000):
    a = [a, i]

# make sure it works as advertised
backup = a
assert my_flatten([[]]) == []
assert my_flatten(a) == list(reversed(xrange(2000)))
assert a == backup

# benchmark
from time import time
now = time()
for n in xrange(5000):
    a = fletchers_flatten(a)
fletcher = time() - now

now = time()
for n in xrange(5000):
    a = my_flatten(a)
my = time() - now

print 'fletchers_flatten: %.3f sec' % fletcher
print 'my_flatten: %.3f sec' % my
print 'comparison: %.2f %%' % (my / fletcher * 100)

update:
MJ was right, of course. the fixed version reads like this...

def my_flatten(l, ltypes=(list, tuple)):
    ltype = type(l)
    l = list(l)
    n = 0
    for i in iter(l):
        if isinstance(i, ltypes):
            l[n:n+1] = []
            l[n+1:n+1] = list(i)
        n += 1
    return ltype(l)

Thursday, January 22, 2009

please downgrade my buildout!

i won't go into discussing particular use cases here — that would perhaps be too weird :) — but let's just say sometimes it can be handy to be able to quickly "downgrade" an existing plone 3.x buildout to plone 2.5. here's a buildout config that does exactly this:
[buildout]
extends = buildout.cfg

[plone]
recipe = plone.recipe.distros
urls = http://launchpad.net/plone/2.5/2.5.5/+download/Plone-2.5.5.tar.gz
nested-packages = Plone-2.5.5.tar.gz
version-suffix-packages = Plone-2.5.5.tar.gz
eggs =

[zope2]
url = http://www.zope.org/Products/Zope/2.9.8/Zope-2.9.8-final.tgz
fake-zope-eggs = true

[instance]
products =
 ${buildout:directory}/products
 ${productdistros:location}
 ${plone:location}

Friday, January 9, 2009

what a git!

so here's the real reason i started it — nothing too exciting, but after a hunch of a discussion it simply had to go somewhere... :)
since i just figured the below out for the second time (after forgetting what the suitable git commands were since the first time a few months ago) i thought i might as well write it down. perhaps it'll come in handy at some point... :)

imagine you're offline, in a train or plane or cafe, and feel like getting some work done. you're starting to make changes and quickly realize you'd rather split things into several commits than just a single big one. however, you're working against a subversion repository and have — since you're offline — no way to commit things back right away or (conveniently) keep the patches separate. well, except perhaps for copying the whole directory for every single one and later merge/commit them one by one. not really worth the trouble... but git to the rescue!

even without a previously cloned svn repos it's quite easy to locally commit away and later replay your changes. here's a quick step-by-step howto:

  • initialize an empty git repos, propbably best done in the top dir of your working tree:
  • $ git init
  • add the current state to it (you might have to be more explicit here):
  • $ git add *
    $ git commit -m 'import'
  • do your work, i.e. change things, locally commit to git:
  • $ vi ... # or emacs if you must ;)
    $ git commit -a
being offline you leave it at that. once you're back online you wanna transfer those changes to svn, or, iow, apply them to a cloned git repository:
  • format/export the patches you've made (the rev specifier ":/import.." translates to "between a commit starting with 'import' and head) to separate files:
    $ git format-patch :/import..
    
    this will output numbered files, one file per patch, like:
    0001-fix-1.patch
    0002-fix-2.patch
    ...
  • remove your temporary git repos (or move it out of the way):
  • $ rm -rf .git
  • if you haven't done so already, set up the cloned repos (extra options might be required here), which allows you to commit back to svn:
  • $ git svn clone svn://svn-url
  • re-apply the patches:
  • $ git am 00*.patch
  • and finally commit back to svn:
  • $ git svn dcommit

so now you start after all?

this is the first one & i'll make it short:  i was contemplating getting a blog a couple of times now.  it was usually about the possibility to share a few potentially useful bits of information — mostly about plone or related web development — vs. "wasting" precious time on writing stuff nobody reads while i could do more interesting things instead.  like for example, play with my son... :)
anyway, hoping that some of the things posted here will actually turn out to be useful for others, i'll turn in.  well, i already have by clicking that "create" button i suppose... :)