Monday, April 23, 2007

Site makeover

Just went live with the new site makeover! W00t! No more harsh green, we went for a more mellow ochre (?) palette, and a blocky, vintage feel inspired from old concert posters.

We still have some kinks to work out when it comes to IE -- please bear with us, or even better, use Firefox.

In other news, I was out at a Karmedy gig at the Red Room on Thursday, the place lost power! Never had that happen at a gig before. Totally missed Karmedy (sup Josh!) and only caught half of The Unnamed's set (who were frickin amazing, btw)

Friday, April 20, 2007

A persistent dict class

One big part of the website is importing events, with artist names associated with it. However, sometimes a band is referred to in slightly a different way -- not enough to be significant to a person, but enough to fool a computer into thinking it's a new band. The way I initially fixed this was a artist_name_fixup dict in the importing module. The dict contained a mapping between the incorrect name and the correct one. So for example, "Static X" mapped to "Static-X" (the latter is the actual band name, according to their website.)

But this meant that every time a new fixup was needed, the source file containing the dict needed to be changed. OK, so put it in the database. Well, we check this dict a LOT, so it would be nice to not hit the DB every time.

The persistent dict solves this by saving added entries in the database, but also keeping the dict in-memory. First, we define the fixup DB table in our model:
class ArtistNameFixup(SQLObject):
name = UnicodeCol(alternateID=True)
value = UnicodeCol()
Then, the new class (in our util.py module):
class PersistentDict(dict):
def __init__(self, model):
super(PersistentDict, self).__init__()
self.model = model
for row in model.select():
super(PersistentDict, self).__setitem__(row.name, row.value)

def __setitem__(self, name, value):
try:
r = self.model.byName(name)
r.value = value
except SQLObjectNotFound:
r = self.model(name=name, value=value)
super(PersistentDict, self).__setitem__(name, value)

def __delitem__(self, name):
self.model.byName(name).destroySelf()
super(PersistentDict, self).__delitem__(name)

def update(self, other_dict):
for name, value in other_dict.iteritems():
try:
r = self.model.byName(name)
r.value = value
except SQLObjectNotFound:
r = self.model(name=name, value=value)
super(PersistentDict, self).update(other_dict)
Finally in the import module, we add:
import util
artist_fixup_dict = util.PersistentDict(ArtistNameFixup)
When we start the TG application, the artist_fixup_dict will be populated from the database. Subsequent additions and deletions will be reflected both in the in-memory version, as well as in the db table, so if the app is reloaded, it won't lose anything.

Finally, note that the PersistentDict code is making assumptions about the SO model it is initialized with -- that the table has "name" and "value" fields. With a little more code, these could be made configurable, but I don't need that flexibility, so I haven't added it.

Sunday, April 15, 2007

Logging all table changes to another table

Since the site is user-editable, it needs some way to log all changes to certain tables, so they can be undone, if hit by vandals adding bad data. What I came up with this: model classes that need this (not all do) inherit from Journalled (as well as SQLObject, of course). Journalled overrides the __setattr__ and set() methods, and creates entries in the UpdateLog table for each change.
class UpdateLog(SQLObject):
created = DateTimeCol(default=datetime.now)
changed_by = IntCol()
table_name = UnicodeCol(length=12)
table_id = IntCol()
attrib_name = UnicodeCol(length=20)
attrib_old_value = UnicodeCol()
attrib_new_value = UnicodeCol()

def _get_attrib_old_value(self):
return pickle.loads(str(self._SO_get_attrib_old_value()))
def _set_attrib_old_value(self, value):
self._SO_set_attrib_old_value(pickle.dumps(value))
def _get_attrib_new_value(self):
return pickle.loads(str(self._SO_get_attrib_new_value()))
def _set_attrib_new_value(self, value):
self._SO_set_attrib_new_value(pickle.dumps(value))
The old and new values are stored as actual Python objects. IDs are stored as just Ints, no relationships, since entries here can come from more than one table.
# Classes inheriting from this will have changes stored in the UpdateLog table
class Journalled(object):

def __setattr__(self, name, value):
if name in self.sqlmeta.columns.keys():
self._record_update({name:value})
super(Journalled, self).__setattr__(name, value)

def set(self, **kw):
self._record_update(kw)
super(Journalled, self).set(**kw)

def _record_update(self, updates):
for name, value in updates.iteritems():
old_value = getattr(self, name, None)
if old_value != value:
try:
current_user = identity.current.user.id
except:
current_user = None
u = UpdateLog(
changed_by=current_user,
table_name=self.sqlmeta.table,
table_id=self.id,
attrib_name=name,
attrib_old_value=old_value,
attrib_new_value=value
)
Honestly, with this in place, I haven't actually written the code to roll back any vandalism. The data is all in the UpdateLog, so I can put it off until the website actually gets hit ;-)

SQLObject svn trunk also appears to have recently added table versioning support. Sounds good, but I haven't tried it yet.

Monday, April 9, 2007

Our data's getting better, but still not perfect

We have worked hard over the past few months to substantially improve the amount and correctness of event data available through BandRadar. This is the kind of thing that doesn't really show up as a new feature, but is very important to the site being valuable to users.

Although we are pulling data from multiple sources automatically, a human is still involved in reviewing each item for correctness. Lately, the further-out events have been more complete on the site. However, if you see incorrect or missing entries (maybe for your band's next gig?) YOU can add them, or correct them! If there are other problems with the site, please leave a comment and we'll take a look right away.

We have some new user-visible features coming soon but I'll blog about them after they're actually up.

Sunday, April 8, 2007

SelectResults are great

One change I made as the website has evolved is to switch from using SQLObject RelatedJoin and MultipleJoin to their cousins, SQLRelatedJoin and SQLMultipleJoin. After having used them, I hazard to say that I think they should be the default. The non-SQL versions return lists of objects; the SQL versions return a SelectResult.

A SelectResult (which of course is also returned from Class.select()) doesn't actually contain the results of the select, but it knows how to get it. It waits to hit the database until actually dereferenced. If you have a SelectResult instead of an actual list of objects, it's not too late to further filter or modify the result!

I had several places in my code where I was iterating through a RelatedJoin, performing a filter in Python. Switching these to SQLRelatedJoins allowed me to ask the database to perform the filter instead, thus reducing the amount of code, as well as helping performance.

The one thing to watch out for: to determine if the SelectResults actually has any rows, you need to call class.results.count() instead of len(class.results), since class.results isn't a list anymore.

Sunday, April 1, 2007

how to use the TurboGears scheduler

This is a quick overview of how to use the TG scheduler to, for example, do nightly db cleanups, or similar.

First, put this in your controllers.py:

from turbogears import scheduler
import batch

def startup():
scheduler.add_weekday_task(batch.task, range(1,8), (3,0))

turbogears.startup.call_on_startup.append(startup)

You can see I've put the code I'm going to run in another file, called batch.py, in a function called task(). You can also take a look at scheduler.py in the TG code for more details on the scheduler API, but above I'm basically asking to be called every day at 3 am.

Then, in batch.py:

def task():
hub.threadingLocal = threading_local()
hub.begin()
# do stuff, like clean up your identity database. You can use your ORM objects here.
hub.commit()
hub.end()

Hope this helps!

undocumented new TG features

BandRadar is implemented in TurboGears, a Python-based web framework. The interesting thing about TG is that it is like the Borg of web frameworks -- it incorporates several different projects for different areas of functionality, and acts to tie them all together nicely.

While there is the larger debate about which components to use, now that TG offers choices for templating and ORM, there are also smaller issues, because new versions of the old components are still coming out!

For example, when I first started implementing BandRadar in TurboGears 0.8, SQLObject would not clean up entries in an intermediate table used for many-to-many relationships. Therefore, I had to write destroySelf() methods for all classes with these that cleaned this up manually.

This isn't necessary anymore, SQLObject 0.8 deletes them for you. It also allows you to define cascade behavior for ForeignKeys. This is very nice, and I am glad to be rid of that code. But why did I have to find out about this new feature by happenstance? The SQLObject developers really need to have a changelog and release notes evident on their web page. These new features are documented, but I don't think many people make it a habit to skim docs for new tidbits -- this is why release notes are important.

However, ranting aside, there is now one less reason why I should take the time to switch to SQLAlchemy. ;-)