Showing posts with label sqlobject.

Sunday, January 4, 2009

Moving to TG2, Part 1

This is the start of a series on the work we're doing to upgrade BandRadar's components. BandRadar's development started in 2006, using the default TurboGears components of the time: SQLObject as the ORM, Kid for templating, tg.widgets for widgets, MochiKit as the JavaScript library, and CherryPy as the application server.

Basically all of these components are now obsolete, or at least no longer recommended for new TG/TG2 projects. But they still work fine, so why change?

Good question. I'm hoping that re-examining the choices we made in implementing BandRadar version 1 will pay dividends by simplifying the code or making it easier to add new functionality, but honestly it's not clear at this point whether this work will be more valuable than spending the same time extending the current code. It also depends on how hard moving to the new components turns out to be. Thankfully, TurboGears will let us transition to new components in stages, instead of all at once.

The plan:
  1. SQLObject to SQLAlchemy (Elixir)
  2. kid to Genshi
  3. tg.widgets to ToscaWidgets
  4. CherryPy to Pylons/mod_wsgi
Currently working on #1. Converting classes derived from SQLObject to Elixir's Entity and defining the relationships has been relatively painless. None of our table names match what Elixir expects by default, but it's easy to tell it what they should be.

The first bit of trouble is that SQLObject did not create database constraints for required fields; it keeps track of these itself and throws an error if a required field is omitted when creating an object. SQLAlchemy relies on the database for this (which makes sense), but our existing schema (originally created by SQLObject) will need to be updated by hand with the new NOT NULL constraints. We've also added a fair number of extra methods to the objects in model.py, and these will need to be reimplemented in SA-ese.

Finally, our SQLObject model includes a feature where all modifications to some tables are logged to a separate table (so we could recover from vandalism if necessary). This will need to be reimplemented in SA, or we need to jump to versioned objects. I'd prefer the former, frankly, but I don't know whether SA has a feature equivalent to sqlobject.events, or whether I could just override __setattr__() in my model objects.
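As a sketch of the __setattr__() idea, here's what interception could look like in plain Python, with no ORM involved -- the class names and the list-based log are hypothetical stand-ins for the real UpdateLog table, not SQLAlchemy or sqlobject.events API:

```python
class ChangeLogged(object):
    """Mixin that records attribute changes -- a toy stand-in for event hooks."""

    def __init__(self):
        # bypass our own __setattr__ while creating the log itself
        object.__setattr__(self, 'changes', [])

    def __setattr__(self, name, value):
        old = getattr(self, name, None)
        if old != value:
            # in the real model, this would write an UpdateLog row instead
            self.changes.append((name, old, value))
        object.__setattr__(self, name, value)


class Band(ChangeLogged):
    pass

b = Band()
b.name = 'Static-X'
b.name = 'Static-X'   # unchanged value: nothing is logged
```

Whether this survives contact with an actual SA-mapped class (which does its own attribute instrumentation) is exactly the open question above.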

Friday, April 20, 2007

A persistent dict class

One big part of the website is importing events, with artist names associated with them. However, sometimes a band is referred to in a slightly different way -- not enough to be significant to a person, but enough to fool a computer into thinking it's a new band. The way I initially fixed this was an artist_name_fixup dict in the importing module. The dict contained a mapping between the incorrect name and the correct one. So for example, "Static X" mapped to "Static-X" (the latter is the actual band name, according to their website).

But this meant that every time a new fixup was needed, the source file containing the dict had to be changed. OK, so put it in the database. Well, we check this dict a LOT, so it would be nice not to hit the DB every time.

The persistent dict solves this by saving added entries in the database, but also keeping the dict in-memory. First, we define the fixup DB table in our model:
class ArtistNameFixup(SQLObject):
    name = UnicodeCol(alternateID=True)
    value = UnicodeCol()
Then, the new class (in our util.py module):
class PersistentDict(dict):
    def __init__(self, model):
        super(PersistentDict, self).__init__()
        self.model = model
        for row in model.select():
            super(PersistentDict, self).__setitem__(row.name, row.value)

    def __setitem__(self, name, value):
        try:
            r = self.model.byName(name)
            r.value = value
        except SQLObjectNotFound:
            r = self.model(name=name, value=value)
        super(PersistentDict, self).__setitem__(name, value)

    def __delitem__(self, name):
        self.model.byName(name).destroySelf()
        super(PersistentDict, self).__delitem__(name)

    def update(self, other_dict):
        for name, value in other_dict.iteritems():
            try:
                r = self.model.byName(name)
                r.value = value
            except SQLObjectNotFound:
                r = self.model(name=name, value=value)
        super(PersistentDict, self).update(other_dict)
Finally, in the import module, we add:
import util
artist_fixup_dict = util.PersistentDict(ArtistNameFixup)
When we start the TG application, the artist_fixup_dict will be populated from the database. Subsequent additions and deletions will be reflected both in the in-memory version, as well as in the db table, so if the app is reloaded, it won't lose anything.

Finally, note that the PersistentDict code makes assumptions about the SO model it is initialized with -- namely, that the table has "name" and "value" fields. With a little more code, these could be made configurable, but I don't need that flexibility, so I haven't added it.
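The same write-through idea can be sketched without SQLObject at all. Here's a minimal stdlib version backed by sqlite3 -- the table name and schema are placeholders for illustration, not BandRadar's actual schema:

```python
import sqlite3

class PersistentDict(dict):
    """Dict that mirrors every change to a (name, value) table."""

    def __init__(self, conn, table='fixups'):
        super(PersistentDict, self).__init__()
        self.conn = conn
        self.table = table
        conn.execute('CREATE TABLE IF NOT EXISTS %s '
                     '(name TEXT PRIMARY KEY, value TEXT)' % table)
        # preload existing rows so reads never touch the database
        for name, value in conn.execute('SELECT name, value FROM %s' % table):
            super(PersistentDict, self).__setitem__(name, value)

    def __setitem__(self, name, value):
        self.conn.execute('INSERT OR REPLACE INTO %s VALUES (?, ?)'
                          % self.table, (name, value))
        super(PersistentDict, self).__setitem__(name, value)

    def __delitem__(self, name):
        self.conn.execute('DELETE FROM %s WHERE name = ?' % self.table, (name,))
        super(PersistentDict, self).__delitem__(name)

conn = sqlite3.connect(':memory:')
d = PersistentDict(conn)
d['Static X'] = 'Static-X'
# a fresh instance rehydrates from the table, simulating an app reload
assert PersistentDict(conn)['Static X'] == 'Static-X'
```

Reads stay in-memory dict lookups; only writes pay the database round trip, which is the whole point.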

Sunday, April 15, 2007

Logging all table changes to another table

Since the site is user-editable, it needs some way to log all changes to certain tables, so they can be undone if vandals add bad data. Here's what I came up with: model classes that need this (not all do) inherit from Journalled (as well as SQLObject, of course). Journalled overrides the __setattr__() and set() methods, and creates an entry in the UpdateLog table for each change.
class UpdateLog(SQLObject):
    created = DateTimeCol(default=datetime.now)
    changed_by = IntCol()
    table_name = UnicodeCol(length=12)
    table_id = IntCol()
    attrib_name = UnicodeCol(length=20)
    attrib_old_value = UnicodeCol()
    attrib_new_value = UnicodeCol()

    def _get_attrib_old_value(self):
        return pickle.loads(str(self._SO_get_attrib_old_value()))
    def _set_attrib_old_value(self, value):
        self._SO_set_attrib_old_value(pickle.dumps(value))
    def _get_attrib_new_value(self):
        return pickle.loads(str(self._SO_get_attrib_new_value()))
    def _set_attrib_new_value(self, value):
        self._SO_set_attrib_new_value(pickle.dumps(value))
The old and new values are stored as pickled Python objects. Table IDs are stored as plain ints, with no foreign-key relationship, since entries here can come from more than one table.
# Classes inheriting from this will have changes stored in the UpdateLog table
class Journalled(object):

    def __setattr__(self, name, value):
        if name in self.sqlmeta.columns.keys():
            self._record_update({name: value})
        super(Journalled, self).__setattr__(name, value)

    def set(self, **kw):
        self._record_update(kw)
        super(Journalled, self).set(**kw)

    def _record_update(self, updates):
        for name, value in updates.iteritems():
            old_value = getattr(self, name, None)
            if old_value != value:
                try:
                    current_user = identity.current.user.id
                except Exception:
                    current_user = None
                UpdateLog(
                    changed_by=current_user,
                    table_name=self.sqlmeta.table,
                    table_id=self.id,
                    attrib_name=name,
                    attrib_old_value=old_value,
                    attrib_new_value=value
                )
Honestly, with this in place, I haven't actually written the code to roll back any vandalism. The data is all in the UpdateLog, so I can put it off until the website actually gets hit ;-)

SQLObject svn trunk also appears to have recently added table versioning support. Sounds good, but I haven't tried it yet.

Sunday, April 8, 2007

SelectResults are great

One change I made as the website evolved was switching from SQLObject's RelatedJoin and MultipleJoin to their cousins, SQLRelatedJoin and SQLMultipleJoin. After using them, I'd go so far as to say they should be the default. The non-SQL versions return lists of objects; the SQL versions return a SelectResults.

A SelectResults (which of course is also what Class.select() returns) doesn't actually contain the results of the select; it knows how to get them, and waits to hit the database until it is actually dereferenced. If you have a SelectResults instead of an actual list of objects, it's not too late to further filter or modify the query!

I had several places in my code where I was iterating through a RelatedJoin, filtering in Python. Switching these to SQLRelatedJoin let me ask the database to perform the filter instead, reducing the amount of code as well as helping performance.

The one thing to watch out for: to determine whether a SelectResults actually has any rows, you need to call its .count() method instead of len(), since it isn't a list anymore.
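To illustrate why the lazy style helps, here's a plain-Python sketch of the pattern -- this is not SQLObject's actual implementation, just a toy class that composes filters and only does the work on demand, the way a SelectResults defers its query:

```python
class LazySelect(object):
    """Mimics a lazy result set: composes filters, evaluates only on demand."""

    def __init__(self, rows, predicates=()):
        self.rows = rows                       # stands in for the table
        self.predicates = list(predicates)

    def filter(self, pred):
        # returns a new lazy result; nothing is evaluated yet
        return LazySelect(self.rows, self.predicates + [pred])

    def __iter__(self):
        # the "query" runs here, like dereferencing a SelectResults
        return (r for r in self.rows
                if all(p(r) for p in self.predicates))

    def count(self):
        return sum(1 for _ in self)

bands = LazySelect(['Static-X', 'Sleater-Kinney', 'Spoon'])
result = bands.filter(lambda name: name.startswith('S'))
result = result.filter(lambda name: '-' in name)   # still no work done
assert result.count() == 2
```

With a real ORM the payoff is bigger, of course: the composed filters become a WHERE clause, so the database does the work instead of Python.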

Sunday, April 1, 2007

Undocumented new TG features

BandRadar is implemented in TurboGears, a Python-based web framework. The interesting thing about TG is that it is like the Borg of web frameworks -- it incorporates several different projects for different areas of functionality, and acts to tie them all together nicely.

While there is the larger debate about which components to use, now that TG offers choices for templating and ORM, there are also smaller issues, because new versions of the old components are still coming out!

For example, when I first started implementing BandRadar on TurboGears 0.8, SQLObject would not clean up entries in the intermediate table used for many-to-many relationships. Therefore, I had to write destroySelf() methods for all classes that had them, cleaning this up manually.

This isn't necessary anymore; SQLObject 0.8 deletes them for you. It also lets you define cascade behavior for ForeignKeys. This is very nice, and I am glad to be rid of that code. But why did I have to find out about this new feature by happenstance? The SQLObject developers really need to make a changelog and release notes evident on their web page. These new features are documented, but I don't think many people make a habit of skimming docs for new tidbits -- this is why release notes are important.

However, ranting aside, there is now one less reason why I should take the time to switch to SQLAlchemy. ;-)