Converting database representations

I would like to start with an apology for the quietness on this blog recently, work has been going on though a little slower than I would like as my personal life has been getting in the way of doing any work. Over the last six weeks things have improved and there are two exciting new fields in the pipeline.

New fields on the way

There are 90% finished patches for UUIDField and DurationField. These fields both have long standing accepted tickets to be introduced as general django fields, not specific to PostgreSQL. On PostgreSQL, they will natively use the uuid and interval data types, and on other databases the will use char(32) and bigint respectively. The initial patches used the metaclass SubfieldBase provided by Django, but this has some significant issues. As a result, changes have been needed in the ORM to handle database-python conversions when loading from the database.

The status quo

Database backends

Not all database drivers return all field types in the same format. Psycopg2 is generally pretty well behaved, but mysql, oracle and sqlite all have some idiosyncrasies about certain fields, usually relating to dates, decimals or booleans. Furthermore, sometimes these only happened when aggregates were involved. There were two hooks provided to deal with these, firstly Query.convert_values calling DatabaseOperations.convert_values, which was called by aggregates code, and secondly the optional method SQLCompiler.resolve_columns which was called during normal queryset evaluation if present. On oracle and in gis, SQLCompiler.resolve_columns invoked Query.convert_values which in turn invoked DatabaseOperations.convert_values.

This change based on whether SQLCompiler.resolve_columns existed or not caused at least ticket 21565.

Custom fields

Django provides a metaclass called SubfieldBase, which basically means that that field will have its to_python() method called whenever a value is assigned to the model, including when it is loaded from the database. This really is an abuse of to_python() whose primary purpose is to convert strings from serialisation and forms into the relevant python object. It also provided no way to change the behaviour based on the backend, but crucially was not called by aggregation or values() calls. (ticket 14462)

Proposed changes

The new proposed code allows backends and fields to provide converter functions to transform the value from the database to the model layer. DatabaseBackend converters are run first and for internal types will normalise the values. A custom field can then convert the resulting object again if needs be. This code is run in an efficient manner and the same way in all parts of the ORM - queries, aggregates, .values(), .dates() etc. Due to changes in the signatures and for performance reasons, all the hooks have changed. The new API is summarized below:

"Private" API

  • SQLCompiler.get_converters(fields)
  • SQLCompiler.apply_converters(row, converters)

"Semi-Private" API

  • DatabaseOperations.get_db_converters(internal_type) - returns a list of backend converter functions convert(value, field)
  • Field.get_db_converters(connection) - returns a list of field converter functions convert(value, field)

Public API

  • Field.from_db_value(value, connection) - public documented hook to replace SubfieldBase.

A note on gis

This has a 147 line negative diff in contrib.gis, removing a bunch of duplicated code, an entire compiler module for mysql, GeoValuesQuerySet, and custom code in SQLDateCompiler. Overall it is a pretty substantial cleanup and the proposed API is nicely overridable by the gis compiler.

Comments

The proposed changes are available as a pull request for comment, and there is also a thread on Django-Dev

blogroll

social