Todo:Collate
WARNING: This page has been migrated to the PostgreSQL Wiki. Please do not edit this page or your changes may be lost!
SQL standard COLLATE support
A collation is a set of rules that determines how a set of objects is ordered. In practice it applies only to strings, because that is the only case where the ordering is open to debate, and it is the only case the SQL standard addresses. In principle, though, one could have a meaningful discussion about how to order other objects, such as geometric shapes.
References
- Charset/collate support and function parameters [1]
- Proof of concept COLLATE support with patch [2]
- For review: Initial support for COLLATE [3]
- Removing SORTFUNC_LT/REVLT [4]
Related todo items
Status
- Design submitted [5] - not reviewed
- Partial Patch - abandoned
Overview of changes required
- The collation is a property of a column, not of a value. At any point in a query only a single collation is current. So fields need to be added to the executor to track this.
- Functions need to know which collation applies to their arguments, so a field needs to be added to the fmgr structure.
- Changes to the grammar to allow:
  - CREATE TABLE .. ( ..., field COLLATE <collation>, ... )
  - ALTER TABLE
  - CREATE INDEX .. ON ( field COLLATE <collation> )
  - CREATE INDEX .. ON ( (expr COLLATE <collation>) )
  - expr :: expr COLLATE <collation>
- Possibly: CREATE COLLATE ..
Discussion
PostgreSQL currently supports only one collation at a time, fixed by the LC_COLLATE variable when the database cluster is initialised. This is intrinsically broken: the encoding can be chosen per database, but a collation implementation usually works only with one particular encoding. It is also a major problem because virtually all databases contain some strings that should be treated as plain ASCII (such as path names, URLs, or codes like 'Y' and 'N'), even if they contain localizable strings as well.
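POSIX collation support basically consists of two functions: strxfrm() and strcoll(). The latter compares two strings according to the current locale. The former transforms a string into another string such that strcmp() on the transformed strings gives the same ordering. Writing strxfrm(s) as shorthand for the transformed copy of s, the two sides below always agree in sign (the exact integer values may differ):
sign( strcoll(a,b) ) == sign( strcmp( strxfrm(a), strxfrm(b) ) )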
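The relationship between strcoll() and strxfrm() can be checked with a small C sketch. This is only an illustration, not code from any of the referenced patches. The helper names (sign, xfrm_key, collation_keys_agree) are made up for this example. Note that the real strxfrm() writes into a caller-supplied buffer rather than returning a string, and that only the signs of the two comparisons are guaranteed to match.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Reduce a comparison result to -1, 0 or 1 so results can be compared by sign. */
static int sign(int x)
{
    return (x > 0) - (x < 0);
}

/*
 * Hypothetical helper: return a malloc'd strxfrm() key for s under the
 * current LC_COLLATE locale, or NULL if allocation fails.
 */
static char *xfrm_key(const char *s)
{
    assert(s != NULL);

    /* With n == 0 strxfrm() only reports the length the key would need. */
    size_t n = strxfrm(NULL, s, 0) + 1;
    char *key = malloc(n);

    if (key != NULL)
        strxfrm(key, s, n);
    return key;
}

/*
 * Hypothetical helper: returns 1 if strcoll(a, b) and strcmp() on the
 * transformed keys agree in sign, 0 if they do not, -1 on allocation failure.
 */
static int collation_keys_agree(const char *a, const char *b)
{
    char *ka = xfrm_key(a);
    char *kb = xfrm_key(b);
    int ok = -1;

    if (ka != NULL && kb != NULL)
        ok = (sign(strcoll(a, b)) == sign(strcmp(ka, kb)));

    free(ka);
    free(kb);
    return ok;
}
```

A caller would first select a collation with setlocale(LC_COLLATE, ...) and then call collation_keys_agree() on pairs of strings. The transformed keys are useful when the same string is compared many times (for example during a sort), since the locale-specific work is done once per string instead of once per comparison.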
POSIX collation only works on the current locale. This is somewhat restrictive, so most major operating systems provide variants that work with multiple locales. These include Win32, Linux/glibc and Mac OS X. OSes that don't include the BSDs, HP-UX and Solaris.
For this reason you can actually hack multi-locale sorting on some operating systems by using a pg_strxfrm() function and creating a functional index on it. [6]
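As a rough idea of what a per-locale transform might look like at the C level, here is a minimal sketch. It assumes a platform that provides the POSIX.1-2008 per-locale interfaces newlocale(), strxfrm_l() and freelocale() (glibc does; some systems declare them in <xlocale.h> instead). The helper name xfrm_in_locale is hypothetical and this is not the pg_strxfrm() function from the referenced post.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* request newlocale() and strxfrm_l() */
#endif

#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a malloc'd sort key for s under the named
 * locale, without changing the process-wide locale. Returns NULL if the
 * locale cannot be loaded or allocation fails.
 */
static char *xfrm_in_locale(const char *s, const char *locale_name)
{
    assert(s != NULL && locale_name != NULL);

    /* Build a locale object that carries only the collation category. */
    locale_t loc = newlocale(LC_COLLATE_MASK, locale_name, (locale_t) 0);
    if (loc == (locale_t) 0)
        return NULL;

    /* First call sizes the key, second call fills it. */
    size_t n = strxfrm_l(NULL, s, 0, loc) + 1;
    char *key = malloc(n);

    if (key != NULL)
        strxfrm_l(key, s, n, loc);

    freelocale(loc);
    return key;
}
```

Keys produced for the same locale can be compared with plain strcmp(), which is what makes indexing on the transformed value workable: the index stores byte-comparable keys, and the locale-specific ordering is baked into them.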
Possible Problems
Fill me in...

