DataWarehousing

WARNING: This page has been migrated to the PostgreSQL Wiki. Please do not edit this page or your changes may be lost!

Data Warehousing Features

  • On-disk Bitmap Index (anyone game to finish GP patch?)
  • Error Handling in COPY
  • Parallel Query
  • Windowing Functions
  • MERGE
  • Parallel Index Build

Note that there is much overlap between this page and Simon Riggs' Development Projects planning.

On-Disk Bitmap Index

Error Handling in COPY

pgloader already handles this adequately, so this is a lower priority.

Parallel Query

There are two main kinds: single-node and multi-node parallelism. It is fairly easy to get something working that improves SeqScan performance, but it will be more difficult to get something working that applies further up in the executor. The challenges are:

  • Planner changes
  • How to manage the pool of query slaves
  • Deciding which parts of the data are accessed by which slave

Notably, the last two aren't an issue at all in most multi-node parallel architectures: there is one query slave per node, and each node operates only on the data that has been statically partitioned to it, often using a hash partitioning scheme.
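
The static hash partitioning described above can be sketched roughly as follows. This is only an illustration in Python, not PostgreSQL code: the node count, the function names (node_for, partition, local_sum, parallel_sum) and the sample rows are all hypothetical. Each row is assigned to a node by a stable hash of its key, each node sums only its own rows, and a coordinator combines the partial results.

```python
import zlib
from collections import defaultdict

NUM_NODES = 4  # hypothetical cluster size, chosen only for illustration


def node_for(key, num_nodes=NUM_NODES):
    """Map a partitioning key to a node number using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def partition(rows, num_nodes=NUM_NODES):
    """Statically distribute (key, value) rows across the nodes."""
    nodes = defaultdict(list)
    for key, value in rows:
        nodes[node_for(key, num_nodes)].append((key, value))
    return nodes


def local_sum(node_rows):
    """Work done by the single query worker on one node."""
    return sum(value for _, value in node_rows)


def parallel_sum(rows, num_nodes=NUM_NODES):
    """Coordinator step: combine the per-node partial sums."""
    nodes = partition(rows, num_nodes)
    return sum(local_sum(node_rows) for node_rows in nodes.values())


# Hypothetical sample data: 100 rows keyed by a customer id.
rows = [("cust-%d" % i, i) for i in range(100)]
total = parallel_sum(rows)
```

Because the row placement is fixed by the hash, there is no worker pool to manage and no run-time decision about which worker reads which data. That is the point the paragraph above makes about multi-node designs.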

Windowing Functions

SQL:2003 feature
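
As a rough illustration of what a windowing function computes, the Python sketch below emulates SUM(amount) OVER (PARTITION BY region ORDER BY day). The table shape, column names and the running_total helper are made up for the example, and it assumes day is unique within each region, so tie handling in the window frame is not modelled.

```python
from itertools import groupby
from operator import itemgetter


def running_total(rows):
    """Emulate SUM(amount) OVER (PARTITION BY region ORDER BY day).

    Unlike GROUP BY, every input row is kept and gains one extra column.
    Assumes "day" is unique within each region (no peer rows).
    """
    out = []
    ordered = sorted(rows, key=itemgetter("region", "day"))
    for _, part in groupby(ordered, key=itemgetter("region")):
        total = 0  # the running sum restarts for each partition
        for row in part:
            total += row["amount"]
            out.append({**row, "running_total": total})
    return out


# Hypothetical sample data.
sales = [
    {"region": "east", "day": 1, "amount": 10},
    {"region": "west", "day": 1, "amount": 5},
    {"region": "east", "day": 2, "amount": 20},
    {"region": "west", "day": 2, "amount": 7},
]
result = running_total(sales)
```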

MERGE

SQL:2003 feature
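
The semantics of MERGE can be sketched in Python as an "update if matched, insert if not" pass over a source set. The dictionary-based table, the merge helper and the inventory data are hypothetical, and the sketch covers only the WHEN MATCHED THEN UPDATE and WHEN NOT MATCHED THEN INSERT branches.

```python
def merge(target, source):
    """Emulate a simple MERGE of source rows into a target table.

    target: dict mapping key -> quantity (stands in for the target table)
    source: iterable of (key, quantity) rows
    Returns (rows_updated, rows_inserted).
    """
    updated = inserted = 0
    for key, qty in source:
        if key in target:
            # WHEN MATCHED THEN UPDATE
            target[key] += qty
            updated += 1
        else:
            # WHEN NOT MATCHED THEN INSERT
            target[key] = qty
            inserted += 1
    return updated, inserted


# Hypothetical sample data.
inventory = {"widget": 10, "gadget": 3}
shipment = [("widget", 5), ("sprocket", 8)]
counts = merge(inventory, shipment)
```

For data warehousing, the appeal is that a load step can apply a batch of changes in one statement rather than issuing separate UPDATE and INSERT passes.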

Parallel Index Build

Josh Berkus: Not sure exactly how this works, but it speeds up Oracle considerably.
