A Slony-I example step by step

1. Build and install

Slony must be built and installed by the postgres unix user. The installation target will be identical to the existing PostgreSQL installation and the original PostgreSQL source tree must be available. On certain platforms (AIX is one of these), PostgreSQL must be configured with the option --enable-thread-safety to provide correct client libraries. The location of the PostgreSQL sourcetree is specified with the configure option --with-pgsourcetree=<dir>.

The complete list of files installed is:

The .sql files are not fully substituted yet. And it is correct that both, the 7.3 and the 7.4 files get installed on a system regardless of its version. The sloni admin utility will do the substitution and load these files when creating replication nodes. At that time, the database being instialized might be remote and of a different version.

At least the two shared objects installed in the $libdir directory must be installed onto every computer that is supposed to become a Slony node.

2. Creating a sample database with application

As a first test scenario, the pgbench test program that is shipped with PostgreSQL in the ./contrib directory and produces a not too light load of concurrent transactions will satisfy our needs.
Please note that the pl/PGsql procedural language must be installed into this database

Local will be remote

The Slony replication system is trigger based. One of the nice side effects of this is, that one can even replicate between two databases under the same postmaster. But things can get a little confusing when we're talking about local vs. remote database in that context. To avoid confusion, from here on we will strictly use the term "node" to mean one database and its replication daemon program slon.

To make this example work for people with one or two computer systems at their disposal, we will define a few shell variables that will be used throughout the following scripts.

CLUSTER=test1
DBNAME1=pgbench_node1
DBNAME2=pgbench_node2
HOST1=<host name of pgbench_node1>
HOST2=<host name of pgbench_node2>
SLONY_USER=<postgres superuser to connect as for replication>
PGBENCH_USER=<normal user to run the pgbench application>

It is assumed that the the unix user executing all the commands in this example has the ability to establish all database connections. This is not intended to become a "pg_hba.conf HOWTO", so replacing whatever sophisticated authentication method is used with "trust" until replication works would not be a bad idea.

Preparing the two databases

As of this writing Slony-I does not attempt to automatically copy the table definitions when a node subscribes to a data set. Because of this, we have to create one full and one schema-only pgbench database.

createdb -O $PGBENCH_USER -h $HOST1 $DBNAME1
createdb -O $PGBENCH_USER -h $HOST2 $DBNAME2
pgbench -i -s 1 -U $PGBENCH_USER -h $HOST1 $DBNAME1
pg_dump -s -U postgres -h $HOST1 $DBNAME1 | psql -U postgres -h $HOST2 $DBNAME2

From this moment on, the pgbench test program can be started to produce transaction load. It is recommended to run the pgbench application in the foreground of a separate terminal. This gives the flexibility to stop and restart it with different parameters at any time. The command to run it would look like this:

pgbench [-n] -s 1 -c <n_clients> -U $PGBENCH_USER -h $HOST1 -t <n_trans> $DBNAME1

3. Configuring the databases for replication

Creating the configuration tables, stored procedures, triggers and setting up the configuration is done with the slonik command. It is a specialized scripting aid that mostly calls stored procedures in the node databases. The script to create the initial configuration for a simple master-slave setup of our pgbench databases looks like this:

Script slony_sample1_setup.sh

#!/bin/sh

CLUSTER=test1
DBNAME1=pgbench_node1
DBNAME2=pgbench_node2
HOST1=<host name of pgbench_node1>
HOST2=<host name of pgbench_node2>
SLONY_USER=<postgres superuser to connect as for replication>
PGBENCH_USER=<normal user to run the pgbench application>

slonik <<_EOF_
	# ----
	# This defines which namespace the replication system uses
	# ----
	cluster name = $CLUSTER;

	# ----
	# Admin conninfo's are used by the slonik program to connect
	# to the node databases. So these are the PQconnectdb arguments
	# that connect from the administrators workstation (where
	# slonik is executed).
	# ----
	node 1 admin conninfo = 'dbname=$DBNAME1 host=$HOST1 user=$SLONY_USER';
	node 2 admin conninfo = 'dbname=$DBNAME2 host=$HOST2 user=$SLONY_USER';

	# ----
	# Initialize the first node. The id must be 1.
	# This creates the schema "_test1" containing all replication
	# system specific database objects.
	# ----
	init cluster ( id = 1, comment = 'Node 1' );

	# ----
	# The pgbench table history does not have a primary key or
	# any other unique constraint that could be used to identify
	# a row. The following command adds a bigint column named
	# "_Slony-I_test1_rowID" to the table. It will have a default
	# value of nextval('"_test1".sl_rowid_seq'), be unique and not
	# null. All existing rows will be initialized with a number.
	# ----
	table add key ( node id = 1, full qualified name = 'public.history' );

	# ----
	# The Slony replication system organizes tables in sets. The
	# smallest unit another node can subscribe is a set. Usually the
	# tables contained in one set would be all tables that have
	# relationships to each other. The following commands create
	# one set containing all 4 pgbench tables. The "master" or origin
	# of the set is node 1.
	# ----
	create set ( id = 1, origin = 1, comment = 'All pgbench tables' );
	set add table ( set id = 1, origin = 1,
		id = 1, full qualified name = 'public.accounts',
		comment = 'Table accounts' );
	set add table ( set id = 1, origin = 1,
		id = 2, full qualified name = 'public.branches',
		comment = 'Table branches' );
	set add table ( set id = 1, origin = 1,
		id = 3, full qualified name = 'public.tellers',
		comment = 'Table tellers' );
	set add table ( set id = 1, origin = 1,
		id = 4, full qualified name = 'public.history',
		key = serial,
		comment = 'Table history' );

	# ----
	# Create the second node, tell the two nodes how to connect to 
	# each other and that they should listen for events on each
	# other. Note that these conninfo arguments are used by the
	# slon daemon on node 1 to connect to the database of node 2
	# and vice versa. So if the replication system is supposed to
	# use a separate backbone network between the database servers,
	# this is the place to tell it.
	# ----
	store node ( id = 2, comment = 'Node 2' );
	store path ( server = 1, client = 2,
		conninfo = 'dbname=$DBNAME1 host=$HOST1 user=$SLONY_USER');
	store path ( server = 2, client = 1,
		conninfo = 'dbname=$DBNAME2 host=$HOST2 user=$SLONY_USER');
	store listen ( origin = 1, provider = 1, receiver = 2 );
	store listen ( origin = 2, provider = 2, receiver = 1 );
_EOF_

4. Time to replicate

Is the pgbench application still running?

At this point we have 2 databases that are fully prepared. One is the master database accessed by the pgbench application. It is time now to start the replication daemons.

On the system $HOST1, the command to start the replication daemon is

slon $CLUSTER "dbname=$DBNAME1 user=$SLONY_USER"

Since the replication daemon for node 1 is running on the same host as the database for node 1, there is no need to connect via TCP/IP socket for it.

Likewise we start the replication daemon for node 2 on $HOST2 with

slon $CLUSTER "dbname=$DBNAME2 user=$SLONY_USER"

Even if the two daemons now will start right away and show a lot of message exchanging, they are not replicating any data yet. What is going on is that they synchronize their information about the cluster configuration.

To start replicating the 4 pgbench tables from node 1 to node 2 we have to execute the following script:

Script slony_sample1_subscribe.sh

#!/bin/sh

CLUSTER=test1
DBNAME1=pgbench_node1
DBNAME2=pgbench_node2
HOST1=<host name of pgbench_node1>
HOST2=<host name of pgbench_node2>
SLONY_USER=<postgres superuser to connect as for replication>
PGBENCH_USER=<normal user to run the pgbench application>

slonik <<_EOF_
	# ----
	# This defines which namespace the replication system uses
	# ----
	cluster name = $CLUSTER;

	# ----
	# Admin conninfo's are used by the slonik program to connect
	# to the node databases. So these are the PQconnectdb arguments
	# that connect from the administrators workstation (where
	# slonik is executed).
	# ----
	node 1 admin conninfo = 'dbname=$DBNAME1 host=$HOST1 user=$SLONY_USER';
	node 2 admin conninfo = 'dbname=$DBNAME2 host=$HOST2 user=$SLONY_USER';

	# ----
	# Node 2 subscribes set 1
	# ----
	subscribe set ( id = 1, provider = 1, receiver = 2, forward = no );
_EOF_

Shortly after this script is executed, the replication daemon on $HOST2 will start to copy the current content of all 4 replicated tables. While doing so, of course, the pgbench application will continue to modify the database. When the copy process is finished, the replication daemon on $HOST2 will start to catch up by applying the accumulated replication log. It will do this in little steps, 10 seconds worth of application work at a time. Depending on the performance of the two systems involved, the sizing of the two databases, the actual transaction load and how well the two databases are tuned and maintained, this catchup process can be a matter of minutes, hours, or infinity.

5. Checking the result

To check the result of the replication attempt (actually, the intention was to create an exact copy of the first node, no?) the pgbench application must be stopped and any eventual replication backlog processed by node 2. After that, we create data exports (with ordering) of the 2 databases and compare them:

Script slony_sample1_compare.sh

#!/bin/sh

CLUSTER=test1
DBNAME1=pgbench_node1
DBNAME2=pgbench_node2
HOST1=<host name of pgbench_node1>
HOST2=<host name of pgbench_node2>
SLONY_USER=<postgres superuser to connect as for replication>
PGBENCH_USER=<normal user to run the pgbench application>

echo -n "**** comparing sample1 ... "
psql -U $PGBENCH_USER -h $HOST1 $DBNAME1 >dump.tmp.1.$$ <<_EOF_
	select 'accounts:'::text, aid, bid, abalance, filler
		from accounts order by aid;
	select 'branches:'::text, bid, bbalance, filler
		from branches order by bid;
	select 'tellers:'::text, tid, bid, tbalance, filler
		from tellers order by tid;
	select 'history:'::text, tid, bid, aid, delta, mtime, filler,
		"_Slony-I_${CLUSTER}_rowID"
		from history order by "_Slony-I_${CLUSTER}_rowID";
_EOF_
psql -U $PGBENCH_USER -h $HOST2 $DBNAME2 >dump.tmp.2.$$ <<_EOF_
	select 'accounts:'::text, aid, bid, abalance, filler
		from accounts order by aid;
	select 'branches:'::text, bid, bbalance, filler
		from branches order by bid;
	select 'tellers:'::text, tid, bid, tbalance, filler
		from tellers order by tid;
	select 'history:'::text, tid, bid, aid, delta, mtime, filler,
		"_Slony-I_${CLUSTER}_rowID"
		from history order by "_Slony-I_${CLUSTER}_rowID";
_EOF_

if diff dump.tmp.1.$$ dump.tmp.2.$$ >test_1.diff ; then
	echo "success - databases are equal."
	rm dump.tmp.?.$$
	rm test_1.diff
else
	echo "FAILED - see test_1.diff for database differences"
fi

If this script reports any differences, I would be glad to hear how this happened. Please contact me as JanWieck@Yahoo.com or on the IRC channel #slony on freenode.net.

Jan Wieck