Postgres replication for MySQL DBA's
Working a with Postgers and MySQL replication I noticed that many MySQL DBA's have hard time grasping Postgres replication and how it works, you may ask why would you need to know how to setup and use Postgres replication - but lets face it Postgres is becoming ever more popular and many MySQL shops also have Postgres databases. Overall a lot of startups are using a collection of databases that serve various purposes. I"ll attempt to explain Postgres replication in plain English and use MySQL terms to help adsorb the seemingly complex subject.
Postgres "streaming replication" was introduced since Rel. 9.0 of Postgres and is a very useful addition that many of us were waiting for years, personally I strongly believe that the ack of replication kept Postgres from wide adoption unlike MySQL that had the replication from early releases. Replication is a great tool since it allows to scale and shard your databases horizontally i.e. distributed computing.
Postgres documentation on replication is complex and not very intuitive in addition the terminology is confusing adding to the difficulty of creating replicated slaves in Postgres. In reality Postgres replication is actually very simple and flexible, once you get over the terminology.
Types of Postgres replication
The trouble with Postgres replication is that it has just too many of them, Slony, Bucardo, Londiste and of course now added "streaming replication". The first step is to choose one that fits the particular use case. In this posting I will focus on Postgres native build-in "streaming replication".
Trigger Based replicationSlony
Log Based ReplicationStreaming Replication - this is the replication we are going to focus on, this replication is build-in Postgers release since 9.0 and is closely resembling MySQL replication that was log based from the get go. But that is where the similarities stop.
Postgres "streaming replication"The first confusion comes from the term "streaming replication" , all replications are essentially "streaming" i.e. the master is "streaming" sets of record to the slave, so adding "streaming" to the replication term did not help the easy of adoption. To be fair Postgres named its replication "streaming" to distance itself from the early attempts at replication that is before rel. 9.0. There are references in Postgres documentation to "hot" and "warm" standby, the "hot" standby being the "streaming" one. All and all "streaming replication" means that Postgres slave database can now read directly from the stream of data from the master database, as compared to "warm" standby when the logs had to be transferred to the slave database that was running in the continuos recovery mode applying those logs.
Anyhow "streaming replication" is the one that you want to use, with no further delay will proceed on to how to set it up.
What do you need to setup Postgres replication1. Master database
2. Slave database
3. Archiving of the logs - this is the same as enabling binlogs on MySQL
4. Replication user - this is the same as replication user on MySQL
5. Open database ports 5432 between the master and the slave vs. 3306 on MySQL
6. Copy the Master database backup to the slave
7. Start the replication and enjoy the benefits
Master DatabaseIn order for the master Postgres database to stream the data to the slave it must be configured just like MySQL - the bulk of the process involves enabling the archiving i.e. "binlogs" and permission on the master database.
Setup replication user and permissions:Postgres security unlike MySQL is using combination of database permissions and configuration files to enable the connectivity to the databases. The main file to setup the replication from user perspective is /var/lib/pgsql/9.1/main/pg_hba.conf
Add replication user to pg_hba.conf file:host replication postgres 172.17.28.57/32 trust
What it is in the line above we are adding the replication user to the file and instruct the security to
allow the user to connect from 172.17.28.57/32 - that is the replication slave.
Add replication user to the Postgres database:
CREATE USER replicator WITH SUPERUSER ENCRYPTED PASSWORD
In summary - to create replication user you need to perform two steps 1. Add the use to pg_hba.conf
and add the user to the database using database grants. This is a bit unlike MySQL where all you have
to do is to grant "replication slave" privileges to the slave. All and all one extra step in Postgres.
Now since the user is added we can proceed to the rest of the configuration, we are working on the
master database so far, will get to the slave very soon.
The next step we need to enable archiving i.e. enable binlogging in MySQL terms.
Enable Master database archving
There are several parameters in Postgers main configuration file postgresql.conf that control the
replication and archiving:
Add the directory for the archive logs:
And the last thing on the master to enable the replication .shh files
Remember the archive_command above that copies the archive files to the slave? We need to configure postgres user no to use the password when connecting to the slave to copy the files over. Using public/private key pairs makes it whole a lot easier. Be sure to setup public/private key pairs.
Restart the master database for the changes to take effect.
After the primary master restart you should see the archive files in /var/lib/pgsql/9.1/main/wals directory on the slave.
[root@smyhost# ls 000000010000086700000061.00000080.backup 000000010000087700000005 000000010000087700000008 00000001000008770000000B 000000010000086700000064.000013F0.backup 000000010000087700000006 000000010000087700000009 00000001000008770000000C 000000010000086E0000009E.00000020.backup 000000010000087700000007 00000001000008770000000ANow since we have the master database working its time to configure the slave database. Slave database setup is very simple - all we have to do now is to copy the backup of primary database to the slave database and it will catch up using the archive files and then start reading the wal segment from the master. The archive logs serve as sort of relay logs in MySQL, if you stop the slave database it will fall behind but when you start it again it will use the archive logs to catch up.
Creating backup of primary database and copying to the slave
This is where Postgres database shines - you do not have to take it offline to copy the data, it has a great new feature pg_start_backup - it will place master database in backup mode so you can copy it to the slave database keeping the data consistent. On the master database perform the following:
"SELECT pg_start_backup('backup', true)"
Copy the database to the standby slave using rsync or scp:
-av --exclude postmaster.pid --exclude pg_xlog
End backup mode on primary database
Setup the the slave database parameters:You will need to edit two files: postgresql.conf and recivery.conf
Edit postgresql.conf and add the following lines:
Create the recovery.conf and add the following lines:
standby_mode = 'on'
primary_conninfo = 'host=172.17.28.56 port=5432 user=postgres'
Connect string to the primary master database
trigger_file = '/var/lib/pgsql/9.1/main/pgsql.trigger'
restore_command = 'cp -f /var/lib/pgsql/9.1/main/wals/%f %p
This command is reading the archive logs files
archive_cleanup_command = 'pg_archivecleanup /var/lib/pgsql/9.1/main/wals/ %r'
This one cleans up old archives
Now restart the the slave database for the changes to take affect and you have yourself a running replication.
Checking the replication:ps -ef | grep receive
You should see the following on the slave:
postgres 12050 11809 1 10:33 ? 00:02:30 postgres: wal receiver process streaming 877/2A9F7300
That means you database is reading the wal segments from the master. But if you dont see it right away - this isnormal since your database need to catchup with the master reading the archive logs first and then once it catches up you will see the receiver process.
Easy huh :-)