Web And Database Synchronization?
crazney asks: "I am developing a medium-sized Web site with a friend, and we have stumbled across a small problem. Our hosting will consist of several load-balanced Web servers, and possibly two or three database servers. The extra database servers may be used for load balancing or for redundancy. My question is, what can we use to keep these up to date? We would like to ensure that all the Web servers always contain an exact copy of the site, and that any changes to one database propagate to the others. These servers may be geographically separated, so something low-bandwidth would be helpful. For the Web servers, a quick and dirty solution could be CVS, but I'm stumped for the database servers (which will probably be running PostgreSQL)."
More Questions... (Score:1)
If this number is small or nonexistent, would it be possible to keep a log of transactions (i.e. store the SQL queries) and run them against the other DB servers?
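If the write rate really is that low, the log-and-replay idea can be sketched in a few lines of shell. Everything here is hypothetical: it assumes the application appends each write statement, one per line, to a log file, and the file names and host are made up for the example.

```shell
#!/bin/sh
# Hypothetical sketch of statement-log replay. Assumes the application
# appends every INSERT/UPDATE/DELETE, one statement per line, to a log.

# Wrap a batch of logged statements in a single transaction so the
# replica applies all of them or none.
build_batch() {
    echo 'BEGIN;'
    cat "$1"
    echo 'COMMIT;'
}

# On the master, rotate the log so new writes go to a fresh file, then
# replay the batch on each replica (commented out; needs a live server):
#   mv /var/log/app/writes.sql /var/log/app/writes.batch
#   build_batch /var/log/app/writes.batch | psql -h db2.example.com appdb
```

The transaction wrapper matters: if the replay dies halfway, the replica rolls back rather than ending up with half a batch applied.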
erServer! (Score:3)
If you'd rather do it the 'old fashioned' way, what is the query load like? If the data is mostly static, you could send all the writes to one database and use all the DBs for reads. Every two hours or so, update the read-only databases. Something like rsync would be very good for that. Until erServer is released, this may be your best choice.
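For the static half of the problem (keeping the Web servers identical), one rsync invocation per mirror is usually enough. The host names and paths below are made up, and the function only prints the commands it would run rather than executing them:

```shell
#!/bin/sh
# Sketch: push the document root from the master web server to each
# mirror. Hosts and paths are illustrative.
MIRRORS="web2.example.com web3.example.com"
DOCROOT=/var/www/htdocs/

push_docroot() {
    # -a preserves permissions and timestamps, -z compresses (useful on
    # low-bandwidth links), --delete removes files dropped on the master.
    for host in $1; do
        echo rsync -az --delete "$DOCROOT" "$host:$DOCROOT"
    done
}

# Dry run: print the commands instead of executing them.
push_docroot "$MIRRORS"
```

Because rsync only transfers the deltas, this stays cheap even over slow links between geographically separated servers.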
There's been some discussion about this on the PostgreSQL mailing lists - check the pgsql-general archives @ postgresql.org
MySQL has added replication ... (Score:1)
Not sure if this is relevant to what you're doing, as MySQL is an entirely different product from PostgreSQL, and you may be trying to avoid it based on its well-documented pitfalls/features/etc.
The URL for the story on setting up MySQL replication is: http://www.phpbuilder.com/columns/tanoviceanu20000912.php3 [phpbuilder.com].
Hope it helps.
do NOT use rsync (Score:3)
If you want cheap replication and a consistent db, your best bet (for now) is to use pg_dumpall to make periodic backups of the db, transfer them (perhaps using scp and host-based authentication), and then use psql to restore them. To see how the dump/restore process works, look at the man page [usask.ca] for pg_dumpall. You can likely do this all from a cronjob.
If you do it with stock pg_dumpall, you will probably need to take the site down for a few minutes or else risk odd service interruptions. However, "pg_dumpall -d" will dump tuples as SQL INSERT statements; this will allow you to load the data into a running db. Be aware that the dump/restore process can be quite taxing (especially on a production db, where you likely don't want to turn off fsync() calls for the bulk copies). I don't know the requirements of your application, but you'll likely want to strike a compromise between speed and up-to-date correctness.
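From cron, the dump/transfer/restore cycle described above might look like the sketch below. Every host name, path, and script name is a placeholder; the pg_dumpall/scp/psql commands are left commented out because they need live servers, and only the file-naming helper actually runs.

```shell
#!/bin/sh
# Sketch of the periodic dump/transfer/restore cycle. Hosts and paths
# are placeholders.
REPLICA=db2.example.com

# Name each dump by date so old dumps are easy to keep or prune.
dump_name() {
    echo "pgdump-$(date +%Y-%m-%d).sql"
}

# The cycle itself (commented out; needs live PostgreSQL servers):
#   DUMP=/tmp/$(dump_name)
#   pg_dumpall -d > "$DUMP"          # INSERT-statement dump
#   scp "$DUMP" "$REPLICA:/tmp/"     # host-based auth, no password prompt
#   ssh "$REPLICA" "psql -f /tmp/$(dump_name) template1"
#
# Crontab entry to run this script every two hours:
#   0 */2 * * * /usr/local/bin/pg_sync.sh
```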
HTH.
~wog
Re:do NOT use rsync (Score:2)
You're right, of course. rsyncing the actual database files while the DB server is running is incredibly dumb.
Re:More Questions... (Score:1)
The server database will probably be written to several times per minute, and those writes are critical. Hence that idea probably doesn't work.
Thanks, crazney.
"Who is General Failure and why is he reading my hard disk ?"
Replication via Update Logs (Score:1)
One of MySQL's solutions to replication (before it added replication features) was an option to keep update logs containing every statement that modified the database. To update a slave database you would simply feed the update log to the mysql client.
A practical solution could be cycling the update log every hour, then having the slave databases fetch the hourly update, etc.
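A rough sketch of that hourly cycle, with made-up paths: the mv-based rotation runs here, while the parts that need a live mysqld are commented out.

```shell
#!/bin/sh
# Sketch of hourly update-log cycling. Paths are illustrative.

# Move the current update log aside, stamped with the hour, so a fresh
# log can be started; prints the batch file name.
rotate_log() {
    mv "$1/update.log" "$1/update-$2.sql"
    echo "$1/update-$2.sql"
}

# On the master (commented out; needs a running mysqld):
#   BATCH=$(rotate_log /var/lib/mysql "$(date +%Y%m%d%H)")
#   mysqladmin flush-logs    # make mysqld reopen its log files
# Each slave then fetches the stamped batch and feeds it to the client:
#   mysql dbname < "$BATCH"
```

The flush-logs step is what makes the rotation safe: until mysqld reopens its files it keeps writing to the moved log, so the slave should only fetch batches created before the last flush.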
Again, I have no idea if PostgreSQL supports this, but just for reference here are the MySQL manual links:
21.1 Database replication with update log [mysql.com]
21.3 The update log [mysql.com]
Re:erServer! (Score:1)
I'm happy we're getting replication at all. The companies that really need it won't mind paying for it, and the money goes into further PgSQL development. And (AFAIK) it's all going to be open source eventually.
Re:do NOT use rsync (Score:1)
For example, on Oracle you can do this by putting the database in backup mode, which temporarily freezes the contents of the data files and writes all updates to redo logs. Using rsync on the db files while in backup mode would be fine. Of course, I've found that when dealing with huge (multiple gigabytes) amounts of data, rsync doesn't help much.
Re:Replication via Update Logs (Score:1)
What this guy should be doing is purchasing a real database engine (i.e. Oracle, Informix, DB2, etc.). While this is very expensive, so is having data corrupted or destroyed by some el cheapo solution.