Database Replication?
xcaster asks: "I've been working with Lotus Notes for several years. Although Lotus Notes is horrible software, it wins in one feature: replication. I've been researching all the major RDBMSes for a while, and found that even though they offer replication support, none provides easy-to-use, flexible support for disconnected-user-oriented replication. Do real solutions for replication for disconnected users exist out there?" It would be interesting to see how well the major free RDBMS engines stack up when it comes to replication. Which RDBMS software can support replication out of the box, and for the ones that do not, what changes would they need to make to support it?
Re:Oh no it didn't... (Score:1)
The story around one of my previous places of employment was that some later version of SQL Server had received a fresh infusion of code. These days it's arguable that SQL Server is a better RDBMS than Sybase, at least on Windows. Then again, it does tend to break itself somewhat frequently.
Re:Replication (Score:2)
Oh no it didn't... (Score:2)
In fact, 4.21 was effectively a "re-badged" product rather than anything else.
Money & Technical judgement (Score:2)
Why Notes can replicate. (Score:2)
Replication is complicated to get right. Notes manages because it is not relational and does not try to be. Because Notes is based on documents, all the information relating to a document is stored in one place, and if a conflict occurs in replication it is relatively simple to deal with (either by merging the two conflicting documents, or by creating a new document and flagging it visibly as a conflict).
I have no great experience with relational dbs, but the little I know suggests that this would be a whole lot more difficult using Oracle, MySQL, etc. The equivalent of a document would be stored in n different tables, leading to a possible n different replication conflicts (where n varies depending on the application and database design).
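To make the "merge, or flag a conflict document" idea concrete, here is a minimal Python sketch. It is not how Notes actually implements replication; the `Doc` class, the `replicate` function, and the field-by-field comparison against a common base copy are all assumptions made for illustration, and deleted fields are ignored.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a document: an id plus a flat dict of fields."""
    doc_id: str
    fields: dict
    conflict: bool = False


def replicate(base, local, remote):
    """Reconcile two replicas of one document against their common base copy.

    Returns the list of documents to keep: one merged document if the two
    sides changed different fields, otherwise the remote copy plus the local
    copy saved as a visibly flagged conflict document.
    """
    changed_local = {k for k, v in local.fields.items() if v != base.fields.get(k)}
    changed_remote = {k for k, v in remote.fields.items() if v != base.fields.get(k)}
    clashing = {k for k in changed_local & changed_remote
                if local.fields[k] != remote.fields[k]}

    if not clashing:
        # No field was edited differently on both sides: merge field by field.
        merged = dict(base.fields)
        merged.update({k: local.fields[k] for k in changed_local})
        merged.update({k: remote.fields[k] for k in changed_remote})
        return [Doc(base.doc_id, merged)]

    # Same field edited on both sides: keep both and flag one as a conflict.
    winner = Doc(remote.doc_id, dict(remote.fields))
    loser = Doc(local.doc_id + "-conflict", dict(local.fields), conflict=True)
    return [winner, loser]


base = Doc("d1", {"title": "draft", "body": "x"})
local = Doc("d1", {"title": "final", "body": "x"})    # edited title offline
remote = Doc("d1", {"title": "draft", "body": "y"})   # edited body on server
merged_result = replicate(base, local, remote)

remote_clash = Doc("d1", {"title": "other", "body": "x"})  # also edited title
conflict_result = replicate(base, local, remote_clash)
```

Because everything about the document lives in one record, the whole decision is made in one place. With the same data spread over several tables, each table's rows would need their own version of this check.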
I'd be interested to know why xcaster thinks Notes is 'horrible software'. I've been using/developing Notes and Domino applications for years now, and I think it is pretty nice. You can develop apps quickly and efficiently, you can replicate, apps will work via a Notes client, or over the web (as long as they are written appropriately). Problems tend to occur when you use Notes to do things which it is not appropriate for (e.g. anything that doesn't fit in the document oriented paradigm).
MS SQL Server (Score:2)
MS SQL Server (which grew out of Sybase 10, and is now an impressive RDBMS in its own right) supports three different flavors of replication. It's not free, but it's worthy of mention.
I could go on about it for long periods of time, but it would be better said in other ways:
The basics, though, are that SQL Server can replicate in Transactional, Merge, or Snapshot modes. Snapshot has the least processor overhead but the most bandwidth use, merge is someplace in between, and transactional is at the other end of the spectrum. They can all be push or push/pull, and transactional and merge can also be immediate-updating (where any number of servers all get the changes, and HAVE to get all the changes). There's far more to it than that, but if you want to know more, check out the above links.
Replication: don't do it (and why) (Score:4)
Basically, there are two ways to perform replication: "lazy replication" and "eager replication". "Eager replication" means that all updates are atomic across all nodes and that transactions are serializable. However, the problem with "eager replication" is that as you increase the number of nodes n, the deadlock rate rises on the order of n^3 in Gray's analysis (and with the fifth power of transaction size). The "solution", such as it is, is to drop the expectation of serializability: use timestamps for concurrency control, allow only commutative transactions on your data, and use two-tier replication [google.com]. This works for banks and others whose database applications consist mainly of commutative transactions, but won't for many others: YMMV. (Gray's paper also details the differences between having a single "master" node that "owns" all db objects and having each node own several objects.)
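A small Python sketch of why commutativity matters here: if replicas may apply the same set of updates in different orders, they only end up in the same state when the updates commute. The account operations below are made-up examples, not anything taken from Gray's paper.

```python
from itertools import permutations


def deposit_50(balance):
    return balance + 50


def withdraw_20(balance):
    return balance - 20


def set_to_10(balance):
    # An overwrite: the result ignores the previous value, so order matters.
    return 10


def apply_all(balance, ops):
    """Apply a sequence of update functions to a starting balance."""
    for op in ops:
        balance = op(balance)
    return balance


def outcomes(start, ops):
    """Final balances reachable by applying ops in every possible order."""
    return {apply_all(start, order) for order in permutations(ops)}


# Increments and decrements commute: every ordering gives the same balance.
commutative = outcomes(100, [deposit_50, withdraw_20, deposit_50])

# Mixing in an overwrite does not: replicas can disagree on the final state.
mixed = outcomes(100, [deposit_50, set_to_10])

print(commutative)  # {180}
print(mixed)        # the two orderings give 10 and 60
```

A single reachable outcome means replicas converge without having to agree on an order; more than one means something (a master node, timestamps, or a human) has to pick a winner.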
IIRC, the way Notes does it is by queuing updates at the local node and using an optimistic concurrency control mechanism when the local node connects to replicate. This is great for the application domain that Notes caters to: I "own" my own calendar, and if I'm out of the office (and have my node -- notebook -- with me), you can't schedule me for an appointment until I come back. However, for many application domains, this won't work.
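As a rough illustration of "queue updates locally, then check optimistically on reconnect", here is a Python sketch. It is not the Notes protocol; the `Server` and `Laptop` classes and the per-key version counter are assumptions chosen to keep the example small.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical central replica: key -> (version, value)."""
    data: dict = field(default_factory=dict)

    def read(self, key):
        return self.data.get(key, (0, None))

    def write_if_unchanged(self, key, expected_version, value):
        """Optimistic check: accept only if nobody wrote since expected_version."""
        current_version, _ = self.read(key)
        if current_version != expected_version:
            return False
        self.data[key] = (current_version + 1, value)
        return True


@dataclass
class Laptop:
    """Hypothetical disconnected node that queues edits until it reconnects."""
    queue: list = field(default_factory=list)

    def edit_offline(self, key, seen_version, value):
        # Remember which version the edit was based on; no locks are taken.
        self.queue.append((key, seen_version, value))

    def replicate(self, server):
        """Push queued edits; return the ones the server rejected as stale."""
        rejected = []
        for key, seen_version, value in self.queue:
            if not server.write_if_unchanged(key, seen_version, value):
                rejected.append((key, value))
        self.queue.clear()
        return rejected


server = Server()
server.data["calendar"] = (1, "free")

laptop = Laptop()
laptop.edit_offline("calendar", 1, "dentist")    # based on version 1
laptop.edit_offline("todo", 0, "buy milk")       # new key, nothing to clash with

# While the laptop is away, someone else updates the calendar on the server.
server.write_if_unchanged("calendar", 1, "meeting")

rejected = laptop.replicate(server)
```

Nothing blocks while the laptop is disconnected; the cost is that stale edits only surface at replication time, which is tolerable when each user mostly "owns" their own data and a problem when many users update the same rows.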
In any case, that's why Notes does it -- because it can, thanks to the nature of its data domain -- and why most people don't -- because it's hard/impossible for the general case.
~wog
PS -- If you can't get into the ACM Digital Library, check out these lecture notes [berkeley.edu] from Stonebraker's anthology at Berzerkeley [berkeley.edu].