Databases Networking Programming Software

Distributed Versus Centralized DB?

OSXCPA asks: "I am developing a framework for processing accounting information. What advice would you have for setting up an environment where several (eventually, many) remote data-aggregation and processing centers update a central server 'after hours' to avoid processing delays? For those with ERP experience: rather than generating a batch on the central server, I am trying to process transactions locally, then update and validate the results against the central server, so my remote users can process at full speed all day and the central DB can catch up later. Remote locations are all self-contained, in that no one will be posting transactions to another remote location. No big iron: this is all networked PC/Mac hardware and a central server (eventually a server farm, as needed). Feedback or constructive criticism would be appreciated. Technical advice is great, but I'm also looking for things like 'Gee, if I had only known XXX when I set this up, I would have done YYY...'"
This discussion has been archived. No new comments can be posted.

  • by Shag ( 3737 ) on Saturday July 16, 2005 @03:44PM (#13082802) Journal
    I am not a professional DBA, but I tend to wind up working with them. The one thing I remember from a big project where replication and whatnot came into play was that none of that stuff ever worked as well as the vendors said it would, and it all required a lot more work on the DBA's part (and typically lots more $$$ in general) than it was supposed to.
  • Comment removed (Score:3, Insightful)

    by account_deleted ( 4530225 ) on Saturday July 16, 2005 @04:05PM (#13082927)
    Comment removed based on user account deletion
  • by Anonymous Coward on Saturday July 16, 2005 @06:01PM (#13083580)
    I can't stress this enough: the purpose of a database is to guarantee data integrity. Be sure you understand that: storing/retrieving/manipulating data is part of it too, but without data integrity, you're lost.

    Unfortunately the state of the database industry (both vendors and users) is so fucked up right now I'm not even sure people know what "data integrity" means. I'll give you one definition: it means that every possible combination of values in your database has a legal business meaning. Does it make sense to have order items without corresponding orders? No? Then make sure it's not possible.

    Another definition: don't ever allow the two facts "X" and "not X" to ever be derived from the same database! With my order item example above: if order #123 doesn't exist, but there are order items referencing #123, then you can prove both "order #123 exists" and "order #123 doesn't exist" from the same database. BAD!
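The order-item contradiction above is exactly what a foreign-key constraint prevents. A minimal sketch using SQLite from Python (table and column names are hypothetical): the orphan row simply cannot be stored.

```python
import sqlite3

# Sketch: a foreign-key constraint makes the "order item without an order"
# state unrepresentable, so "X and not X" can never both be derived.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite requires opting in
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE order_items (
    id INTEGER PRIMARY KEY,
    order_id INTEGER NOT NULL REFERENCES orders(id))""")

conn.execute("INSERT INTO orders VALUES (123)")
conn.execute("INSERT INTO order_items VALUES (1, 123)")      # fine
try:
    conn.execute("INSERT INTO order_items VALUES (2, 999)")  # no such order
except sqlite3.IntegrityError:
    print("orphan order item rejected")
```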

    As for replicated databases, one common approach is to partition the data. For instance, use a compound key (REGION#, ID#), where REGION# is hardcoded on each database. That is, you add a constraint saying REGION# must always be "100" on database #1, "101" on database #2, etc.
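The hardcoded-REGION# idea can be enforced with a per-database CHECK constraint. A rough sketch in Python/SQLite (the table layout and region codes are assumed for illustration): each node rejects rows stamped with any region but its own, so IDs minted at different sites can never collide.

```python
import sqlite3

# Hypothetical sketch: this node is database #1, so REGION# must be 100.
REGION = 100

conn = sqlite3.connect(":memory:")
conn.execute(f"""
    CREATE TABLE orders (
        region INTEGER NOT NULL CHECK (region = {REGION}),
        id     INTEGER NOT NULL,
        total  REAL,
        PRIMARY KEY (region, id)   -- compound key (REGION#, ID#)
    )
""")
conn.execute("INSERT INTO orders VALUES (100, 1, 19.99)")     # accepted
try:
    conn.execute("INSERT INTO orders VALUES (101, 1, 5.00)")  # foreign region
except sqlite3.IntegrityError:
    print("row for foreign region rejected")
```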

    You can combine all the data regularly into one BIG database, or you can just go out and query each individual database depending on what region it's in. I'm maintaining an in-house app currently that keeps customer data for each individual city in a cluster physically located in that city. Then an outside app polls the databases and combines the data for presentation. Works pretty well, but my queries are pretty simple.
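The poll-and-combine pattern described above can be sketched as a simple fan-out: run the same query against every regional database and merge the rows. This is a toy illustration (the schema and region codes are invented), but it shows why it only stays pleasant while the queries are simple.

```python
import sqlite3

# Stand-in for one regional database, seeded with a single customer row.
def open_region(region):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (region INTEGER, name TEXT)")
    conn.execute("INSERT INTO customers VALUES (?, ?)",
                 (region, f"cust-{region}"))
    return conn

regions = {r: open_region(r) for r in (100, 101, 102)}

def combined_customers():
    """Poll every regional database and merge the rows for presentation."""
    rows = []
    for conn in regions.values():
        rows.extend(conn.execute("SELECT region, name FROM customers"))
    return sorted(rows)

print(combined_customers())
```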
  • by j()nty ( 741565 ) on Saturday July 16, 2005 @07:26PM (#13083946)
    I'm in the same situation, writing an application with about 500 users spread over 30 offices, a local database in each office, a central database, and overnight updates from the offices to the central database.

    I'm using globally-unique-id's for all records. These are generated by the database system I'm using and are guaranteed to be different across all the offices. No conflict between new records created at different offices. And all I have to do is use an existing feature from the database system.
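The poster leans on a database feature for this, but the same idea is easy to see with standard random UUIDs: each office mints IDs locally with no coordination, and collisions are vanishingly unlikely. A minimal sketch (the record shape is invented):

```python
import uuid

# Sketch: every office creates records with a locally generated uuid4();
# no central ID allocator is needed, and IDs from different offices
# effectively never clash.
def new_record(office, payload):
    return {"id": str(uuid.uuid4()), "office": office, "data": payload}

a = new_record("office-1", {"amount": 10})
b = new_record("office-2", {"amount": 20})
assert a["id"] != b["id"]   # created independently, still distinct
```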

    I don't delete records. Just mark dead records using a timestamp. This way records don't have to be deleted from the central database and other offices; instead just propagate the dead timestamp.
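The soft-delete scheme can be sketched in a few lines: a record is never removed, only stamped with a death time, and that stamp replicates like any other field change. (Field names here are hypothetical.)

```python
import time

# Sketch of the soft-delete idea: dead_at is None while the record is
# live; marking it dead just sets a timestamp, which then propagates to
# the central DB and the other offices like any normal update.
def mark_dead(record):
    record["dead_at"] = time.time()

def is_live(record):
    return record.get("dead_at") is None

rec = {"id": "abc", "dead_at": None}
assert is_live(rec)
mark_dead(rec)
assert not is_live(rec)
```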

    I've decided to write my own update program that pushes changes from one office to the central database. I can use this same program in reverse to update the offices from the central database. And I can use it to update laptop users who only connect once a week. And, and, and...

    It's a chore to write the update program. But I see some advantages:

    • The update program can be as simple as possible, but no simpler.
    • I decide when, where, and how the update program runs.
    • If there are conflicts between updates from different offices, I can code the procedure for resolving them. The rules for resolving conflicts are decided by the users; they get exactly what they want, and they pay me to code it.
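The core of such a hand-rolled update program can be surprisingly small. A sketch, assuming one possible user-chosen conflict rule (last-writer-wins on an update timestamp; the record shape is invented):

```python
# Sketch: apply a batch of changed records from an office to the central
# store. The conflict policy here, last-writer-wins by updated_at, is an
# assumed example; the real rule is whatever the users decide.
def apply_updates(central, incoming):
    for rec in incoming:
        current = central.get(rec["id"])
        if current is None or rec["updated_at"] >= current["updated_at"]:
            central[rec["id"]] = rec   # new record, or newer version, wins

central = {"r1": {"id": "r1", "updated_at": 5, "office": "A"}}
apply_updates(central, [
    {"id": "r1", "updated_at": 9, "office": "B"},   # newer: replaces A's copy
    {"id": "r2", "updated_at": 1, "office": "B"},   # brand-new record
])
```

Run the same function in the other direction to push central changes back out to the offices (or to a laptop that only connects once a week).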
  • by Anonymous Coward on Saturday July 16, 2005 @07:52PM (#13084057)
    Since you mentioned this application is being rolled out to a bunch of PCs and Macs, it sounds to me like you might be deploying to a large installed base where you do NOT have centralized management of the machines...

    At some point in the future, you probably will be upgrading the system to support new features / new data.

    You'll want to design the interfaces between the remote sites and "the mothership" so that while a migration is partially complete, the system continues to run for both the older and the newer versions.
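One common way to get that tolerance is to tag every message with a schema version and have the mothership normalize old shapes on receipt, so v1 and v2 sites can coexist mid-migration. A toy sketch (the versions and field names are entirely hypothetical):

```python
# Sketch: the central server accepts both message versions during a
# rolling upgrade. v1 sent a single "name" field; v2 splits it into
# "first"/"last". Unversioned messages are treated as v1.
def normalize(msg):
    if msg.get("version", 1) == 1:
        first, _, last = msg["name"].partition(" ")
        return {"version": 2, "first": first, "last": last}
    return msg   # already current: pass through unchanged

old = {"version": 1, "name": "Ada Lovelace"}
new = {"version": 2, "first": "Ada", "last": "Lovelace"}
assert normalize(old) == new
assert normalize(new) == new
```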

  • You are entering an area of database technology where angels fear to tread. If this were easy, we wouldn't be paying top DBAs hundreds of thousands of dollars to do it and keep it working with acceptable uptime. There is a reason why companies that operate at the scale of Amazon have IT expenses in the millions and millions, and why they do their darnedest to hire and keep the best talent in the country. You're entering the big leagues now. You're going to need a professional team to handle the gameplay, because your minor-league team can't cut it.

    Your first step is to look your boss square in the eye and say, "I think I am smart. But I know there are a lot of smarter guys than me out there, and they have a hard time implementing this right, and they have a budget in the millions. Sure, we can hack together a solution that may solve your short-term needs, but I know what is coming down the road and that solution will not be it." You can elaborate what you know as far as your manager is willing to try and understand it. But you need to wake your manager up and explain to him that this is a different game now, run with different rules.

    Your next step is to convince the boss to give you time and resources to learn the black arts of this field. The resources will include a *team* of DBAs who have experience with this kind of thing (and they aren't cheap!), software developers who have seen this before (again, not cheap!), machines to run on of the proper caliber (and you may decide you will need a mainframe - don't write it off too soon.) The time will be whatever it takes to (a) understand /in depth/ the field as it exists today, and (b) understand the problem you are being asked to solve /in depth/, and anticipate future needs, and (c) mapping a solution that addresses the facts of life in (a) and (b). This means paperwork, specifications, and a lot of meetings and sales calls and demonstrations and typing.

    If your manager can't agree to this, then you need to tell him this next part: "I cannot implement what you need unless you give me the tools and resources to do it. You are asking me to build a skyscraper with spare parts I find in the kitchen. You may be able to find someone who is willing to tell you that they can do it, and do it cheaper and faster, but I submit to you that they are selling you snake oil. When their solution comes under load, and you start to learn about things you've never heard of, which they'll call 'minor technical details that can be easily fixed', but which deep in your gut you sense are neither minor nor mere details, you will remember what I told you about skyscrapers and kitchen tools. You will begin to grasp the enormity of the problem you have asked me to solve. You will see that throwing money at the snake-oil salesman was a worse investment than throwing a lot of money at the real solution."

    If I were you, I'd start filling out the resume and start looking for a new job at a smaller company that doesn't have these problems. That is, unless you absolutely believe you are going to be up to the task and management is going to take it as seriously as you will. I've seen too many people try to deliver to management what they could not because they wouldn't pay attention to those annoying "facts" and this thing called "logic". Don't fall into this group. Be up-front, bold, and clear as heck with your manager, and don't try to hide anything from them.
  • Distributed (Score:3, Insightful)

    by JoeCommodore ( 567479 ) <larry@portcommodore.com> on Sunday July 17, 2005 @04:52PM (#13088330) Homepage
    I've not been trained in building distributed systems but have learned on my own from the school of hard knocks.

    As mentioned, a good record-ID strategy is a must. I used a packed timestamp (base 60, using upper- and lower-case letters as digits) with a node/user identifier; besides keeping the records unique, you can also use the ID to track down where and when problems might have occurred.
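A rough sketch of that style of ID (the exact alphabet, resolution, and separator are assumptions; the original poster's scheme may differ): pack a millisecond timestamp into base 60 using digits plus letters, then append the node identifier. The result sorts roughly by creation time and tells you at a glance where and when a record was made.

```python
import time

# 60-character alphabet: 10 digits, 26 upper-case, 24 lower-case letters.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwx"

def pack60(n):
    """Encode a non-negative integer in base 60."""
    digits = ""
    while n:
        n, r = divmod(n, 60)
        digits = ALPHABET[r] + digits
    return digits or "0"

def record_id(node, t=None):
    """Packed timestamp (ms resolution) plus node identifier."""
    t = int((t if t is not None else time.time()) * 1000)
    return f"{pack60(t)}-{node}"

# Even the same instant on two nodes yields distinct, traceable IDs:
assert record_id("N1", 1e9) != record_id("N2", 1e9)
```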

    Also, time stamps and delete logs!!! You need to know what data is new, what is old, and what should have been deleted.

    Work on a way to automate updates: you are coding not for one computer but for all the nodes on your network, so put something out there that can keep your data and apps in check and updated. A big stumbling block is getting the data-transfer structure and protocol worked out.

    Look at the importance and frequency of the data and make sure your system can handle it. If nightly is OK, then nightly is fine, but there will be times when your satellites are down; don't paint yourself into a corner by expecting every transfer to work right every time.

    Choose your apps wisely; since you mentioned Macs, that narrows your choice of what you can use. We had used FoxBase+/Mac, but 4th Dimension is cross-platform and looks real flexible. It could be done using LAMP at the nodes, but you will have to work out your own distributed data import/export scripts. (DBF transfer was sooo easy.)

    I find you have to implement a lot more validation at the remote end, since you expect the data to be clean when it comes in from the other offices.

    Also, things happen at individual sites: they may have to restore from backups, etc. Work out how to re-sync a site when such events occur.

    You are looking at a very complex system to manage. (those thousand or so lines of spaghetti code will keep you employed)

    Normalize and clean up your database as much as possible now, and include planned data elements (even if you hide them for now); making structural changes once you're distributed can be a real pain.

    With that said, after years of being distributed, technology here in our rural community now makes it possible to centralize, and I'm relieved that I can make the switch (though I may keep bits distributed: field laptops and such). It's a good skill to learn; it adds a whole new level of paranoia when you're coding. :-)
