How Do You Sync Database Schemas?

Rob Sweet asks: "I recently got started coding for a PHP front end to RRDTool. Right now, there are only two developers, but we get the impression that once a protocol is in place, we'll have several more. The question has been posed: we can use CVS to keep our code synchronized, but how do we go about keeping our database schemas synchronized? The obvious answers involve using mysqldump to keep updated table creation scripts in CVS, but I'm wondering if there isn't a better way..." At the very least, a file containing a list of schema changes would be necessary, but what about programs that can take two schemas, look at the differences, and return the commands necessary to make the one mirror the other?
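
The mysqldump half of this is easy enough to script, at least. Here's a minimal sketch (Python; the database name, user, and file name are placeholders, not anything from Rob's project) that dumps just the table definitions and checks the result into CVS:

    #!/usr/bin/env python
    # Sketch: snapshot schema-only DDL and commit it to CVS.
    # The database name, user, and file path below are made up.
    import subprocess

    SCHEMA_FILE = "schema.sql"

    # --no-data makes mysqldump emit CREATE TABLE statements only.
    with open(SCHEMA_FILE, "w") as out:
        subprocess.check_call(
            ["mysqldump", "--no-data", "--user=devuser", "mydb"],
            stdout=out)

    subprocess.check_call(["cvs", "commit", "-m", "schema snapshot", SCHEMA_FILE])

Since the dump is plain text, an ordinary cvs diff between revisions already shows what changed; generating the ALTER statements to apply those changes is the part that needs real tooling, as several comments below discuss.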
  • Syncing schemas (Score:4, Informative)

    by mangino ( 1588 ) on Thursday June 06, 2002 @02:54PM (#3653877) Homepage
    In the past, I have created scripts that populate data. You can do several levels of refresh: refresh just the data, or delete all objects and re-create them as well. This is a bit of a pain to set up, but it works well in simple cases.

    Using Ant, I just had a task to take a snapshot of the data in the database and save it to CVS. I then had a data refresh as a general part of the setup.

    Every once in a while, we would rebuild all objects by dropping all tables and recreating them. This was nice in development, but a pain in production (reloading a 4-million-row table takes a while, not to mention keeping the data in CVS).

    For production use, we created alter table scripts that got added with the correct TAG. When we installed a build, all alter scripts were run before any code was pushed, as in the sketch below.
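
    A minimal sketch of that last step (Python; the alters/<TAG>/*.sql layout and the database name are illustrative assumptions):

        # Sketch: run every alter script belonging to a release tag,
        # in lexical order, before any application code is pushed.
        # (Name scripts 001_..., 002_... so lexical order is run order.)
        import glob
        import subprocess

        TAG = "REL_1_2"

        for script in sorted(glob.glob("alters/%s/*.sql" % TAG)):
            # Feed each script to the mysql client; stop on the first failure.
            with open(script) as sql:
                subprocess.check_call(["mysql", "mydb"], stdin=sql)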
  • Check out this product [red-gate.com] from Red-Gate Software.
  • by photon317 ( 208409 ) on Thursday June 06, 2002 @03:43PM (#3654266)

    I've been working in private for a while on my own ERD-like software (similar in flavor to ERwin and the like), and I've dealt with this problem to some degree. It's much easier to have a higher-level tool deal with the issue. In my case, abstract schemas are stored in XML, and I have a tool that parses the XML and generates all the SQL for "create table blah blah blah" for whatever DB vendor you pick. Then there's another tool that can diff between two revisions of the XML schema definition and issue "alter table blah blah" statements to update a database's table layout.

    Mine won't be ready for public consumption for some time yet, since I only work on it now and again in spare time, and it has big huge unrealistic design goals - but it's not a hard job to build a simple version of the above on your own that's tailored to just your needs.
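
    The core of the diff-and-alter step really can be tiny. A sketch (Python; the parsed-schema representation and all names here are invented, and MODIFY is MySQL-flavored syntax):

        # Sketch: compare two table definitions (already parsed from XML
        # into dicts of column name -> column type) and emit the ALTER
        # TABLE statements that turn "old" into "new".
        def diff_table(table, old, new):
            stmts = []
            for col, coltype in new.items():
                if col not in old:
                    stmts.append("ALTER TABLE %s ADD COLUMN %s %s" % (table, col, coltype))
                elif old[col] != coltype:
                    stmts.append("ALTER TABLE %s MODIFY %s %s" % (table, col, coltype))
            for col in old:
                if col not in new:
                    stmts.append("ALTER TABLE %s DROP COLUMN %s" % (table, col))
            return stmts

        # A renamed column shows up as a drop plus an add; catching true
        # renames needs more bookkeeping than a plain diff can give you.
        print(diff_table("Address",
                         {"First": "CHAR(20)", "zip": "CHAR(10)"},
                         {"First": "CHAR(20)", "postalcode": "CHAR(10)"}))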
  • The scripts you use to create and populate the database should go into CVS and then you can run them to create or recreate your database and data.

    I have also used CVS to store a file that is a series of database alters, or a series of files that are a bunch of DB alters that are to be applied together to form a kind of DB release or tag.

    DDL and load scripts are just code, treat them as code.
  • by bwt ( 68845 ) on Thursday June 06, 2002 @03:49PM (#3654313)
    Some database shops use CASE tools for data model generation and reverse engineering. Ultimately, these sorts of tools represent a data model with an object model, allow direct editing of the internal representation, can import by examining the data dictionary of a database, and can generate SQL DDL as needed to apply the difference or create from scratch.

    In the proprietary realm, Oracle Designer is pretty good at this sort of thing. You can get a developer license for free from technet.oracle.com, but it's big $ for production use.

    There are some open source tools for this, but they all seem to be fairly young. I happened to notice one on Freshmeat today called Alzabo [alzabo.org].
  • We perform nightly backups of the dynamic tables of the database. We can use these to recreate the state of live on our staging machine. We have a separate table (updates) that stores the current update level. We have a script called updateXXXX.pl which contains all the SQL code necessary to update the DB. The XXXX is replaced with the update number (update0016.pl is our current one). When the script runs, it checks the updates table to see what the last script run was. If you are attempting to run update 15, and 14 hasn't been run, it will stop you.

    This is part of our QA process. Before a build goes to staging, we wipe out the staging database and replace it with the copy from live. Then we run the update and push the build code. Once a staging build is blessed, we can push it live.
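
    The guard at the heart of this is only a few lines. A sketch (Python rather than Perl; the updates table follows the description above, but its level column and everything else are assumptions):

        # Sketch: refuse to run update N unless update N-1 was the last
        # one applied, then record the new level. Takes a generic DB-API
        # cursor; the table and column names are illustrative.
        def run_update(cursor, this_update, statements):
            cursor.execute("SELECT MAX(level) FROM updates")
            (last,) = cursor.fetchone()
            last = last or 0          # empty table means nothing applied yet
            if this_update != last + 1:
                raise RuntimeError("refusing update %d: last applied was %d"
                                   % (this_update, last))
            for sql in statements:
                cursor.execute(sql)
            cursor.execute("INSERT INTO updates (level) VALUES (%s)", (this_update,))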
  • Alzabo (Score:3, Interesting)

    by m_ilya ( 311437 ) <ilya@martynov.org> on Thursday June 06, 2002 @04:03PM (#3654441) Homepage
    If you use MySQL or PostgreSQL you can use Alzabo [sourceforge.net] to synchronize database schemas.
  • Describe the table structure using an XML format of your own devising. Write a script that will create the SQL commands based on the XML format. For instance, a section of the XML could look like this:
    <table name="Address">
    <field name="First" type="char" length="20" />
    <field name="Last" type="char" length="30" />
    <field name="Username" type="char" length="10" />
    <yadda />
    <yadda />
    <yadda />
    </table>
    An XML format of a table is about as human-readable as the raw SQL and a diff between two versions should make a lot of sense (Oh, you've changed the "zip" field in the Address table to "postalcode"!)

    Writing a Python script that would read this and output SQL is pretty easy (a minimal version is sketched below). You could even do it in XSL. Once you write the script, it should work great. You can tweak the script to output SQL for different DBMSs.

    Make sure both developers are using the same DTD (to make sure your XML is valid). And since XML is verbose (is it _ever_), commits have a better chance of not clashing unless you're both hacking the same part of the schema.
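
    A bare-bones version of that script might look like this (Python; only the "char" type from the example is handled, and the function name is made up):

        # Sketch: turn the XML table description above into a CREATE TABLE
        # statement. Unknown elements like <yadda /> are simply skipped.
        import xml.etree.ElementTree as ET

        def table_to_sql(xml_text):
            table = ET.fromstring(xml_text)
            cols = ["%s CHAR(%s)" % (f.get("name"), f.get("length"))
                    for f in table.findall("field")]
            return "CREATE TABLE %s (\n  %s\n)" % (table.get("name"),
                                                   ",\n  ".join(cols))

        print(table_to_sql("""<table name="Address">
            <field name="First" type="char" length="20" />
            <field name="Last" type="char" length="30" />
        </table>"""))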

  • We put a mysqldump (with a couple of options) in CVS! We considered other options, but it's by far the easiest!
  • If you don't mind using Java, check out the Jakarta project Turbine-Torque at http://jakarta.apache.org/turbine/torque/.
    Abstract schemas are stored in XML and Torque parses the XML and generates the database schema for whatever database vendor you pick. (Torque also generates Java code to easily work with data in the database).
  • Yeah, I know: this is off topic; but it's schemata, not schemas, isn't it?
  • Either you haven't RTMed, or you didn't explain very well. The documentation has it here [mysql.org]. Just make both masters and both slaves, and hope you don't trample on each other.

    Now that you have the key to complete disaster, I will warn you, since you obviously don't know this: this is a stupid idea. Hmm... that's not harsh enough. If you were working for me, I'd reconsider your employment if you came up with this crazy idea.

    You should never, ever, ever, ever have more than one person working on the same schema while coding! Either the database will be driven by your code, which is a quick way to denormalize everything and wreak havoc, or more than one person will drive the design, and will denormalize and wreak havoc.

    Only one person (or group) should make decisions on a schema, and it should be done *before* any coding is done. The database structure will lend structure to the program itself.

    I was a DBA for Oracle at a small company. You wouldn't believe what those idiots (Andrew, you listening? :-)) wanted to do, and did when I went on vacation! Instead of coding properly, they added tables to make it easier. And then they wonder why the DB slows down on their queries. Or they want to know a concrete method (query or idea) of encapsulating all data of a certain type, and wonder why I can't do it, since they left so many holes in the system.

    The database should be designed to handle the system. And that is much more important than coding, both for speed and for structural reasons. The only time the database should be changed is when there was a mistake in the original design, or the project it is for has changed dramatically.

    So, there is a way to do it, but it's on par with logging in as root all the time so things are easier. Don't do it.
    • Well, this is utterly bogus. The thing to remember is that effective use of the relational model allows for consistent queries. The RDBMS stores facts, and queries are propositions. It's logic programming. Programming. Programming. Not Stone. Normalization may be done by Normal people. It doesn't require godhood.

      The theory of the immutable, pre-determined database schema makes the DBA an asshole. It's not necessary or correct. Typically the problem arises when the DBA is out of touch with the data. When the minimum granularity of new testing databases is all the data, you have hit this point.

      Moral: Know the data. Track a baseline database to go with the schema. Always script the data as well as the schema, so as to be able to build a database to go with the schema. Sure, use a tool to identify schema changes. You Must Use version control on both or puke.

      Don't be above the fray. Quit picking on developers.

      • I have to agree with the OP. Ruin the normalization of a large database and you're screwed. I don't want to pick on the developers, but I wouldn't accept modifications to the database from them without good reason.

        Certainly, there are times that de-normalization of the schema is necessary for good performance or other reasons. Just don't change the schema only to ease application development.

        • Certainly, there are times that de-normalization of the schema is necessary for good performance or other reasons.

          I shudder at the thought.

          Oracle, in their documentation, goes through the decision of whether the DB is for OLTP or for a Data Warehouse. Normalization is required in the former, and sometimes hurtful in the latter.

          If you are a data warehouse, completely, then you may need to go without normalization. However, it is likely that normalization will help anyway. So, it can be resolved with two schemas: one normalized, the other not. The normalized schema is the "authoritative" one. The non-normalized schema is created periodically from the normalized data, depending on the requirements for up-to-date data.
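
          The periodic refresh itself is just ordinary SQL run on a schedule. A sketch (Python; every table and column name here is invented for illustration):

            # Sketch: rebuild a denormalized reporting table from the
            # normalized, authoritative schema. Run from cron as often
            # as the freshness requirements demand.
            def refresh_reporting(cursor):
                cursor.execute("DELETE FROM report_orders")
                cursor.execute("""
                    INSERT INTO report_orders (order_id, customer_name, total)
                    SELECT o.id, c.name, SUM(li.price * li.qty)
                      FROM orders o
                      JOIN customers c ON c.id = o.customer_id
                      JOIN line_items li ON li.order_id = o.id
                     GROUP BY o.id, c.name""")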
      • The RDBMS stores facts,

        Ah, you could not have done a better job of displaying your fallacy.

        The RDBMS does not store "facts". Or at least, that is not its job. Its job is to store "data". Hence "database", not "knowledgebase". Two *completely* different things.

        The RDBMS is not defined by the data it stores, either. It is defined by its schema. The schema itself is enough to understand an entire project, both through the tables and columns and through the foreign keys between them all.

        The schema is not trivial. It is an answer to the question that the project provides.

        Project: Here are my complexities, how shall I answer it?

        RDBMS: With this schema.

        Once the schema is done, the answer is there. The data and the program merely make it all work. Given the rigidity and simplicity of an RDBMS as opposed to a program, it is obviously the best solution.

        And now, to change anything, means that the project was not fully understood, or the project has changed. Both of which need one person (or group) to redesign it, and try to get it right this time.

        and queries are propositions.

        Only in a very few instances. Mostly, queries retrieve data.

        It's logic programming. Programming. Programming. Not Stone.

        And let me guess, designing should be thrown out the window for losers to eat too? It's design, design, design! Programming is a method of coding that which was designed. Logic is used mostly for design, and sometimes in coding, when a "trick" is needed, for whatever reason.

        Normalization may be done by Normal people.

        I can't help but laugh at this one. Most people are not gifted with the ability to normalize easily and decisively. DBAs make better DBAs, and programmers make better programmers (usually). Each has their strengths. (DBAs being TJs, and programmers being NPs; just a guess). To say that just about anyone can do such a job is ridiculous.

        It doesn't require godhood

        True, but it does require someone gifted in that area. (I use the word "gifted" based on the book "Gifts Differing").

        The theory of the immutable, pre-determined database schema makes the DBA

        No, it makes the DBA a good DBA. Should the DBA let just any changes go through, the database is doomed. Sure, some changes may be needed, but only if the project was misunderstood, or if the project changed. This should be an unusual case.

        It's not necessary or correct.

        Why not? I have already mentioned why it is necessary and correct. It is necessary in order not to wreak havoc in the database. It is correct in that it is part of design, which is what drives programming. To have programming drive design would be backwards.

        Typically the problem arises when the DBA is out of touch with the data.

        That could be. But if so, he is not doing his job. And I assume that to be unlikely. Or, the developers and the DBA have a conflict in how to resolve the project. If that is the case, a solution agreeable by both is needed, and then the DBA has final word on the design. But this needs to be done *before* any real coding happens.

        Quit picking on developers.

        I was both a developer and a DBA. Each has their roles. Stop trying to interchange them.
        • I think the hardest part is having people work with data without understanding what it is. When the programmers don't know, your pages/programs won't do what you want until about the 30th try. If the DBAs don't know, and assuming the programmers have a clue, they'll go nuts trying to pull it out. If neither of them has a hint, it'll take forever. Either that, or your spec will be so damned clear that you may as well take it the next three inches and program it yourself.
          I've seen what ignorance between requirements and product can result in: very competently designed incorrect systems.

          I love people's pictures of systems that are static. The project I work on is constantly evolving. Needs are being found all the time that no amount of forethought would have found. And if we could, we wouldn't use the means for making a database. (Think lottery or earthquake warning system, etc.)

          Anyway, I'm in the strange situation of being both the DBA and the developer at the same time, and I try to do both well. I constantly think I could do better, but others seem pretty happy with things.

          Andy
          INTP that likes to pretend he's an INTJ.
          • Needs are being found all the time that no amount of forethought would have found.

            I have a hard time believing that, though I will not debate it, because I understand that it could be true.

            Maybe the designer (or the requirements makers) didn't think the project through. Certainly, a project can be so defined that only a few, non-schema-busting changes are required.

            If the program is being made to generate a business, and the business is continually updating its model to meet new clients, a different approach would be required. The database should be patched a lot at first, and then a cut-off point is needed, and the database should go through a complete redesign. From then on, no real structures should be changed, because, at that point, there should be enough insight to catch most future changes easily.

            INTP

            Silly lazy-bum!

            that likes to pretend he's an INTJ.

            Oh, ok. that's better. :-)

            • It's designed well enough that it is mostly new fields or new tables to handle new ideas.

              Would it help if I said it is a NAVY database?

              Andy
              • Would it help if I said it is a NAVY database?

                Not really. It only helps in noting that it isn't a startup making silly decisions.

                Past that, I don't care who makes the design. Everyone makes mistakes. Although, it could be the best design. I just consider that to be unlikely.
    • I have to agree with Chacham here; you *never* *ever* have developers changing a database schema willy nilly. It's a great way to destroy your project. The database comes *first*; if you plan your project properly from the beginning you shouldn't need to make many structural changes to your database.

      This is 101 level stuff: plan first, code later. You don't make things up as you go along.

  • I've used a system where the database has a version number (in a table).

    We then have a script with a list of 'update commands' to run (usually SQL). Each update command has a database version attribute. It runs itself if its version is newer than the database's. Then it updates the database version.

    You could have various permutations on the implementation of this idea (we do!).

    The upshot is that you'd have a script that developers could run that would perform whatever SQL DDL or DML changes are needed to get their databases into the right order.
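
    In outline, such a script might look like this (Python; the db_version table and the sample updates are assumptions based on the description above):

        # Sketch: apply, in order, every update whose version is newer
        # than the one recorded in the database, bumping the recorded
        # version as each one lands. Assumes db_version holds one row.
        UPDATES = [
            (1, "ALTER TABLE Address ADD COLUMN postalcode CHAR(10)"),
            (2, "CREATE INDEX idx_last ON Address (Last)"),
        ]

        def upgrade(cursor):
            cursor.execute("SELECT version FROM db_version")
            (current,) = cursor.fetchone()
            for version, sql in sorted(UPDATES):
                if version > current:
                    cursor.execute(sql)
                    cursor.execute("UPDATE db_version SET version = %s",
                                   (version,))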
  • Is the actual design of the database going to change so much that anything more than mysqldump is important?

    Unlike code, databases generally need to be planned out as fully as you can in advance, for normalization (and maybe other technical buzzwords I can't remember) - so while the data will change, the structure should always be the same. (And the data being different shouldn't be a problem.)

    To make things easy, adding a little script that dumps the database, then updates CVS in one easy command, shouldn't take more than a few moments to rustle up.
  • There is a product, called DBExaminer, that will analyze the differences between two schemas and create a script to update one schema to the other. It's not free, it's not Open Source, it's not perfect, but it does work for that task, and it's not horribly expensive (on the order of $2000). It will also analyze your model for Normalization errors...probably a bit too thoroughly, but too much information is better than not enough. You can get a demo at DBE Software's site [dbesoftware.com]. No, I'm not an employee nor do I make any money from recommending their stuff, I just have used it at a couple of different jobs and it worked darned well for what we needed it for.
  • Dunno if Oracle's really an option for you, but I do know that you can have a fallback database set up which is 0-15 minutes (exact delay depends on configuration) behind the 'live' one. This uses the archive logs to apply updates to the backup DB. It's normally used for high availability (i.e. when the live dies, you switch to the fallback which is almost up to date) but might help here.

    The main problem is that if you change the dev copy of data, you'll have to rebuild it from scratch unless you can roll back.
