Programming IT Technology

Data Management In Collaborative Software Applications?

nfreier asks: "I am a developer working on a research project at the University of Washington. The project is called UrbanSim. We are developing a metropolitan land use and transportation modeling system along with a package for data manipulation and visualization. The data required as input to this modeling system is of significant size and complexity, originating from multiple sources and in multiple formats. We are looking for a method to track data changes at a central repository, much akin to the notion of tracking changes in source code à la CVS. We thought of using CVS initially but quickly realized that it does not handle large files and binaries well, nor is it well suited to tabular data. Other options include hand-rolling our own data server or using a database. Obviously, the drawbacks to hand-rolling our own are time and effort. The drawback to a database system is that it generally restricts us to using basic relational data formats only. Another constraint is that our entire project is under the GNU GPL and anything that we rely upon should also be freely available."

"Any solution we use must be able to:

  • track multiple versions of data
  • be accessed via network from multiple disparate geographical sites
  • support local modifications without affecting the repository
  • handle large binary and ASCII data

The data management problem must be a common issue in other projects and I am hoping that someone can advise me of a possible solution."

This discussion has been archived. No new comments can be posted.

  • What about RCS [fsf.org]? SCCS? Aegis?

    If that doesn't work for you, follow some links. Go to freshmeat. Do a search.

    Heck, there are whole books written about this stuff, even by O'Reilly!
    ---
    pb Reply or e-mail; don't vaguely moderate [ncsu.edu].
  • The drawback to a database system is that it generally restricts us to using basic relational data formats only.

    Not really sure what you mean by this. Are you familiar with datatypes such as the BLOB (Binary Large OBject)? Any type of data can be stored in a relational database.
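As a rough illustration of the BLOB point, here is a minimal sketch that keeps every version of a binary dataset as a row in a relational table. It uses Python's built-in `sqlite3` module purely as a stand-in for whatever database the project would pick; the table name `dataset_versions` and the helpers `open_store`, `put_version` and `get_version` are made up for this example and are not part of UrbanSim or any existing tool.

```python
import hashlib
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a store with one row per dataset version."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dataset_versions ("
        " name TEXT NOT NULL,"
        " version INTEGER NOT NULL,"
        " sha256 TEXT NOT NULL,"
        " payload BLOB NOT NULL,"
        " PRIMARY KEY (name, version))"
    )
    return conn


def put_version(conn, name, data):
    """Store `data` (bytes) as the next version of `name`; return its number."""
    row = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM dataset_versions WHERE name = ?",
        (name,),
    ).fetchone()
    version = row[0] + 1
    conn.execute(
        "INSERT INTO dataset_versions VALUES (?, ?, ?, ?)",
        (name, version, hashlib.sha256(data).hexdigest(), data),
    )
    conn.commit()
    return version


def get_version(conn, name, version):
    """Return the stored bytes for one version, or None if it does not exist."""
    row = conn.execute(
        "SELECT payload FROM dataset_versions WHERE name = ? AND version = ?",
        (name, version),
    ).fetchone()
    return None if row is None else bytes(row[0])
```

Old versions are never overwritten, so a site can fetch any version, modify its local copy, and only affect the repository when it calls `put_version`. This stores full copies rather than deltas, so it trades disk space for simplicity.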

  • I don't think there's a real big reason to discount CVS right off the start, especially if you had original interest in it... I wouldn't say that it is bad with large files, but it definitely is with binary files - it's only intended for ASCII files.

    So the solution is, quite simply, to write "filters" to translate between the binary representation and some deterministic ASCII representation. uuencode, naturally, is not the perfect solution - you need to find a structure which guarantees there's minimal change to the rest of the lines when one part changes.

    The trivial solution is to record every data item (byte?) on a separate ASCII document line, but I'm sure you can do better than that, since the extra newlines add unneeded complexity. The trick to the optimal solution would be to group the data on lines so that the average change affects the minimum number of lines.

    There are surely other solutions, many of which have advantages over the CVS version, but I believe quite a few of them would rely on commercial, proprietary software or require rolling your own completely.
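The binary-to-ASCII "filter" idea from the last comment can be sketched as follows. This is only one possible encoding, assuming the binary data is made of fixed-size records: each record becomes one line of hex text, so an in-place change to one record touches one line. The function names `encode`, `decode` and `changed_lines`, and the default `record_size` of 16 bytes, are hypothetical choices for illustration; an insertion that shifts later bytes would still change every following line.

```python
def encode(data, record_size=16):
    """Turn bytes into deterministic ASCII: one hex line per fixed-size record."""
    lines = [
        data[i:i + record_size].hex()
        for i in range(0, len(data), record_size)
    ]
    return "\n".join(lines) + ("\n" if lines else "")


def decode(text):
    """Inverse of encode(): rebuild the original bytes from the hex lines."""
    return b"".join(bytes.fromhex(line) for line in text.splitlines())


def changed_lines(old_text, new_text):
    """Count lines that differ between two encodings of equal length."""
    pairs = zip(old_text.splitlines(), new_text.splitlines())
    return sum(1 for old, new in pairs if old != new)


if __name__ == "__main__":
    original = bytes(range(64))          # four 16-byte records
    modified = bytearray(original)
    modified[20] = 0xFF                  # change one byte in the second record
    before = encode(original)
    after = encode(bytes(modified))
    assert decode(before) == original    # the filter round-trips
    print(changed_lines(before, after))  # number of text lines a diff would report
```

If the data has a natural record length (for example one fixed-width table row), passing that as `record_size` lines the text up with the records, which is the grouping the comment is after.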

