Technology

Open Source License For Databases?

Myddrin asks: "Recently there has been a lot of discussion of databases, and who owns them. The US is either considering or has passed a law saying a database (and the info contained therein) is owned by the creating person/company. [I honestly can't remember.] At any rate, this got me thinking about the (possible) need for a Database GPL (DGPL). Basically the same as the LGPL, but adding that the database host (i.e. the owner of the server hosting the specific instance of the db) can put restrictions on access, allowing them to offset the cost of hosting the machine (administration, i'net connection, etc)." Any data in a database is content, just like information on a web page. Maybe an Open Content License might be a better idea? Thoughts?

"...Examples of acceptable restrictions would be:

  1. any program accessing this database must display the advert. provided,
  2. a cost of $.000000001 per record returned
  3. a nominal monthly subscription fee...
Something like that. Very similar to the part of the (L)GPL that says you can charge a nominal fee for the materials of distribution. The idea is that several competing servers could be set up, with multiple competing open and closed source clients running against it.

Is there a license that allows this kind of thing, or should I be working on one? "

This discussion has been archived. No new comments can be posted.

  • by EnForce ( 25933 ) on Monday January 03, 2000 @10:07AM (#1409799) Homepage
    With all the crap we've seen on NSI's Whois database, I'd say this is a damn good idea - why shouldn't something created by the public (yes, all of our registrations created this database!) be owned by the public?
  • by VWswing ( 74185 ) on Monday January 03, 2000 @10:08AM (#1409800) Homepage
    Or more like a restriction. Our personal information that is already floating around can not be resold but merely modified with changes? :)

    Ok.. but at least a legal way to prevent us from being in public or sellable databases. I'm so tired of getting calls from phone-spammers.


  • They can't have it both ways.

    If such a law is passed, this means that anybody who creates a database of song lyrics owns it!

    If copyright law interferes with this, then I'm going to copyright my personal information.
  • Another license?
    Is it needed? Isn't that what
    copyright is for?

    ---
    120
    chars is barely sufficient
  • I cannot imagine the FSF would sanction a license (at least I'm assuming you would want DGPL to be sanctioned by the FSF, based on the suggested name) that would require advertisement. Although, in the web-context, I suppose advertisements are the closest thing to a common currency. I still think that'd be the real sticking point, though.
    Christopher A. Bohn
  • by jd ( 1658 )
    Copyrights cover any "organised collection of data", so they do cover databases. Some equivalent of Copyleft for the specific case of databases would be great! (It shouldn't need a significant change, either, as it's all straight copyright law.)

    IMHO, it would be great to have a generic "copyleft" scheme, which covered everything, but for now, something for each of the significant special cases (eg: code, documentation, art, databases, etc.) is a good start.

  • The distinction to bear in mind is between the database itself and the data entered into the database.
    Christopher A. Bohn
  • I mean, legal language aside, the desired license seems to contain the main precepts of the GPL:

    • The information in the database is free to be used, and even incorporated in a derived work, as long as that work is also covered by the same license.
    • The hosting service is not charging for the data, but rather for the service of providing a means to access that data.
    • The original owner(s) of any data, which requires that no one else's data was used to create the data, may release the data under any other license(s) they so desire.



    --

  • Slickness points for the haiku (that I won't attempt to match).

    But, regarding the content, a license is how you give people permission to use copyrighted material. That is, the copyright is the claim of ownership, and the license is the set of conditions under which you're willing to share the use of the material you own.


    Christopher A. Bohn
  • by wfrp01 ( 82831 ) on Monday January 03, 2000 @10:22AM (#1409809) Journal
    The Free Software Foundation (http://www.gnu.org/) has been working up a license to cover documentation. Not exactly the same as what's being discussed here, but maybe close, if you think that information is information is information. Perhaps with some minor changes it would do the job, or a similar variant could be derived.

    This is a work in progress (correct me if I'm wrong). At least I don't yet see it on the Free Software Foundation's license list (http://www.gnu.org/philosophy/license-list.html)

    I'm sure the authors would have more appropriate input than myself. Just my two cents.
  • In my experience (as someone who has set up databases and interfaces for commercial ventures), the major concern of for-profit database owners is not that they won't be able to make money off the database (even if it's just ad revenues) but that someone will be able to grab all their information and resell it better than they can. I'd imagine that the major concern of not-for-profit database owners/creators is that someone will fragment the database through irregular mirroring.

    The concerns of for-profit database owners are not paramount to a DGPL, but the copying/mirroring of data should still be the focus. To that end, the DGPL should address both dynamic and static databases and give owners as much reason to use this license as the LGPL.

  • by fraxinus ( 114503 ) on Monday January 03, 2000 @10:24AM (#1409811)
    A few comments:
    I would like to see licenses concentrating on the data (content) rather than the whole database (the collection of data) - that would let you modify it, resell it, etc. -- much like the US census data or USGS geographic datasets.

    This question is very interesting, especially for geographic data (for GIS -- Geographic Information Systems). The situation in the US is like a dream, where all the USGS data is distributed without any tough restrictions (a BSD-ish license for data). The datasets are very expensive to create and a valuable asset.

    In comparison, the situation in most of Europe (for example the UK or Sweden, where I'm from) is that the mapping agencies are recovering most of the costs associated with creating digital geographic datasets. They are incredibly expensive!!! Thus the use of GIS is much more restricted (as well as development in the field) in this part of the world.

    Another interesting point is the license under which NASA licenses the new Landsat 7 digital imagery. The images are a lot cheaper than before (a few hundred $$$) and the license is 100% non-restricted (even here a BSD-ish license). In comparison, earlier Landsats and the current competitors are an order of magnitude more expensive, and in most cases they require you to license the USE of the data, not the 'ownership' of the data. That way you had to buy one license to use a satellite image for education/classes and another license to use the same image for an analysis... Landsat is run by the US government, so it is you taxpayers who are paying for this give-away (they are obviously not recovering all the costs of the operation).

    Nowadays there are people that have bought (and used) the new Landsat images and are making them available for download (for free!). Of course this is under great debate (imagine the competitors to Landsat).

    So... More talk about data and databases!

  • >If copyright law interferes with this, then I'm
    >going to copyright my personal information.

    Hmm, that's actually a fairly interesting idea. Could this be done to thwart having your phone number resold etc? Copyright your name, phone number, and address and then sue people who sell it for infringement? Hmm. It seems to me that the idea that personal information has value is already established, so it seems logical that the person whose information it is should be considered the owner. If companies have to pay ME for my phone number/mailing address, I can set the price high so it's not worth the effort for them to spam me with advertisements.
  • I'm working to create a text-based musical notation copy of a music book that has fallen into the public domain. I was wondering what license I could release it under- technically, the files are information, not program logic. The music format is human-readable, but also parsable by graphical notation programs. Hmm, any suggestions?
  • I would say that whoever creates the database owns it; but what do they own? If the database contains personal information surely the individual records "belong" to whoever they concern?

    Possibly creating a database means you own the right to control access to it; and to assign control over the data in it.

    Anyone know if the Data Protection Act in the UK says anything about ownership of databases and the data within them? I know it says that you have the right to view any personal data held on file (either physical file or stored in a computer system) and something about being able to correct that data if it's incorrect.

  • IANAL, but I don't think that your phone number and address are yours to copyright, although maybe there could be an exception if they are lumped together in a package (Name && Address && Phone number).
  • I just LOVE the idea, if it's going to be available for non-US residents also! I keep getting 40+ spam mails every day, all from the US unfortunately, so any restrictions on those companies would be very welcome.

    I frankly don't have any use for special dentist deals in the US when I live in Denmark. :)

    -- Origin: Wow, my handle is ElItE! ;)
    [HeMaN]
  • by Artagel ( 114272 ) on Monday January 03, 2000 @10:46AM (#1409817) Homepage
    A database can be protected by copyright if there is sufficient originality in the "selection and arrangement" of the contents. As pointed out earlier, it is important to remember that the contents can be separately protected. Think of it this way: A book of quotations can be protected as a compilation. Each of the quotations within it may also be protected by copyright in the quoted work. There are many useful databases which cannot be protected by copyright, usually databases that are made up of facts, and those facts are comprehensive and have obvious arrangements. A white pages phone book includes all of the phone numbers and names, and arranges them alphabetically. So much for selection and arrangement.

    The problem is that it can be hard work to research and compile these facts even if the result has no originality. I think we believe that people should be able to obtain benefit from their work. Database protection schemes try to create a copyright-like right against the substantial extraction and reuse of facts from a database. Thus, someone who contributes to a publicly licensed database wants to be sure he can access the additions of others in the future in payment for his work (rather than the corporate-generate-cashflow model for benefit.)

    Licenses are important to accomplish that right to later access because they can work even where you don't have a 'right' to copyright. Thus, if I license a CD to you with all the phone numbers in the U.S., I can license it to you as long as you don't put it where multiple people can use it. After all, fair is fair, we have a contract, and I am just making sure I can sell my work to other people, and not have you, my customer, becoming my competitor just for having bought my product once.

    A public license on a database would really only be useful if databases DERIVED from the original had to be made available for copying. Consider a list of all the music CDs ever made. It has to be updated, since new product comes out all the time. Can someone go into the business of providing these databases by taking the old, updating it, and calling the new database proprietary? Not if you have a public license. (All of this assumes that shrinkwrap or clickwrap licenses are good. They aren't in many countries.)

    As long as the resultant database is available to be copied, in whole, then the charge for accessing the server, whether to take the whole thing at once, or one record at a time ought to just fall under a reasonable distribution charge. Heck, the record-by-record access might as well be charged at any rate the provider wants since they are providing interface as well as content. If someone wants to roll their own, let them download the database.

    I think a public database license would be a good thing because it will allow public databases to grow and be distributed in a fair way when database protection laws are passed.

  • by H3lldr0p ( 40304 ) on Monday January 03, 2000 @10:49AM (#1409818) Homepage
    I recently did a report for a tech-english class last semester. It ended up being about ownership and the Internet, most specifically who it is that owns the whole shebang. Not an easy project, and I did not end up finding what I thought I would find when I first started. The paper overall ended up being one on copyrights. So I'll say the same thing that I ended up saying in that paper.

    You cannot treat the digital world the same as the print world.

    It just cannot be done. Everybody that reads slashdot with any frequency knows the lunacy of walking down that path. So let me take that argument and apply it here.

    You cannot treat an online database the same as one you might have as a hardcopy database (read: proprietary, closed, or a Rolodex on a desk) in an office. You cannot charge access to it in the same manner. You cannot oversee the users in the same manner. And most importantly, you cannot expect people to value the data that is stored therein the same.

    With that said, how can anybody expect to make a profit by putting such a beast online? I have two thoughts.

    #1: Do as the search engines do. Find some other way to profit. I have no idea what product Yahoo [yahoo.com] makes, but for some reason people invest in it, and somebody, somewhere is making money. It has been done once, and it can be done again.

    #2: Do it ebay [ebay.com] style. Auction the info off. Highest bidder gets the ability to negotiate a use license. No cost to find out if it exists, just a cost to read it. The more people demand rare info, the higher the price goes up.

    Anybody else got a suggestion?
  • by small_dick ( 127697 ) on Monday January 03, 2000 @10:58AM (#1409820)
    Plagiarism has a long history, and I saw several students get the boot from the University where I went to school for violating University guidelines.

    If you are going to do new work on a previously examined topic, you must cite your sources, have a variety of sources cited, and NOT provide a sense that the owners of the cited work have been plagiarized.

    For example, I can write a book about "Snoop Doggy Dogg", provide about 100 citations (books, webpages, mag. articles, TV/Radio programs), provide my condensed "personal take" on the rapper, and publish. That's legal; it's the foundation of all new work -- deriving from the old.

    But when I cross the line (doing a rehash of an existing SDD book), and call that work my own with no citations, or with a "sense" of plagiarism, I open myself up to legal trouble.

    I think the "fair use" rules, as they apply to books, will eventually dominate this issue. People using data from webpages WILL have to cite their sources, use a variety of sources, and verbatim copiers will be penalized/threatened, etc.

    What am I missing here? This just sounds like another failure of the legislative process to provide sane solutions to a fairly simple, well-known problem. Is this just a scheme to provide incompetent lawyers with phat salaries for years to come?

    I see no fundamental difference between pages on the web and pages in the library. They both convey information to the observer in virtually the same manner. The earliest animations were just flipping paper pages anyway.

    New Year's Rocked. Love you all :-)
  • by Anonymous Coward
    Haiku Nazi here:
    One too many syllables!
    No Haiku for you!
  • There already is a license for open content. Check out www.opencontent.org [opencontent.org].
  • Try writing haiku!
    Everyone is doing it!
    Even ESR!

    But seriously... the problem with sussing out a license for a database is that it depends on how the data is used. Open Source licenses work because the ways we use source code are pretty straightforward. Open Content licenses build off of them. But source and content have one thing in common... duplication causes no essential harm, and data integrity is not a huge issue.

    Databases, on the other hand, are often intended to centralize and synchronize information. Hence transactions, which exist to protect the integrity of the data. Moreover, databases often contain relations that require locks and triggers to maintain referential integrity. You may not WANT free copies of your database floating around, even if the information within the database should be free (speech or beer).

    So, barring lots of deep thought on the subject, i don't see a simple, general set of rules for "open" databases, because of the integrity issues, and because of the wide variety of ways in which the data may be used.
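
    To make the integrity point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The artist/album tables and the add_album helper are made up purely for illustration; the point is only that a transaction plus a foreign key lets the database refuse a write that would leave a dangling reference, which is exactly the guarantee an unsynchronized copy of the data no longer has.

```python
import sqlite3

# Hypothetical two-table schema: every album row must point at an existing artist.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign-key checks off by default
conn.execute("CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE album ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " artist_id INTEGER NOT NULL REFERENCES artist(id))"
)


def add_album(conn, title, artist_id):
    """Insert one album inside a transaction; return False if integrity would break."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO album (title, artist_id) VALUES (?, ?)",
                (title, artist_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


with conn:
    conn.execute("INSERT INTO artist (id, name) VALUES (1, 'Example Artist')")

print(add_album(conn, "First Record", 1))    # artist 1 exists
print(add_album(conn, "Orphan Record", 99))  # no artist 99, so the insert is rejected
```

    A mirror that copies rows without these constraints, or applies updates out of order, has no such check, which is the worry about free copies floating around.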
    ---
    120
    chars is barely sufficient
  • Anyway, a license more similar to GPL/LGPL for content would be nice. And a question - if I do GPL my website, is that legal? I do provide the source for it, so that shouldn't be the problem. And if that is okay, is a textfile okay? For that type of "program", there is no source. Would that render GPL unusable? Perhaps this is suitable for an Ask Slashdot?
  • What do you mean by rare? Do we really want people creating artificial scarcity and making megaprofits off it?
  • Well, I have a program - mp3db, that does database stuff. I'm going to just add a clause stating output of my program must be done under the GPL as well - ie: keeping it internal is OK, but if you release it - you do so to everyone at no cost.

    I hope RMS updates the GPL to deal with this issue more specifically soon....

  • Oops! You're right! My bad.
    What can i do about it?
    Repost corrections?
    ---
    120
    chars is barely sufficient
  • Yeah, what is it with spam for discount dentistry? I've gotten a few of those too.

  • See e.g. the Open Directory [dmoz.org] license. That's been a very successful business model (pay volunteers nothing, give away data for free) - it's growing at an astounding rate and will soon surpass Yahoo!

  • I don't think the implications of this law are understood completely. Current copyright law, through various legal precedents, grants copyright protection to the format of a collection of data. It also only applies to the exact organization if that organization is not obvious.

    The classic example is a phonebook. A phonebook is a collection of data, i.e. names, phone numbers, and addresses, organized in alphabetical order. As it turns out, under current copyright law this has minimal protection. Alphabetical ordering is obvious, and the rest of the directory is information which by law is public domain and not protected by copyright.

    A law protecting databases and their content could easily extend to a copyright on information. Basically, a database should be covered just like a phone book. Any content in the database would be owned by the creator of that content, but any information would have to continue to be public domain.

    Basically, this means that the databases of internet search engines can be extracted and reorganized into a new database, simply because URL and page titles are information and therefore are not and should not be protected.

    Dastardly

    P.S. Arguably a page title could be considered the property of the creator of the original, but the URL is really public domain information and not protected by copyright.
  • try "aint" instead of "isn't"
    --
  • What if the database that your information is in was copyrighted before you copyrighted your information yourself? Would your copyright be invalid?

    The easy way around this, of course, is to change some bit of the information. Go from "Apt. 202" to "Suite 202". But this also works both ways -- they could change some irrelevant thing like your middle initial, and presto-changeo: a new entry, a new copyright.

    Since you probably wouldn't be able to copyright your address or phone number (e-mail address? Maybe...), it should be relatively easy for a marketer to take a valid entry and make all of the common permutations (Street, Lane, Drive, etc; All middle initials; Ann, Anne, Annie), then copyright the whole schmeer. They'd have to be cross-referenced, but they'd probably be able to brute-force some combination that you hadn't thought of.
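
    For a sense of how cheap that brute-forcing would be, here is a small sketch using itertools.product from the Python standard library. The names, surname, street, and the entry_variants helper are all invented for illustration; nothing here reflects how any real marketer builds a list.

```python
from itertools import product
from string import ascii_uppercase

# Hypothetical variations on a single directory entry.
first_names = ["Ann", "Anne", "Annie"]
street_suffixes = ["Street", "Lane", "Drive"]


def entry_variants(first_names, initials, suffixes, surname="Example", street="12 Elm"):
    """Return every combination of first name, middle initial, and street suffix."""
    return [
        f"{first} {initial}. {surname}, {street} {suffix}"
        for first, initial, suffix in product(first_names, initials, suffixes)
    ]


variants = entry_variants(first_names, ascii_uppercase, street_suffixes)
print(len(variants))  # 3 names * 26 initials * 3 suffixes
print(variants[0])
```

    Three short lists already multiply out to a couple of hundred "distinct" entries for one person, so adding more fields makes it very unlikely an individual could cover every combination first.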

  • I believe that an Open Content license would be excellent for databases, and I agree that advertising (via banners or text) is an incredibly bad revenue generator, and should not pass for a business plan at all.

    Yahoo makes money by selling targeted advertising and via selected business partnerships. Since they have huge brand name recognition and large traffic, they can afford to do so. EBay makes money as well, because they manage to turn a profit based on the volume and addictive nature of auctions.

    Many academic libraries pay (huge) subscription fees to permit per-domain access to such things as the Consumer Reports or L/N databases. Accesses to the databases are limited to a matching sub or top level domain reverse lookup. This seems to work well, and off-site university affiliates can get in remotely via proxy.

    My suggestion would be to have a GPL or Open Content license on databases or pay via subscription that would be domain based. So, if I'm Consumer Reports, I will have a web or app client connect to the ns.consumerreports.com nameserver and have it assign my current ip a name in our .subscriberpaid.consumerreports.com subdomain, which would validate me for the site. (I have no affiliation with them, just using as an example.)
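
    A rough sketch of that domain check, in Python. The function names are hypothetical and the subscriber domain is just the example from above; socket.gethostbyaddr is the standard-library reverse lookup. Reverse DNS can be controlled by whoever owns the address block, so a real service would presumably want forward confirmation or some other credential as well.

```python
import socket


def host_in_domain(hostname, domain):
    """True if hostname is the domain itself or sits underneath it."""
    host = hostname.lower().rstrip(".")
    dom = domain.lower().strip(".")
    # Compare on a label boundary so "evilexample.com" does not match "example.com".
    return host == dom or host.endswith("." + dom)


def ip_is_subscriber(ip, domain):
    """Reverse-resolve an IP and test the resulting name against the subscriber domain."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:  # no reverse record, or the lookup failed
        return False
    return host_in_domain(hostname, domain)


# Pure string checks, no network access needed:
paid = "subscriberpaid.consumerreports.com"
print(host_in_domain("user42." + paid, paid))
print(host_in_domain("evil" + paid, paid))
```

    The label-boundary comparison is the part that is easy to get wrong: a plain suffix match would let a lookalike hostname through.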

  • That runs into all sorts of arguments as to what exactly defines "public knowledge" or "public origin".

    Having said that, I agree with what you're saying. If the knowledge comes entirely from open, public sources, then there does seem to be something unethical about closing the compilation of that knowledge off and keeping it for commercial gain. It's about as sensible as AOL trademarking "You've Got Mail".

    P.S. ObOffTopic footnote: If you're into electronics, check out the following websites for a scary note: Ramsey Electronics [ramseyelectronics.com], 2600 [2600.com] and CyberSKIP [208.46.148.11]. There's something definitely not OK going on.

  • Rare in that potentially few people could know it. However, I personally believe that this would only happen when dealing with raw info/data coming from a research body. If the Internet teaches anybody anything, it teaches that once one person knows something, then many people know it. This is the basis of that suggestion. If somebody were to claim exclusive knowledge of something, then it would be their burden to prove that only they know it. In this sort of atmosphere, an artificial scarcity of information / knowledge / data would be hard to create.
  • The key is to head off those who simply want to fence in existing public data commons and put up a toll gate. Those folks are parasites, creating nothing of value. They should not be able to hide this fact with exclusive service contracts with us (the government) to deliver our data to ourselves. All public-data processing contracts should be built on non-exclusive access to the data source.

    A legitimate function is to provide better access to existing data, e.g., by indexing it and providing various views (both data-wise and GUI-wise). But again, it is easy to compete in this, so some will try to fence in the raw data for their exclusive use, and lobby/plead/wheedle for contracts that provide exclusivity. These ploys should not just be resisted whack-a-mole style, they should be eliminated as a species.

    Where a database is created by public contributions (e.g., slashdot, or amazon book customer comments, or newsgroups, etc.) the default assumption should be that the creator of the comment owns it and is offering it into the public domain for non-exclusive presentation. Other arrangements can of course be made by contract agreements. There is already a lot of precedent in look-and-feel aspects of presentation, but any copyrights or (ugh) patents there should not be able to restrict free flow of the original contributor's public offering.

    Where the database is created by us (the government), the same applies, but data-restricting exclusivity clauses in contracts should have to be explicit, and justified individually with respect to high principles (e.g. Constitution), and not just slipped in as standard contractual boilerplate.

    Privacy laws must take precedence, but must not become a vehicle for attempts at exclusivity having nothing to do with privacy.

  • by Zaffle ( 13798 ) on Monday January 03, 2000 @12:13PM (#1409839) Homepage Journal

    This has a lot to do with who does own a database... If I go out, measure the rainfall over a period of a year at 10 different places, and then put that into a database, it's mine. I don't think anyone but mother nature can contest that (unless I put it in an Access database, then MS might contend ;)).

    But if I go and put all the information I know about everyone I know into a database, who does the database belong to? Can I go and sell the information? The Privacy Act 1993 in New Zealand says that if I am a company, and I collect information about ppl, one of the things I must do is allow ppl access to view/modify their record. (Within reason, ppl can't demand to modify their bank balance ;)). I also must state what I plan to do with the information, including whether I plan to sell it. IANAL, but I don't think it prohibits me from selling it to anyone I want.

    There's a good reason for this: our electoral rolls (lists of ppl who are enrolled to vote, names, addresses, etc) are available for purchase. (Incidentally, in order to have my record unavailable, I have to have a "good" reason, e.g. I'm being stalked and I have a restraining order, etc. I can't opt out of it just because I want to.)

    This means that my database of your personal habits that I noticed is mine. And I can do with it what I want. (Note: there is scope for various defamation laws here if I say false things.)

    Now that that's settled, what DO I want to do with my database of your habits? Well, I believe in free speech, my programs are GPL, so I want to make it free.

    I will license my database under a "free" license. This license is NOT designed to allow ppl to make money off of my database, so the same rights must be transmitted to the user of the database. So, the license must allow a user to "copy" the database one record at a time if they like.

    Now, the big thing, cost. Simple, same as the GPL, a distribution fee. ie you can charge a reasonable fee for the distribution of the database in whole to the user.

    Ahh, but what about accessing records, e.g. a web database, or phone, whatever. That's fine, you can charge me whatever, that is outside the scope of the license, but what is in scope is that you MUST offer the entire database for a reasonable cost.

    "What!" you cry, "This is no good for me". Fine, then don't use the license, if you want to make money out of something, why are you trying to use a "Free" license?

    The point of the matter is, a "free" database license should not be oriented toward making money. I don't earn a cent from the GPL programs I write. If I wanted to, I could, I'd just use a different license. But I don't, and I want my database of your personal habits to be free as well.

    The minute you try and work out how a company can still make money with this license, you defeat the purpose of it. As I said, you can offer access to the database for whatever price you want, but you must offer the entire database for a reasonable price too. RedHat makes their money by basically selling pretty boxes and support.

    Stop trying to work out how you can make money out of database, and start working out how you can make it available for all.

  • I cannot imagine the FSF would sanction a license that would require advertisement.

    The submitter didn't mean that it would "require" an advertisement, he implied it would "allow" an advertisement.

    The point being, if you have a public database, you can't just allow people to use it, they must have the ability to profit from it. Otherwise there's no incentive to use it. The GPL allows you to profit from source code.

    -Brent
  • Actually copyrights do not provide protection for most databases. A database is generally a collection of facts and facts are not copyrightable. Original text in a database may be protectable. For example a database of film reviews that you wrote would garner some protection based on the originality that you created.

    Copyright law provides limited protection to organization schemes of databases but only when those schemes contain some element of originality. For example, organizing a list of James Bond films alphabetically gets absolutely no protection, but organizing a list of Bond films from best to worst according to your opinion would merit some protection because of the creativity you used in ordering the list.

    The Supreme Court set out most of this in Feist Publications, Inc. v. Rural Telephone Service Co., 499 U.S. 340 (1991). (The case addresses telephone books and alphabetical organization directly.)


    By the way, this is not legal advice.
  • Most likely would be that the individual contributors retain their individual copyright. At most you would have a compilation copyright.
    This is the way we handled it on Genie (a moment of silence for the recently departed (1999-Dec-30)). The posters kept copyright of their posts, while Genie had a compilation copyright.

    It would be a more difficult question to decide how to handle retrieving individual posts and using them. Ideally you'd have to track whose data it was. Or you could make Open Content a condition of submission. Hmmmm.

    --

  • This seems straightforward to me, how you make money, though I realize it gets fuzzier in practice.

    I believe a company should not make money simply by exercising complete control over a set of information (i.e. a database). The service they provide me is one of collecting and providing me with said information.

    20 years ago, if you had a 1GB database that I could pay to access online, I wouldn't have had a problem with it. There is no way I could store that kind of information myself anyway.. so you were, in effect, providing a data-warehousing service. The database is just one way to look at it.

    Nowadays, if you have a small database (something I can reasonably fit on my computer), why should I be paying tons of money for accessing individual records, when the whole thing is pretty small anyway?

    I guess what I'm saying is, even though they like to pretend it's the information that it's all about, the real service that has been provided in the past is one of data warehousing, and data sorting; doing what others did not have the resources to do.
  • Having said that, I agree with what you're saying. If the knowledge comes entirely from open, public sources, then there does seem to be something unethical about closing the compilation of that knowledge off and keeping it for commercial gain.

    I have no problem with a company databasifying public data and charging for their compilation. Don't like it? Buy a different compilation from a competitor, or get the raw data and databaseify it yourself.

    On the other hand, I have a BIG problem with a company and a government agency cutting a sweetheart deal such that only that ONE company gets to databaseify and sell that agency's public records. (This has happened with both the US Patent Office and the Library of Congress card catalog, though I'm not sure if either exclusive deal is still in effect.)

  • You can't copyright your personal information per se. You can only copyright something that you've created - such as a document with your personal information; or song containing your personal information; or a video with your personal information; etc.

    So, a lot of people can use your personal information in their databases if the source of your personal information wasn't your copyrighted creation but something which they had the right to copy. To sue for copyright infringement you would have to (1) have the document/recording/etc. registered with the US Copyright Office and then (2) prove in court that a company used your creation to produce a derivative work without your permission... very difficult considering how many sources probably have your personal information.

    Likewise, anyone who "copyrights" data in a database can't claim to be the sole owner of the data. They shouldn't be able to win a lawsuit if you are able to demonstrate that you were able to get the same data from other sources.

    Xandis
  • by Anonymous Coward
    The http://www.useit.com/ site has the articles of J. Nielsen, the user interface haranguer.

    He talks about micro charges for content access; fractions of a cent per page view, similar to this database issue.
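To make the micro-charge idea concrete, here is a rough sketch of how per-view metering might be totalled. The rate and the user names are made up for illustration (they are not figures from Nielsen's articles), and `Decimal` is used only so that fractions of a cent add up exactly.

```python
from decimal import Decimal

# Hypothetical rate: a tenth of a cent per page view (an assumption,
# not a number taken from the articles mentioned above).
RATE_PER_VIEW = Decimal("0.001")


def charge(views, rate=RATE_PER_VIEW):
    """Return the amount owed in dollars for a number of page views."""
    if views < 0:
        raise ValueError("views must be non-negative")
    return rate * views


def bill(usage):
    """Map each (hypothetical) user to the amount they owe."""
    return {user: charge(views) for user, views in usage.items()}


# Made-up usage counts for two made-up users.
usage = {"alice": 1200, "bob": 35}
totals = bill(usage)
for user, amount in totals.items():
    print(user, amount)
```

The same arithmetic would apply to a per-record fee on database queries; only the unit being counted changes.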

    There are other considerations for some types of databases, mostly privacy related.

  • Look at the IMDB [imdb.com] as an example.

    It's not "open" in the sense that it can't be repackaged or repurposed, but it is "open" as far as using it to obtain a wide variety of well-organized and searchable information. Updates, servers, and bandwidth are paid for with mass exposure (advertising).

    Databases are interesting things. I mainly work with radio/TV station DBs. It has been determined that the average cost of obtaining a name/address/phone is roughly $7. Appending interesting information costs more money, as do yearly NCOA (National Change of Address) updates, so databases can be expensive, or I should say, used to be. The Internet has changed things significantly (as if you didn't know). Acquisition has dropped (for us) to about $.10 a name.

    Large databases used to (15-20 yrs. ago) require millions of dollars of heavy iron; now a moderately equipped small company can do serious modeling/profiling and apply it effectively (BTW, this is another reason CS majors are pulling heavy $).

    I don't really see how the OS model fits this, unless you're talking about DB tools. OS development isn't like DB development; collecting/organizing data is different from coding compilers and desktop environments.

    centscents
  • When we talk about the GPL, LGPL or any other Licensing scheme we have to think about what the license applies to. It can and should only apply to the software not the data. In the case of any Database Management software it applies only to the tool, the DBMS itself, and in no way applies to the data contained within. If we were to apply it to anything created with the tool or any data we store in it, things would be much worse.

    If we applied the same logic to the Gimp, or for that matter any software program, any graphics created with the Gimp would be subject to the GPL. Or much worse, any WordPerfect document created with WP would be subject to Corel's particular license.

    In no way do I agree that data in a database can be owned by the person who compiled it. If MY personal data is included in any database, I believe I'm entitled to a piece of the pie. If you didn't create that data, how can you possibly own it? THINGS ARE GETTING OUT OF HAND!

    There may or may not be an answer but a licensing scheme such as this will make things worse.

  • The DGPL (database GPL) suggestion is orthogonal to, or counter to, the GPL -- I can't decide which.

    First off, the GPL and LGPL do not prevent you from charging a distribution fee, limiting access, or associating advertising with downloads from your site. You can put advertisements on the same web page from which you distribute or mirror GPL'd code. You can limit access, or even charge for access.

    However, the GPL also explicitly allows (and requires you to allow) people who download the code to set up their own sites to mirror the code, with or without access restrictions, payment, or ads. If I had to pay a penny per page to use your database but I could get it for free from Bob's House of GPL'd Databases, where do you think I'd go?

    In short, the DGPL does not suggest the same solution, or even try to solve quite the same problem, as the GPL. To charge per-search fees or tie ads to searches would require that you make content (i.e., search results and the database itself) proprietary, which runs very much counter to the GPL. Many schemes for making content proprietary exist, including not putting any explicit copyright on it at all; the GPL is not such a scheme. Calling suggestions like enforced per-search fees or advertising tie-ins the "DGPL" is misleading.

    The closest analogue to the GPL in the world of "content" is probably the Open Content license. Let's stick with that.
  • A law declaring a database to be owned by the creating person/company would be a huge coup for large corporations that collect all our personal data. This law is probably the result of heavy lobbying by them. With this law in place, individuals will not be able to claim ownership over data that is about them.

    People should own data that is about them, no matter who collects it. Failing that, they should have unrestricted access and use of data that is about them. It's time to value human rights over corporate rights. This will not cause the collapse of civilization as we know it; it will mean we can all breathe a little easier.


  • More poets in here?
    If it's haikus that you want
    I have got plenty.

    (Here are some...)



    Morning smiles upon
    Post-2K community
    The gods let us live.



    Redmond upon us
    The bloatware makes me shiver
    I fear Win2K.



    Linux is not bad
    Free if time has no value
    Should be preinstalled.



    Pheer the cracker kid
    Chats on AOL all night
    He is true 31337.



    See the Redmond Beast
    Its vapourware is worthless
    One more promise.



    C++ sucks ass
    Although I need my paycheck
    I wish Bjarne was dead.
  • The IMDB has also protected future access for anyone who contributes by allowing a download of the complete raw data and programs [imdb.com] to extract from it. They don't make it particularly obvious to find, but they don't hide it either. They restrict the ways that you can redistribute it, but they provide it complete for unlimited personal use. That has one of the desirable properties of open source: you are not dependent on their continued existence and goodwill for access to the data.
  • First, I am not a lawyer...

    I see several forms of data ownership.
    1) Data that you have created yourself (eg. a midi file of original work)
    2) Personal Information - Information that is inherently yours (SSN,
    Name, bank records, medical records, court sealed records)
    3) Data that is collected that cost time / $$
    4) Public information that is general knowledge or placed in the public
    domain


    Newspapers currently charge for archive article searches and rights to
    use/republish original work (rule 1)


    Private information should be kept Private and Confidential.
    This information should only be divulged to a third party with the written
    permission of the person or by a court order. (rule 2)


    Dejanews does not own the articles it gathers from the newsgroups.
    They defray the $cost$ of accessing them by advertising. (rule 3)


    If a publisher produced a book of public domain programs, you could not
    copy the book; however, you could re-type the program(s) into your
    computer and use them any way you wanted. They are charging you for the book
    and the effort it took to produce, and not for the content (programs). (rule 4)


    And that is my 2 cents worth for now.

  • IANAL, but I'd have to agree.

    Your SSN/SIN numbers are owned by the government, your telephone number by the telephone companies, your driver's license by the DMV, your address by the city; your name is probably pretty public domain and your birthdate is definitely not yours alone. Even your job history is likely owned by your employers and your criminal records by the police, your shopping patterns by the discount card issuers and/or credit card companies, and your email address by your ISP.

    The only thing that is yours, and only because there is government legislation stating so, is your healthcare information.
  • To me, a database is just as much a protectable entity as a book in that it is a particular collection of information in a particular fixed form. The data itself is no more copyrightable than are the words in a book. But the collection is unique. Can it be duplicated? Sure. But an out and out copy is a copy of someone else's work and it is up to them what limits they wish to impose on that. The flip side of open licensing is respect for people's licensing decisions, after all.
  • Vagary [slashdot.org] wrote
    the major concern of for-profit database owners is not that they won't be able to make money off the database (even if it's just ad revenues) but that someone will be able to grab all their information and resell it better than they can

    But isn't this the point of entrepreneurship: providing a better service/product than existing providers? Nobody has a monopoly on good ideas. Suppose I came up with a neat way of categorising, say, /. karma points so that people could identify consistently good posters on specific topics. In essence I am improving on the product that Andover provides through my own efforts and ingenuity, creating additional value without infringing on /.'s content or resources. Now, to what extent can the original database developer claim ownership of higher-level products? Is it like a laser diode producer claiming ownership of all CD-ROM readers? Or like eBay hassling that multi-search auction site? Is it OK so long as it points back to the original site (encouraging people to post more and create more karma)? The point is that in the digital world there are no firm boundaries, so it is difficult to claim any exclusive property rights, especially when ideas breed and multiply upon themselves. Should the right to porting and additional development be automatically denied? Or is it enlightened self-interest to encourage as much use of the underlying platform as possible (e.g. Microsoft being a little lax in chasing piracy in China because they want to encourage adoption)?

    What are good compromises that balance the interests of all the parties involved?

    LL
  • I want everyone to have open access to Slashdot's database of comments, so we can write a better (e.g. not web-based) UI. Of course if we had this access, Andover would lose some of its hold over its revenue stream. Still, it could license the data with a provision that viewing software must include Andover-specified ads and/or pay a fee. (That's hard to enforce, but nothing stops you from using ad-stripping software now.)

    Of course this is a special case of my general desire for API/open-protocol access to all databases. Why should I have to use Amazon's web-based interface to buy a book from them?

    Right now the problem is both human and technical. XML will address some of the technical issues. (I'm working on some others.) The political and commercial issues will be tricky.

  • I think that I may have stated my question poorly. Unfortunately I was on vacation when this was posted, so I missed much of the discussion.

    My idea is to post the database schema under something like the LGPL, allowing multiple sources to host information on, say, the value of 1980s comics. The structure of the DB would progress like any other open source project, but the content would be available from several sources, each with different content. All I'm talking about doing is setting up a standard DB for a given function (dishing out the value of your X-MEN 247) that many people could write open/closed source clients for searching... but the content would not be synchronized unless the "license" (the restrictions mentioned in my question) allowed for it.

    Is that any clearer?
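A minimal sketch of the idea above, assuming SQLite and an invented schema: the table name, column names, grades, and dollar values are all hypothetical, chosen only to show one shared schema served by two independent hosts with different content, and one client query that runs against either.

```python
import sqlite3

# Hypothetical shared schema: this is the part that would be published
# under an LGPL-like license. All names here are illustrative only.
SCHEMA = """
CREATE TABLE comic_value (
    title     TEXT    NOT NULL,
    issue     INTEGER NOT NULL,
    grade     TEXT    NOT NULL,
    value_usd REAL    NOT NULL,
    PRIMARY KEY (title, issue, grade)
);
"""


def make_host(rows):
    """Create one independent host: same schema, its own content."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO comic_value VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def lookup(conn, title, issue):
    """A client query that works unchanged against any conforming host."""
    cur = conn.execute(
        "SELECT grade, value_usd FROM comic_value "
        "WHERE title = ? AND issue = ? ORDER BY grade",
        (title, issue),
    )
    return cur.fetchall()


# Two competing hosts holding different (made-up) data.
host_a = make_host([("X-Men", 247, "NM", 4.00), ("X-Men", 247, "VF", 2.50)])
host_b = make_host([("X-Men", 247, "NM", 3.75)])

for name, host in (("A", host_a), ("B", host_b)):
    print(name, lookup(host, "X-Men", 247))
```

Nothing in the sketch synchronizes the two hosts; whether one host may copy another's rows is exactly the question the "license" would have to answer.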
  • The regular GPL already handles this, as a database compilation is already copyrighted. The problem is that releasing a database under the GPL still allows someone to download the entire database, modify it, and then use it privately (IANAL, but I think this is true). That makes it quite prohibitive to release many types of databases under this licence (essentially making it the same as the BSD licence). In any case, a DGPL would probably need to have more restrictions, not fewer as you suggest. I'd love to see databases released under the GPL; they would be perfect candidates, as they evolve so quickly and there's always more data to be added. Imagine a GPLed movie database with every movie and actor in it, that anyone could use as a backend to her website, but if someone adds entries to the database and then "distributes" that by incorporating it into her search scripts, that person would have to release the entire modified database. The only restriction I would want to add to the GPL is that one must release the database in raw format if she modifies it and then makes it available through a web site. IANAL, so maybe this is already true, but it would probably still be nice to spell that out.
  • Thank you! This is exactly what I am talking about! This is a much better explanation of what I am talking about. Thank you!!!!!!!
  • Free beer. It is not open at all in the GPL sense of the term: "Specifically the files may NOT be used to construct any kind of on-line database (except for individual personal use). Clearance for ALL such on-line data resources must be requested from Internet Movie Database Ltd" - http://us.imdb.com/Copyright
  • Yes, that's much clearer, thanks! I can see open-sourcing the *schema* for a database rather than the data itself.

    Too bad none of this will be moderated up.

    ---
    120
    chars is barely sufficient
  • I'm in a very awkward position on this issue, and would like to figure out just where I stand!

    I run a web site called hockeydb.com. On the site I archive historical hockey statistics. Not just NHL, minor leagues too -- you can look up any pro hockey player ever on the site. I hate to sound like I'm bragging, but there is nothing else like it that I'm aware of for any sport.

    I compiled nearly all the data myself by searching out and purchasing many volumes of books with hockey statistics in them. It's not as easy as it sounds -- there's no central place to find this stuff.

    I spent 5 years building the collection, and thousands and thousands of hours typing in the data, fixing mistakes, standardizing names, etc. I've developed a custom computer program to maintain the data which was a non-trivial cost.

    One part of me would feel very bad if the data suddenly became open source. I spent so much time on it, why would it be fair if ESPN or some company just grabbed it and decided to sell it? It would surely make my data worthless if everyone had it.

    [on a side note, I know there is no legal protection on it now, although such databases aren't quite 'open source' yet].

    On the other hand, I'm dependent on several entities for current compiled statistical data. One of those entities is a company called Howe Sportsdata, another is the NHL via the Elias Sports Bureau.

    Howe is contracted by the minor hockey leagues to compile their statistics. The teams fax their game sheets to Howe, and Howe adds up the numbers and publishes official stats. The leagues pay Howe a good sum of money to do this, but it's cheaper than if they did it themselves.

    If databases become copyrightable, then Howe -- as the compiler of the data -- could claim copyright on all the numbers. Or perhaps the leagues could claim copyright.

    [I've heard that the NBA is trying to claim copyright on their statistics so that they can license them instead of publish them for free.]

    It would be literally impossible to compile the information by hand because the data is only in Howe's (or the teams') possession. You couldn't even duplicate their effort because only the official scorer (there's only one per game) knows the true "facts".

    So what is my position? I don't know -- it tears me up every time I think about it! I heavily lean towards no copyrights because it would sew up such data beyond belief and no one would benefit.

    What is the true philosophy behind open source? Create something so that companies with many more resources than you can exploit your labor? Does the fact that something new was created somehow right that wrong? Is there a middle ground here?

    Ralph Slate
    http://www.hockeydb.com
  • And that's a problem we could also address with an open-database version of slashdot. :-)

    Yes, it (building good group-moderated systems) is a Hard Problem. But right now we're not allowed to even try to solve it without starting over from scratch again.

  • Something like this would help the people over at the Human Genome Project, as they are bothered by companies who work out new patents based on their material. This is not fair. You know the stuff.
  • The problem is that under current copyright law facts are not owned by anyone; even if you did the work to acquire those facts, they are not owned by you. Take the original poster's example of rainfall data.

    If he goes out and collects the data and publishes it in a database, copyright law would say he owns the database and would protect that ownership. The thing is that he doesn't own the facts in the database. Someone else could take the facts in the database (i.e. each piece of rainfall data) and put them into their own compilation, and the original owner would have no legal protection. While ethically the original source of the facts should be cited, the original author does not have to be compensated for the use of the data in another work.

    This is a good thing. Facts should always be public domain regardless of who did the work to get those facts. In the US this has been established by legal precedent and copyright law. The proposed change to the law would increase the costs of research and development because suddenly every researcher would have to pay for basic facts, and could not legally compile their own database without either paying the original source of the facts or collecting those same facts all over again.
  • Note: this only applies to copyright law. Patent law, on the other hand, is a completely different issue, although I believe it should follow the same rules to some extent. I believe patents on genetic codes that occur in nature should be unenforceable since they are simply facts. Regardless of the effort required to decode it, the fact that my Y chromosome has the sequence hcctgaaggth should not be patentable.

