Transportation IT

Ask Slashdot: How To Both Mirror and Protect Crowdsourced Data?

New submitter cellurl writes "I run wikispeedia, a database of speed limit signs. People approach us to mirror our data, but I am quite certain it will become a one-way street. So my question is: How can I give consumers peace of mind in using our data and not give up the ship? We want to be the clearing house for this information, while at the same time following our charter of providing safety. Some thoughts that come to mind are creating a 'Service Level Agreement' which they will no doubt reject, or MySQL clustering, or rsync. Any thoughts (technical, logistical, legal) appreciated."
  • by BitZtream ( 692029 ) on Monday October 22, 2012 @01:50AM (#41726115)

    You'll only be THE clearing house if you are the best source. Second, it's public data, stop trying to own it, you can't, it's not yours to own in the first place.

    • This is what I was thinking. It's not an ownership e-peen contest. It's letting people have their one-way streets, realizing it's not the end of the world. Creative Commons Share-Alike it if you will, but I'm not sure there's a better way to do it.
      • Well, I mean, the alternative is that you insist that it IS an e-peen contest. If that's what you're going for, then by all means, build an API, license it out, but most importantly, PATENT THE MECHANISM YOU HAVE FOR COLLECTING DATA. Seriously. The more extraneous words you can add in, the better. If you need help on that, just let me know. I have a friend; this guy is amazing. He has this thing called a thesaurus. Neither me nor my MBA friends are entirely sure what exactly it is or does, but we know that when he uses it, it makes RoI improve 23% and IPOs, on average (cause we're professionals) improve by 62%, on average, by volume.

        Seriously though, to anyone reading this, I'm trashed, full of shit, banned from posting on the forums I normally frequent, and too uncoordinated to start an emulator. Do not mod this up. Do not encourage the OP.
      • Re: (Score:2, Interesting)

        by Anonymous Coward

        Other sites slurp OpenStreetMap data all the time. No biggie, that's what it's for - if the traffic gets too much they *ask* you to take a mirror to reduce bandwidth costs. OSM has a "share with attribution" kinda licence.

        If you're really wiki-anything, you'll recognise that this is public information that you curate. Let 'em have it.

    • by bugnuts ( 94678 )

      It's a collection of public data, kind of like, you know, a dictionary.
      Each individual picture is copyrighted. The collection has an editorial copyright, much like an encyclopedia.

      And he does actually own the collection. Do you really think databases of public data can't be "owned" (in the non haxorz way)? Better tell Google to stop wasting all that money on street view, which is merely taking pictures of public streets.

    • by Xacid ( 560407 ) on Monday October 22, 2012 @06:52AM (#41726943) Journal

      Well that's nice until a facebook comes along to crush the myspace. "Public data" isn't something to be owned. But a specific distribution method or implementation of it can be. Yellowpages anyone?

      If they're trying to make a living off this, there is the real-world factor of keeping this info somewhat secured and then following up with a business model of some sort. Just because it says non-profit doesn't mean everyone works for free.

      • Facebook replaced Myspace because it was a better Myspace.

        Everyone gets replaced eventually, you only lead while you actually have the best product or cheat.

        • by Xacid ( 560407 )

          Of course, but why give someone else a head start and piss away all of your efforts?

  • by cold fjord ( 826450 ) on Monday October 22, 2012 @01:54AM (#41726123)

    There are plenty of publicly accessible sites that mirror data from trivial to critical. I would contact a few of them and see what agreements they have in place, if any.

    I would think you would want to make sure they note their data is a mirror, and that updates should be sent to your site. That might be handled by doc files for each file, or some type of about file in each directory. You probably want something like that, if for no other reason than to note metadata.

    I've seen quite a few sites that prefer that you go to a mirror to download actual data.
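A minimal sketch of the per-directory "about" file suggested above, assuming the mirrored data is laid out as a directory tree. The file name `MIRROR.json`, the function `write_mirror_notes`, and the canonical URL are hypothetical placeholders, not anything wikispeedia actually publishes.

```python
import json
from pathlib import Path
from typing import List

# Hypothetical primary-site URL; replace with the real canonical location.
CANONICAL = "https://primary.example.org/data/"


def write_mirror_notes(root: str, canonical: str = CANONICAL,
                       synced_at: str = "") -> List[Path]:
    """Write a MIRROR.json note into root and every subdirectory under it."""
    root_path = Path(root)
    # Collect directories first so the notes we write don't affect the walk.
    dirs = [root_path] + sorted(p for p in root_path.rglob("*") if p.is_dir())
    written = []
    for d in dirs:
        rel = d.relative_to(root_path).as_posix()
        note = {
            "mirror": True,  # this copy is not the authoritative source
            "canonical_source": canonical if rel == "." else f"{canonical}{rel}/",
            "send_updates_to": canonical,  # corrections go to the primary site
            "last_synced": synced_at,
        }
        target = d / "MIRROR.json"
        target.write_text(json.dumps(note, indent=2))
        written.append(target)
    return written
```

A mirror would run this after each sync, so anyone browsing any directory sees where the authoritative copy lives and where corrections belong.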

  • Consider teaming up with a seasoned negotiator with good business sense, or hiring an attorney -- or both. If there is any value in your dataset, those who got in touch with you will not reject fees, SLAs, reciprocal updates, etc. It all depends on how much data you have, and how accurate it is.

    On a separate note: your site is dysfunctional on my tablet. I'm left wondering what it's about or how it's supposed to work.

  • by muphin ( 842524 ) on Monday October 22, 2012 @02:02AM (#41726151) Homepage
    Create an API and provide an interface through which your client base can access the data.
    There are a lot of places out there that do this, as it's considered intellectual property.
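A rough sketch of the API approach, assuming sign records are (lat, lon, mph) and that each client is issued an API key. Answering only bounding-box queries, with a per-key token bucket and a per-call result cap, lets clients use the data without handing them a bulk dump. `SignAPI`, `query_bbox`, and all the limits are hypothetical; a real service would sit behind an HTTP server and a proper database.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Sign:
    lat: float
    lon: float
    mph: int


class TokenBucket:
    """Token bucket: bursts of up to `capacity` calls, refilled at `rate` per second."""

    def __init__(self, capacity: float, rate: float, now: float):
        self.capacity, self.rate = capacity, rate
        self.tokens, self.last = capacity, now

    def allow(self, now: float) -> bool:
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class SignAPI:
    MAX_RESULTS = 100  # hypothetical cap so one call can't return the whole table

    def __init__(self, signs: List[Sign], capacity: float = 10, rate: float = 1.0):
        self._signs = list(signs)
        self._capacity, self._rate = capacity, rate
        self._buckets: Dict[str, TokenBucket] = {}

    def query_bbox(self, api_key: str, south: float, west: float,
                   north: float, east: float,
                   now: Optional[float] = None) -> List[Sign]:
        now = time.monotonic() if now is None else now
        bucket = self._buckets.get(api_key)
        if bucket is None:
            bucket = self._buckets[api_key] = TokenBucket(self._capacity, self._rate, now)
        if not bucket.allow(now):
            raise PermissionError(f"rate limit exceeded for key {api_key!r}")
        hits = [s for s in self._signs
                if south <= s.lat <= north and west <= s.lon <= east]
        return hits[: self.MAX_RESULTS]
```

The bucket allows short bursts while holding each key to a sustained rate. A determined client can still reassemble the dataset slowly, so this raises the cost of scraping rather than preventing it.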
  • Might not translate exactly, but look into how the OpenBSD project mirrors their stuff. There is the main site and tons of mirrors. Everything is hashed. Grab a mirror; if you don't trust it, get the hashes from the main site and check the files. Not sure if it would scale to what you're doing. And what do you mean by 'giving up the ship' exactly?
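The hash-checking workflow described above can be sketched in Python. It assumes the primary site publishes a manifest of SHA-256 digests in the BSD-style `SHA256 (name) = hex` line format; the function names are hypothetical. The manifest has to be fetched from the main site, not from the mirror, otherwise a bad mirror could simply serve matching hashes for altered files.

```python
import hashlib
import re
from pathlib import Path
from typing import Dict, List

LINE = re.compile(r"^SHA256 \((?P<name>.+)\) = (?P<digest>[0-9a-f]{64})$")


def parse_manifest(text: str) -> Dict[str, str]:
    """Parse BSD-style 'SHA256 (name) = hex' lines into {name: digest}."""
    out = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            out[m.group("name")] = m.group("digest")
    return out


def sha256_file(path: Path) -> str:
    """Hash a file in chunks so large dumps don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_mirror(mirror_dir: str, manifest_text: str) -> List[str]:
    """Return names whose mirrored copy is missing or doesn't match the primary's hash."""
    bad = []
    for name, digest in parse_manifest(manifest_text).items():
        p = Path(mirror_dir) / name
        if not p.is_file() or sha256_file(p) != digest:
            bad.append(name)
    return sorted(bad)
```

An empty result means every file listed by the primary is present on the mirror and byte-for-byte identical.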
  • License the mirroring only in the event that:

    1. It's visibly acknowledged that you are the source site
    2. Updates are either sent directly to you, or are sent to you by the other site within a time limit
    3. All content on your site, including that sent to you by another (mirror) site, is watermarked as belonging to your site. For pictures, this would be a visible watermark on the picture.
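Condition 2 is only enforceable if someone actually audits it. A minimal sketch of such an audit, assuming the license requires the mirror to report each edit's ID and timestamp, and that the primary logs when each forwarded edit arrived; `late_or_missing` and the data shapes are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def late_or_missing(mirror_edits: Dict[str, datetime],
                    received: Dict[str, datetime],
                    limit: timedelta,
                    now: datetime) -> List[str]:
    """Return IDs of mirror edits that reached the primary after the deadline,
    or that have not arrived even though the deadline has already passed."""
    problems = []
    for edit_id, made_at in mirror_edits.items():
        deadline = made_at + limit
        arrived = received.get(edit_id)
        if arrived is None:
            if now > deadline:  # still missing and already overdue
                problems.append(edit_id)
        elif arrived > deadline:  # arrived, but too late
            problems.append(edit_id)
    return sorted(problems)
```

Edits that are missing but still inside their window are deliberately not flagged yet.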

  • Be the best (Score:4, Interesting)

    by giorgist ( 1208992 ) on Monday October 22, 2012 @02:31AM (#41726231)
    Be the best
    Make all information free
    Choose a good licence
    Expect to be taken over one day by something better; when that comes along ... help them
    Make it easy for anybody to use your information

    It is counterintuitive, but the moment you put up protective barriers, you fall over. The moment you depend on an artificial barrier to protect your lead is the moment you will degrade the quality of your product. It happens every time with products and services that grew on openness and then suddenly decide they are good because of their own qualities rather than their openness. If you develop a product/service based on a closed environment, that is a different story. It makes business sense to improve your model based on a closed environment until a disruptive product/service comes along.
  • by king neckbeard ( 1801738 ) on Monday October 22, 2012 @02:47AM (#41726281)
    This is a compilation of public data, with the legwork being done by others. You've got no real legal option for protecting the data, at least in the US. You could perhaps try some technical means of controlling the data, but that would greatly reduce its utility. I would also consider it unethical to try to 'own' the results of work done by other unpaid volunteers. If you wish to be the center of this data collection, then make it as useful as possible.
    • by mattr ( 78516 )

      Tell that to Lexis-Nexis.

      • You don't pay Lexis-Nexis for the data, you pay them to FIND the data you're looking for. They can claim they own it all day long, doesn't make it actually true.

    • That depends on the AUP of the site in question. Mine states (or used to; my legal module may be missing) that comments remain the property of the poster but that I'm granted an irrevocable right of reproduction for any and all purposes. If anyone were uploading original images or other potentially useful data, I might want to protect that right.

  • Protect from what? (Score:4, Interesting)

    by ysth ( 1368415 ) on Monday October 22, 2012 @02:49AM (#41726287)

    You want to "Protect...Data", "not give up the ship", "follow...our charter of providing safety". But what is it that you don't want mirrors to do with the data? Less verbiage, more clarity, please.

    • Re: (Score:3, Insightful)

      by Anonymous Coward

      It's fairly obvious to anyone who took 3 seconds to figure out what they are asking.
      They don't want to give their data up only to lose all their user-base to a "mirror". There are several ways around this; probably the easiest is not to share the data.
      However, their data looks like it could be of great use, especially to anyone who wants to calculate an accurate arrival time when taking a trip. I would recommend keeping the actual data on your server, but providing an external API that al

      • by ysth ( 1368415 )

        Err, no, that isn't obvious. Or at least it is in direct conflict with the request that the data be "mirrored", which implies to me a copy which also distributes to the public.

        I'd like to hear more from cellurl to resolve the conflict I see in the request.

      • by Anonymous Coward

        They don't want to give their data up only to lose all their user-base to a "mirror".

        It's not their data at all, it's data that was entered by their users.

  • If it's safety you want, I don't understand why you are trying to get other sites to freely back up your data.
    Get a real backup service and tell people how it's backed up. Poof! Safety.
    Or if you want to make a community resource, you can do like SourceForge, ibiblio, etc.: free mirrors that point back to your site.

  • by subreality ( 157447 ) on Monday October 22, 2012 @03:11AM (#41726341)

    You don't want to mandate people give you data. That will just get you bad data. Instead, make it as easy as possible for them to do - APIs, easy web forms, any method you can think of that will make the barrier to entry as low as possible. Encourage them to use it, but relax and set your data Free and don't try to force it. It's like Wikis... Somehow it works out OK.

  • by Anonymous Coward

    Fully Homomorphic Encryption. FHE. See

  • by FaxeTheCat ( 1394763 ) on Monday October 22, 2012 @03:51AM (#41726463)
    From the home page "the sign you capture is copyrighted with your name since you found it".

    How on earth can you copyright a speed sign, and even if you could, how can that copyright be relevant to anything?

    The location and speed limit of a speed sign is a fact. How can that be copyrighted? How can it limit the rights of others who observe the sign to publish its location and speed limit?

    If anybody were entitled to copyright a speed sign, it would be the authorities that put it there and who actually own it. How can the location of other people's property be copyrightable? Looks like somebody took the concept one step too far...
    • by bugnuts ( 94678 )

      Obviously it's the capture that's copyrighted. Certainly it's ambiguously stated, but did you really not understand it?

      And facts can be copyrighted. The sun rising over a meadow is a fact, but a picture or drawing or recorded description of it is copyrighted.

      • Obviously it's the capture that's copyrighted. Certainly it's ambiguously stated, but did you really not understand it?

        As it is a legally binding statement, it needs to be unambiguously stated. Also, if each and every fact is copyrighted by the "discoverer" that would place some severe limitations on the information, as each and every copyright holder would need to accept changes to the use of the data. This means that as an example replicating the data may require a license from all copyright holders. From the posting, it appears that this is not the case, so the copyright has no value whatsoever.

        And facts can be copyrighted. The sun rising over a meadow is a fact, but a picture or drawing or recorded description of it is copyrighted.

        There is a difference betw

        • The statement itself isn't what's legally binding. Unless explicitly stated otherwise via assignment to the public domain, copyright protection for produced works (such as photographs) is automatic in the United States. As for the rest, you're simply being pedantic, and you got upset when you were called on it.
          • The statement itself isn't what's legally binding. Unless explicitly stated otherwise via assignment to the public domain, copyright protection for produced works (such as photographs) is automatic in the United States

            As the position and speed limit of a speed sign is not an artistic expression, your post actually supports what I wrote.

            • You're absolutely wrong. Please look up some actual caselaw before continuing to demonstrate your ignorance. I'd invest 15 minutes of my life doing this for you, since you're apparently incapable of doing it for yourself or you presumably would have already done so, but at this point it seems I'm wasting more of my life than is justified by even replying to your post. HAND.
              • As you seem to have superior knowledge:
                How can the position of a speed sign be copyrighted? How is the position of a speed sign a "produced work"? What about it is not a fact? So far nobody has claimed that a fact can be copyrighted, so is your claim that a fact can be copyrighted?
  • by Rogerborg ( 306625 ) on Monday October 22, 2012 @03:51AM (#41726465) Homepage
    Gets a thousand years of bad karma.
  • I've been working with phone directories for a few decades, where many companies are in basically the same position that you are - making a living from public information. Most data is collected from phone companies that dump their customer databases to the phone directory companies. This process and the associated tariffs are regulated by law. This data must be processed and cleaned up before it is passed on. Then there are data consumers - in the old days these were people reading the phone books. These d
  • What is the point? If you're too blind to read fucking traffic signs, how about not driving?
    • Extra alerts at the right times are useful.

      I've seen signs disappear because someone ran over them.

      I've seen signs disappear because kids stole them.

      I've seen trees grow around signs and obstruct the view of the signs.

      These are DoT issues that should be resolved ASAP, but until then it might be useful to know that the 45mph limit dropped to 25 suddenly due to being very near a school for the blind and, oh, by the way, the untrimmed bushes grew over the sign. No locals bother to report it because they know the

  • Just like the GPL, but it also closes the loophole that allows you to use an open source tool in SaaS without giving back. I would investigate this licence. Also, mapmakers usually put distinctive deliberate mistakes in their maps to prove when data has been copied.
    • So basically what you want to do is cause someone else to start a competing site so they can use the data in their own way without being bound by your silly selfishness?

      Putting up barriers to getting the data is a good way to become irrelevant.

      YOU WANT COMMERCIAL USE. Commercial users (some, not all) will commit changes back to the main site just so they don't have to maintain their own distinct database for that purpose. Ask OSM.

      If you disallow others, they'll just start their own collection systems.
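The deliberate-mistake trick mentioned above (mapmakers' "trap streets") can be adapted to a sign database by seeding a few fictitious records. A sketch, assuming records are (lat, lon, mph) tuples; deriving the canaries with HMAC from a private secret makes them reproducible without storing a list of them. The function names, anchor point and offset range are hypothetical, and in a safety-oriented database a fake record should sit somewhere it cannot mislead a driver.

```python
import hashlib
import hmac
from typing import Iterable, List, Tuple

Record = Tuple[float, float, int]  # (lat, lon, mph): hypothetical schema


def canary_records(secret: bytes, anchor_lat: float, anchor_lon: float,
                   count: int = 5) -> List[Record]:
    """Derive `count` fictitious sign records near an anchor point.

    Deterministic: the same secret and anchor always give the same canaries,
    so the list never has to be stored or published."""
    out = []
    for i in range(count):
        msg = f"{anchor_lat:.5f},{anchor_lon:.5f}:{i}".encode()
        d = hmac.new(secret, msg, hashlib.sha256).digest()
        # Bytes 0 and 1 give offsets of roughly +/-0.05 degrees from the anchor.
        lat = round(anchor_lat + (d[0] - 128) / 2560, 5)
        lon = round(anchor_lon + (d[1] - 128) / 2560, 5)
        mph = 5 * (5 + d[2] % 10)  # 25 to 70 mph in steps of 5
        out.append((lat, lon, mph))
    return out


def copied_canaries(suspect: Iterable[Record], canaries: List[Record]) -> int:
    """Count how many canaries appear in a suspect dataset (matched at 5 decimals)."""
    seen = {(round(lat, 5), round(lon, 5), mph) for lat, lon, mph in suspect}
    return sum(1 for c in canaries if c in seen)
```

Finding several of your canaries in someone else's dataset is strong evidence it was copied rather than independently collected, since nobody surveying real roads would record those signs.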

  • You could fix that first.

  • Why would anyone want to know the actual location of speed signs? Normally people want to know the speed limit of a particular road at a particular place. We already have a fairly popular version of that in wiki form:

    OpenStreetMap

    • There are a number of particular reasons. One I can think of off the top of my head, at least in the state I live in, is that speed limit signs have to be displayed within the rules of state law. If the sign is hidden by trees, too low to the ground, etc., you can get the ticket dismissed in court. It can also be used in your defense when a cop tickets you incorrectly, say for 60 in a 55 when it actually is a 60.

    • That was my thought too... What the heck is the point of this exercise? Especially since you're supposed to be paying attention to the road anyhow.

    • I wanted exactly this so that I could build some kind of app to let me know roughly what the speed limit of the road I am currently driving is, or at the very least, the speed limit sign I most recently passed. I found "Wikispeedia" as the only one really doing it with any degree of success, but the website is absolute garbage, the demos rarely work, and the API is a bleeding joke. There's not a whole lot to "protect" here. Increase quality (both website and data), then worry about whether or not your da
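For the kind of app described above, the core lookup is "which known sign is closest to my position?". A minimal sketch using the haversine great-circle distance, assuming signs are (lat, lon, mph) tuples; `nearest_limit` and the 500 m cutoff are hypothetical. A real "most recently passed sign" feature would also need heading and road matching, which this ignores.

```python
import math
from typing import Iterable, Optional, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/lon points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_limit(lat: float, lon: float,
                  signs: Iterable[Tuple[float, float, int]],
                  max_m: float = 500.0) -> Optional[int]:
    """Return the mph of the closest sign within max_m metres, else None."""
    best = None
    for s_lat, s_lon, mph in signs:
        d = haversine_m(lat, lon, s_lat, s_lon)
        if d <= max_m and (best is None or d < best[0]):
            best = (d, mph)
    return None if best is None else best[1]
```

A linear scan is fine for a city's worth of signs; a nationwide dataset would want a spatial index instead.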