Where's the Open Data?
blamanj asks: "There's a lot of open-source code around, and generally, it's quite easy to find. Finding open-source data, on the other hand, can be quite a pain. Why isn't there a common repository for public-domain data sets? I'm thinking of things like lists of world cities, dictionaries of stemmed words, population data, etc., etc."
Your Tax Dollars At Work (Score:5, Informative)
Soon you'll be able to try... (Score:4, Informative)
Grabbed the link off of rootprompt [rootprompt.org] in case any of you care.
NIMA and NOAA too (Score:5, Informative)
NOAA provides bathymetry data [noaa.gov] and vectorized electronic navigation charts [noaa.gov], and NIMA (that's right, .mil; NIMA used to be the Defense Mapping Agency [nima.mil]) provides city lists and populations for all the countries in the world, as well as DEMs (digital elevation models, i.e. gridded topography). The National Atlas project [nationalatlas.gov] provides boundaries of federal lands, outlines of states, locations of major cities, and so on.
ENJOY!
CIA's World fact book (Score:2, Informative)
How 'common' do you need it? (Score:5, Funny)
There is, it's right here. [google.com]
(aka The Internet)
Re:How 'common' do you need it? (Score:1, Funny)
Re:How 'common' do you need it? (Score:3, Troll)
Unfortunately, most Ask Slashdots are so lame they can be answered with a simple Google search.
The editor who posts an Ask Slashdot should first check whether the question can easily be answered with a Google search before posting it.
Re:How 'common' do you need it? (Score:4, Insightful)
But someone submitting a question to Ask Slashdot doesn't want a bunch of links from Google; they want opinions... opinions from real people who may or (most likely) may not know what they're talking about.
They want discussion... you can't get that by searching on Google.
Re:How 'common' do you need it? (Score:2)
Yes, you can! [google.com]
Have you tried (Score:2)
Re:Have you tried (Score:1)
> Try the public library sometime.
For many people, that is not a valid option. Public libraries are too often one of the first "luxuries" a government cuts back on, and they are rendered useless by sheer neglect.
Missing the point (Score:1, Interesting)
SQL Server and Access both come with the Northwind database. If I have some new query that I'm trying to write, for instance one that randomly returns a different number of products for each product category, it is pretty darn handy to have a standardized data set to pull from for my example code.
Otherwise, I have to include DDL and DML just to create the example data. Instead, I can just say "Run this against Northwind."
The same applies for training and learning. Northwind is a pretty well known database, and most established developers won't have to learn a new schema in order to demonstrate a new concept.
So rephrase the question from "Where can I find some data?" to "Where can I find a data set that other developers are using so we can more intelligently exchange information?"
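To make the Northwind example concrete, here is a minimal Python sketch of that kind of query (a random number of products per category) using the standard-library sqlite3 module. Because Northwind itself can't be assumed here, the snippet first has to build a tiny stand-in Products table, which is exactly the DDL/DML overhead described above. Only the table and column names (Products, ProductID, ProductName, CategoryID) follow Northwind; the rows and the helper names are hypothetical.

```python
import random
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny stand-in for Northwind's Products table (rows are hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Products ("
        "ProductID INTEGER PRIMARY KEY, ProductName TEXT, CategoryID INTEGER)"
    )
    # Hypothetical layout: category id -> number of products in that category.
    products_per_category = {1: 3, 2: 2, 3: 4}
    product_id = 0
    for category_id, count in products_per_category.items():
        for _ in range(count):
            product_id += 1
            conn.execute(
                "INSERT INTO Products VALUES (?, ?, ?)",
                (product_id, f"Product {product_id}", category_id),
            )
    conn.commit()
    return conn


def random_products_per_category(
    conn: sqlite3.Connection, rng: random.Random
) -> dict[int, list[str]]:
    """For each category, return a randomly sized, randomly chosen set of product names."""
    result: dict[int, list[str]] = {}
    categories = conn.execute(
        "SELECT CategoryID, COUNT(*) FROM Products "
        "GROUP BY CategoryID ORDER BY CategoryID"
    ).fetchall()
    for category_id, total in categories:
        how_many = rng.randint(1, total)  # pick a different count per category
        rows = conn.execute(
            "SELECT ProductName FROM Products WHERE CategoryID = ? "
            "ORDER BY RANDOM() LIMIT ?",
            (category_id, how_many),
        ).fetchall()
        result[category_id] = [name for (name,) in rows]
    return result


if __name__ == "__main__":
    print(random_products_per_category(build_demo_db(), random.Random(42)))
```

SQLite's `ORDER BY RANDOM()` does the shuffling here; on SQL Server the usual equivalent is `ORDER BY NEWID()`. Against a real Northwind install, `build_demo_db` would simply be dropped.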
Re:Missing the point (Score:1)
Way offtopic? (Score:1)
I'm sorry, but I've been around for a while and don't karma-whore like everyone else. I post my opinion. Always. I don't cater to what I think will get modded up or down.
Use your points to mod interesting stuff up instead of wasting them by modding down stuff you don't agree with. Christ, how many trolls went untouched while my comment got modded down? My comment was the best one that could be found to mod down?
This comment will be my third to get moderated down, and in the same thread, no less. There are far more deserving comments that the point should have been used to mod up.
Yeah yeah, YHBT, YHL, HAND, but I don't really care.
here ya go (Score:5, Informative)
Re:here ya go (Score:1)
Sorry, I'm having trouble visualising that.
Amen (Score:5, Informative)
Re:Amen (Score:1)
Re:Amen (Score:2)
A little surprised . . . (Score:3, Informative)
Re:A little surprised . . . (Score:1)
Open NASA data (Score:2)
Oh, yeah... just remembered a nice bookmark!
http://earthobservatory.nasa.gov/Newsroom/NewImag
The NASA Earth Observatory. Don't know how open, though.
Where's the financial data?? (Score:4, Interesting)
This is stuff you can't download for free from Yahoo, CBOE, or other places.
If I can just get access to this data, then I will make enough money to purchase the other data.
Re:Where's the financial data?? (Score:1, Interesting)
US Company Security Filings:
http://www.sec.gov/edgar.shtml
Historical SEC filings in XML format:
http://bulk.resource.org/edgar/
Limited historical stock prices (15-30 years) are available from Yahoo.
What I'm looking for is historical stock buyback lists.
Boardgame/Parlor game data? (Score:3, Interesting)
For example, ever run out of trivia questions in your version of Trivial Pursuit? Or used up all the word cards in Taboo... etc., etc.
I think in the event of running out of data for your board game, it would be nice to download more. (And this would make a cool website.)
Especially if they came with PalmPilot/Windows versions that would administer the game for you. For example, Taboo consists of a word that you must get the other people to say, but there are 7 words that you CANNOT say as a clue. The word may be "George Bush", and you can't say "Texan", "President", etc. This game is fun, but we calculate we'll use up all the data that comes with it in about 30 hours. The "electronic" version is $40. That's hardly worth it. If we could just download data, we could play forever. So... um... yes, I want open data.
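A downloadable Taboo deck could be as simple as a list of cards, each holding a target word and its forbidden words. The sketch below assumes a hypothetical JSON format with "target" and "forbidden" keys and adds a naive clue checker. Treating each individual word of the target as off-limits is an assumption of this sketch, not a quoted game rule, and only the two forbidden words named above are used since the rest aren't given.

```python
import json
import re
from dataclasses import dataclass, field


@dataclass
class TabooCard:
    target: str  # the word or phrase the team must guess
    forbidden: list[str] = field(default_factory=list)  # words the clue-giver may not say

    def banned_terms(self) -> list[str]:
        # Assumption: the target phrase and each of its words are also off-limits.
        terms = [self.target, *self.target.split(), *self.forbidden]
        return [t.lower() for t in terms]

    def clue_is_legal(self, clue: str) -> bool:
        """Return False if the clue contains any banned term as a whole word."""
        text = clue.lower()
        for term in self.banned_terms():
            if re.search(r"\b" + re.escape(term) + r"\b", text):
                return False
        return True


def load_deck(json_text: str) -> list[TabooCard]:
    """Parse a hypothetical deck: a JSON list of {"target": ..., "forbidden": [...]}."""
    return [TabooCard(c["target"], list(c["forbidden"])) for c in json.loads(json_text)]


if __name__ == "__main__":
    deck = load_deck('[{"target": "George Bush", "forbidden": ["Texan", "President"]}]')
    card = deck[0]
    print(card.clue_is_legal("He was a Texan"))
```

Whole-word matching is deliberately simple: a clue like "presidential" would slip through, so a stricter referee program would need some form of stemming.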
Re:Boardgame/Parlor game data? (Score:1)
> I think in the event of running out of data for your board game, it would be nice to download more. (And this would make a cool website.)
You know, I came up with this idea just two days ago, after discovering how lame the new questions are in the Trivial Pursuit 20th Anniversary Edition. It's really time these questions got open-sourced.
What I had in mind was a system whereby people could go to a website and contribute trivia questions. After some human screening, the questions would go into a database, which would then be used to generate PDF files of Trivial Pursuit cards that could be freely downloaded and printed on users' home computers. Alternatively, users could pay 10 bucks to have a deck of pre-printed Trivial Pursuit cards sent to their house.
I'm interested in starting this project over at SourceForge. Anyone else like the idea?
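The submit, screen, and publish flow described above could start as a single table with a review status. This is a minimal sketch using Python's sqlite3; the schema, the function names, and the default of six questions per card are all hypothetical, and the PDF step is replaced by plain-text card rendering because no particular PDF library is assumed.

```python
import sqlite3

SCHEMA = """
CREATE TABLE questions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    category TEXT NOT NULL,
    question TEXT NOT NULL,
    answer TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending'
        CHECK (status IN ('pending', 'approved', 'rejected'))
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def submit(conn: sqlite3.Connection, category: str, question: str, answer: str) -> int:
    """A contributor adds a question; it starts out 'pending'."""
    cur = conn.execute(
        "INSERT INTO questions (category, question, answer) VALUES (?, ?, ?)",
        (category, question, answer),
    )
    conn.commit()
    return cur.lastrowid


def review(conn: sqlite3.Connection, question_id: int, approve: bool) -> None:
    """Human screening step: only pending questions can be approved or rejected."""
    status = "approved" if approve else "rejected"
    conn.execute(
        "UPDATE questions SET status = ? WHERE id = ? AND status = 'pending'",
        (status, question_id),
    )
    conn.commit()


def approved_deck(conn: sqlite3.Connection) -> list[tuple[str, str, str]]:
    return conn.execute(
        "SELECT category, question, answer FROM questions "
        "WHERE status = 'approved' ORDER BY id"
    ).fetchall()


def render_cards(deck: list[tuple[str, str, str]], per_card: int = 6) -> list[str]:
    """Group approved questions into plain-text 'cards' (stand-in for PDF output)."""
    cards = []
    for start in range(0, len(deck), per_card):
        chunk = deck[start:start + per_card]
        cards.append("\n".join(f"[{cat}] {q} | Answer: {a}" for cat, q, a in chunk))
    return cards
```

Keeping rejected questions in the table, rather than deleting them, lets reviewers see what has already been turned down.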
trivial matter (Score:2)
I guess if you wanted to get it in some sort of doc form, you could use a session log and tweak it.
Re:trivial matter (Score:1)
Re:trivial matter (Score:2)
Here's an example of where the logging function is found in x-chat, the client I use under Linux: the path is Settings > Setup > Options, and you'll see the logging function there.
Now licensing, ya got me; asking permission if it's not yours is the best bet. I can't see folks getting real anal over it, either, but ya never know. It probably just depends on what you intend to do with it. Share it around, and they'll probably say have at it; try to develop a commercial product and sell it, and you need more serious advice and a contract, I guess. Like "yo, zeke, mind if I take this logfile and tweak it and come up with a nice page of trivial pursuit questions?" "Sure man goferit, send me the url when you finished I want a copy" "thanks" "swell". Beyond that, you're gonna need a contract of some sort. Cash changes reality. Asking permission is always the safest bet, IMO.
Re:trivial matter (Score:1)
Re:Boardgame/Parlor game data? (Score:1)
The idea would be to support as many different games as possible...
Re:Boardgame/Parlor game data? (Score:1)
Re:Boardgame/Parlor game data? (Score:2)
I'd recommend converting the whole thing into XML with fields for:
A long time ago, there used to be an IRC-based game that was run like Jeopardy [google.com]. Don't know if the games were archived, though.
Found this on the web (Score:1)
Much as you suggested, it uses XML to allow people leeway in structuring a quiz. It doesn't offer multiple choices, though. That's not really a concern for the Trivial Pursuit application, but still, I'd like to have it as an option.
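Since the linked format isn't shown, here is a hypothetical XML layout for a quiz in which the multiple-choice list is optional, parsed with Python's standard xml.etree.ElementTree. The element and attribute names (quiz, question, category, text, answer, choices, choice) are made up for illustration and are not taken from the site mentioned above.

```python
import xml.etree.ElementTree as ET

# Hypothetical quiz document: the first question offers choices, the second does not.
SAMPLE = """<quiz>
  <question category="Geography">
    <text>What is the capital of Australia?</text>
    <answer>Canberra</answer>
    <choices>
      <choice>Sydney</choice>
      <choice>Canberra</choice>
      <choice>Melbourne</choice>
    </choices>
  </question>
  <question category="Science">
    <text>What is the chemical symbol for gold?</text>
    <answer>Au</answer>
  </question>
</quiz>"""


def parse_quiz(xml_text: str) -> list[dict]:
    root = ET.fromstring(xml_text)
    items = []
    for q in root.findall("question"):
        items.append({
            "category": q.get("category", ""),
            "text": q.findtext("text", default="").strip(),
            "answer": q.findtext("answer", default="").strip(),
            # <choices> is optional: free-answer (Trivial Pursuit style) questions omit it
            "choices": [c.text.strip() for c in q.findall("choices/choice") if c.text],
        })
    return items


if __name__ == "__main__":
    for item in parse_quiz(SAMPLE):
        print(item)
```

An empty `choices` list then simply means "free answer", so one file format can serve both Trivial Pursuit-style cards and multiple-choice quizzes.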
It should be clearly labeled (Score:4, Insightful)
When you go to Google to find software to fill some specific need, you already know quite clearly how to search. The problem with finding "open data" is that there currently is no commonly used, clear label on such texts, research and articles. I tend to mention that the content is released under the GNU Free Documentation License [gnu.org] (FDL) when I want to release something to be freely utilized by anyone. One such case is, for example, the Amazon Discoveries [cyberian.org] series. Not that it would be of any use to anyone :) This problem is somewhat related to the problem of releasing your idea or concept under such a license: there does not seem to be any clear practice for how to go about this, i.e. what to do if your idea might be unique but you do not want to patent it. We have that exact problem with, for example, the Openchallenge concept submissions [openchallenge.org]. Any ideas on what practices to use in that case would help us out.
Timelines and the 'Necessary Web' (Score:4, Interesting)
I agree in theory that we need a Semantic Web where content is easier to find, but I don't think XML-etc can really help. [rant] [robotwisdom.com]
My current theory is that individuals need to build the 'Necessary Web' which consists, like an encyclopedia, of a page for each topic (or many pages by different authors, on their own websites). Four special traits make a page qualify as 'Necessary':
-- an attempt to be FAQ-like, and briefly cover all the important subtopics on a single page.
-- an attempt to sort thru and link all the best web-resources on the topic. (By reducing the linktext to one- or two-word [text buttons] [robotwisdom.com] you can fit hundreds of links into a useful page.)
-- a timeline, to present the most possible data in the neatest possible way. [theory] [robotwisdom.com]
-- The Open Web Content License [robotwisdom.com] to encourage others to recycle-and-update your content, requiring only that they clearly link your page as one of the original sources.
Most recent example of this format: Linux/Unix [robotwisdom.com] (timeline w/100s of links)
I believe that once a critical mass of authors adopt this format, taking on the most useful topics, there will be a rapid shift from the current search-frustrations to something very much like the Semantic-Web ideal, without even requiring any fancier technology than simple HTML.
not an easily distributed task (Score:3, Insightful)
ibiblio (Score:2)
Surprise: no one seems to have mentioned ibiblio yet? [ibiblio.org]
Electronic music databases (Score:2, Interesting)
Check out and add to:
Baseball (Score:2)
-Gabe
Plenty of good data from the government (Score:1)
In addition, most state governments and even county level governments publish large amounts of data.
UCI Repositories (Score:2)
Bioinformatics databases (Score:1)