Integrating Wikipedia With a Local Intranet Wiki 121
An anonymous reader writes "I work for a large company taking a preliminary look at developing an honest-to-goodness wiki. We have tried to launch a company-wide wiki before, but with little success. The technical domains of each part of the company are different, so each article needs a good deal of background to be useful. Of course, due to the proprietary nature of our work we cannot share our articles outside of the intranet. What we would like to do is leverage existing wikis by augmenting our internal wiki with an external wiki. When a user accesses Wikipedia from inside our intranet, they receive the Wikipedia content plus the local, domain-specific information. For example, links to company-specific wiki pages would be available in Wikipedia pages. Has anyone else tried to do something like this? I know it sounds like a logistical nightmare; are there any thoughts on how to make this successful?"
URLs (Score:2, Funny)
URLs. Look into it.
Re: (Score:3, Insightful)
Noise.
It's a good place to bury the signal.
Re: (Score:1, Offtopic)
Hahhah, 100% accurate, and nicely put :D
Re: (Score:3, Interesting)
Said in a crude way; but to the OP: This guy is right. The most brain-dead simple way to make this work is to just set up your own wiki, and pepper it liberally with links to relevant Wikipedia pages. As someone below points out, there's even a feature in MediaWiki to make this linking easier (look up "InterWiki" in the MediaWiki help).
You may even be able to set up #REDIRECTS using InterWiki links so that people can still see the page names you want in your search and category listing, and then be taken
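For reference, the interwiki linking the parent describes uses ordinary MediaWiki markup; a default install already ships with a `wikipedia:` prefix in its interwiki table. A rough sketch (the page titles here are just examples, and whether interwiki #REDIRECTs resolve automatically depends on your MediaWiki version and configuration):

```
A plain interwiki link inside an internal article:
[[wikipedia:Load balancing (computing)|background reading on Wikipedia]]

An internal page that exists only to hand readers off to Wikipedia:
#REDIRECT [[wikipedia:RAID]]
```

The nice property is that the internal page name still shows up in your own search and category listings, as the parent notes.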
Re: (Score:2)
If you want a simpler solution and have a few tens of GBs of space to spare, then you can just download a snapshot of Wikipedia and use that as the base for your wiki. You won't get any future articles, but you'll get the current ones.
On the other hand, I don't really see the point. Is it really hard to read both the wikipedia page and the local page?
cron + rsync + tar (Score:2)
Every organization needs its own, up-to-date version of Wikipedia [wikipedia.org].
But seriously, process the SQL dump when you retrieve a monthly (quarterly?) update. Generate a set of strings that are relevant to your organization, and strip articles that don't match.
Someone can always visit the upstream site, or you can use the interwiki facilities, as mentioned elsewhere.
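The "strip articles that don't match" step above might look something like this Python sketch. The keyword list is hypothetical, and a real run would stream-parse the pages-articles XML dump rather than hold (title, text) tuples in memory:

```python
import re

# Hypothetical list of terms relevant to your organization.
KEYWORDS = ["load balancing", "RAID", "failover"]
pattern = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)

def is_relevant(article_text):
    """Keep an article only if it mentions at least one keyword."""
    return bool(pattern.search(article_text))

def filter_dump(articles):
    """articles: iterable of (title, text) tuples parsed from the dump."""
    return [(title, text) for title, text in articles if is_relevant(text)]
```

Run it from your cron job after the rsync, before loading the surviving pages into the local database.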
Re: (Score:2)
Re: (Score:2)
You could probably pretty easily write an extension for mediawiki that attaches to the 'ArticleAfterFetchContent' hook and augments the page with content fetched on the fly from Wikipedia. That would be easy enough to do. Just make sure that when the user is editing the page, the function you attach to the hook does not activate (otherwise you will end up saving the wikipedia content into your page, and it will be there twice when a user visits the page).
Re: (Score:2)
Said in a crude way; but to the OP: This guy is right
Agree with S77IM in whole. I've put together several wikis for corporate use. URLs are magic. Aggregators aren't quite that simple, and the ones we tend to see from casual Google searches are almost universally held in contempt. Don't go there.
The company I work for settled on Confluence because we insisted on attribution and integration with our global AD (by "Global" I mean "about 40 countries"). It isn't all that bad. Stylistically and for tracking I prefer Wikimedia, but in an engineering and SI f
bad idea (Score:5, Interesting)
Re:bad idea (Score:5, Informative)
https://addons.mozilla.org/en-US/firefox/addon/748 [mozilla.org]
Re: (Score:2)
I wholeheartedly agree with the parent. Your best bet at doing this well is doing it as dynamically as possible. Scraping web pages is a huge pain. Building an extension to detect when you're visiting Wikipedia and inject something into the page is a hell of a lot simpler.
Another poster suggested greasemonkey. I haven't used it myself, but I suspect it would make sense to develop a prototype with greasemonkey first. It might well be that a custom extension is not needed at all.
Also, Firebug is your friend
Re: (Score:2)
A well-written JavaScript bookmarklet will do the job too. You likely don't even need Greasemonkey, and it can be made cross-browser.
Re: (Score:2)
I know of at least two ways this could be done, neither of which is nearly as much work as this would seem at first. First, did you know that the entirety of the content of Wikipedia is downloadable, in different formats? You can get everything, or just the current articles without the history (much smaller), and there are other options as well. While there is a lot of data, it is really not th
Re: (Score:2)
If we're talking about redirects, it would be quite easy to generate a 404 page that would redirect you to the Wikipedia page, either through a link or as a straight redirect. Or, if you can, use .htaccess and set up redirect rules there (it's the way wikipedia works anyway AFAIK, it just means adding more rules to your existing one)
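A sketch of the .htaccess approach mentioned above, using mod_rewrite. The `/wiki/` path layout and the 302 are assumptions; adapt them to however your internal wiki exposes page titles:

```apache
# If no local file or wiki page matches the request, send the reader
# to the same title on Wikipedia instead of a 404.
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^wiki/(.*)$ http://en.wikipedia.org/wiki/$1 [R=302,L]
```

A 302 keeps it obvious to the user that they've left the intranet, which avoids some of the confusion other posters warn about.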
Re: (Score:2)
Re: (Score:1)
Re: (Score:1)
Re:bad idea (Score:4, Funny)
if
you didnt read like
e.e. cummings
Re: (Score:1)
Here's a detailed howto on creating a Bot [wikipedia.org].
Re: (Score:1)
Also, Firebug is your friend.
Have you just woken up from cryostasis*, too?
And you know the craziest thing I've heard about Firebug? Allegedly, people also use it to debug web applications written in JavaScript! Applications... On the web... In JavaScript...
What's next? Apple running on Intel? Bill Gates becoming a humanitarian?
----
*) I was frozen just before GWB started WWIII and thawed after the Blacks won :D
**) Thank you, thank you. I'll be here all night.
Download it (Score:2, Informative)
http://en.wikipedia.org/wiki/Wikipedia_database
Download their database, put it into your system, and you're set.
Re: (Score:1)
Solution (Score:5, Informative)
Perhaps the easiest thing to do would be start with a complete dump of Wikipedia and add your own stuff to it. Their database dump page is here [wikipedia.org].
It is 2.8TB, however. They allude to a "Wikipedia API" for working on a "random subset" of Wikipedia; maybe that would be helpful too.
Re: (Score:2, Informative)
Why use a dump from early last year when you can have yesterday's (http://download.wikimedia.org/enwiki/latest/)?
Re: (Score:2, Funny)
Have you *seen* the latest?
I'd much rather have something that's been vetted a couple
YOU'RE A FAG LOL
Re: (Score:1, Insightful)
Your Karma must be shit BadAnalogyGuy.
Why would any one commit be less vetted than any other commit? The old commits don't get new edits merged into them. A commit from a year ago is no less likely to have vandalism present than the commit from yesterday. It will just be different vandalism.
WHOOOOOOOOOSH (Score:1)
Re: (Score:2)
That's the compressed version. The meta-history file (compressed: 17 GB) decompresses to 2.8 TB on its own. Assuming the same compression ratio (likely not a valid assumption), the articles file would decompress to 500 GB, give or take.
Re: (Score:2)
> It's not 2TB, it's only 3.2gb. You need enwiki-20080103-pages-articles.xml.bz2,
> from http://www.archive.org/details/enwiki-20080103 [archive.org]
I recall reading somewhere that the unzipped size of Wikipedia was 1-2 TB... not sure about this file though.
Re:Solution (Score:5, Interesting)
Dumps go stale, Wikipedia is updated all the time. I'd suggest something a bit more dynamic.
I did something similar (conceptually) as a dynamic help system for our web-based application, with content in a wiki keyed on the URL of the page where the help message was to apply. In my case, clicking the "help" button on a page would make a proxy call to a private wiki to get the help menu content. If none was found, an email was sent to the support desk and the end user was given a web-chat prompt to tech support (with the URL prepended so that tech support could jump in, answer the questions, and write the help menu in one fell swoop).
In your case, start with your local wiki. Presumably you have some stuff in there already. Rename the articles as necessary to match URLs from Wikipedia.
Then, build a simple proxy server that rewrites wikipedia content to include a header of your local content. Probably 100 lines (or so) of glue code, and anywhere from a few man-hours to a few man-days coding.
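The splicing half of that glue code might be sketched like this in Python. The "insert after &lt;body&gt;" placement is an assumption; you'd tune it to wherever you want your local header to appear in the Wikipedia page:

```python
import urllib.parse
import urllib.request

def splice_header(wikipedia_html, local_html):
    """Insert the local wiki content just after the <body> tag of the
    fetched Wikipedia page; fall back to simple prepending if no <body>."""
    start = wikipedia_html.find("<body")
    if start == -1:
        return local_html + wikipedia_html
    end = wikipedia_html.find(">", start) + 1
    return wikipedia_html[:end] + local_html + wikipedia_html[end:]

def proxy_page(title, local_html):
    """Fetch a Wikipedia article and prepend the local header (network call)."""
    url = "http://en.wikipedia.org/wiki/" + urllib.parse.quote(title)
    with urllib.request.urlopen(url) as resp:
        return splice_header(resp.read().decode("utf-8"), local_html)
```

The proxy's request handler would just call `proxy_page` with the title and whatever the local wiki returned for the same name.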
The rest is all training.
Re: (Score:1)
Dumps go stale
what about periodic dumps of an LTS wikipedia release.
Re: (Score:2)
I guess I don't see the problem... I'm a lousy programmer, and I could work up a proof-of-concept for this in about 10 minutes in PHP. Put your internal information in one frame and the Wikipedia information in another. Simply load the data from Wikipedia, either using the Wikipedia API, or just put the wikipedia page in a frame. This is not really any different than about a million other websites that aggregate information from multiple sources. This doesn't actually integrate the data (for example inserti
IFRAME? Intelligent proxy/page modification? (Score:3, Insightful)
Re: (Score:2)
If you want to get fancy, use AJAX to grab the Wikipedia content, stuff it into a hidden div, then DOM-select the contents of the article and set a visible div's HTML to the wiki content:
[code]
var wikiSource = jQuery.get("http://en.wikipedia.org/wiki/Some_article", function (wikiHtml) { setContent(wikiHtml); });

function setContent(wikiHtml) {
    jQuery("#hiddenDiv").html(wikiHtml);
    var wikiContent = jQuery("#hiddenDiv #content").html();
    jQuery("#visibleDiv").html(wikiContent);
}
[/code]
Re: (Score:2)
Except you can't currently make off-domain AJAX calls. It's blocked for security reasons. There's a proposed standard for whitelisting domains, but it doesn't appear to be implemented in any browsers yet.
Re: (Score:2)
And writing a 3-line AJAX proxy script is too difficult?
CURL page
strip garbage
output to client
How hard was that?
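The "strip garbage" line is the only step with any meat in it. A naive Python sketch of pulling just the article body out of a fetched MediaWiki page (the `<div id="content">` wrapper matches the default skin, but that's an assumption, and a real implementation should use an HTML parser):

```python
def extract_content(html):
    """Naive extraction of the <div id="content"> block from a MediaWiki
    page by counting nested <div> tags. Sketch only; fragile by design."""
    marker = '<div id="content"'
    start = html.find(marker)
    if start == -1:
        return html  # no match: fall back to the whole page
    depth = 0
    i = start
    while i < len(html):
        if html.startswith("<div", i):
            depth += 1
            i += 4
        elif html.startswith("</div>", i):
            depth -= 1
            i += 6
            if depth == 0:
                return html[start:i]
        else:
            i += 1
    return html[start:]
```

Everything outside that div (skin chrome, sidebar, search box) is the "garbage" the parent means.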
Business Talk is Stupid Talk (Score:4, Insightful)
"What we would like to do is leverage existing wikis by augmenting our internal wiki with an external wiki"
What does that even mean? If you want to design something, you'll have to use more precise language. And for god's sake, stop using the word leverage without thinking about it. You used it backwards - if you are augmenting your internal wiki with external wikis, you are leveraging your internal wiki with the external wikis. You leverage a boulder with a lever, but you don't leverage a lever with a boulder.
Re:Business Talk is Stupid Talk (Score:5, Funny)
leverage (v.) -> opkrikken -> fuck up
augment -> duurder maken -> make more expensive
internal wiki -> krabbel zonder net -> off-line blurb
external wiki -> krabbel met net -> on-line blurb
existing -> nog bestaand -> not yet deleted
So the English-to-English translation is: "What we would like to do is fuck up not-yet-deleted blurbs by making our off-line blurbs more expensive with on-line blurbs".
Now that I can understand.
Re: (Score:2)
"What we would like to do is leverage existing wikis by augmenting our internal wiki with an external wiki"
What does that even mean? If you want to design something, you'll have to use more precise language.
His example is much clearer:
For example, links to company-specific wiki pages would be available in Wikipedia pages.
One solution could be a Firefox greasemonkey script, as someone above already suggested.
Re: (Score:2)
Re: (Score:2)
Cut him some slack; he's been in meetings all week with PHBs and executives who throw these terms around like candy without even knowing what they mean.
"It will bring us a whole new dynamic by leveraging our skill-set when applied to the future latitude and positions."
Everyone knows the suits in the corner offices talk only to hear themselves talk. It's either that or Business Administration degrees have a "ramble on like you are educated" class requirement.
Re: (Score:2)
You leverage a boulder with a lever, but you don't leverage a lever with a boulder.
Actually I lever [cambridge.org] the boulder with a lever. Where I come from, "leverage" is a noun, to which the corresponding verb is "lever" :-)
Browser overlay (Score:2)
It seems to me I've seen a browser extension somewhere that lets users add their own comments to any arbitrary web page, and those comments can be made public so anyone else running the same browser extension will see them when they load the same page. I bet you could use something like that, with all your users having a browser plugin that pulls URL-based content from an internal server.
Friendly MITM attack (Score:2)
Sounds like a weird setup, so you'll probably need to do most of it yourself. Perhaps the easiest way is
1) setup a normal local wiki, with care to name pages the same as the relevant wikipedia page [I'm guessing you know how to do this]
2) use DNS redirects or similar tricks to get all wikipedia requests to go to a proxy
3a) do html injection on the page and stick your stuff at the bottom [MITM attack using ettercap or something like that]. This is probably a pretty bad solution, but is going to be the easies
Re: (Score:1)
Re: (Score:2)
That wouldn't work so well when it's time to update from Wikipedia. I'd assume they'd update frequently (say once every month or so), but is it really necessary to suck down the whole database when, on a small network (say fewer than 10,000 users), only a handful of pages will ever be read?
Ah, what happened to the good ol' days, when the whole Internet fit on that one AOL disk. :)
Doinitwrong (Score:5, Insightful)
Agreed. Appending to wikipedia is the ass backwards way to do it. Everyone suggesting greasemonkey and other addons are just enabling your backassery.
What you do is create an internal wiki, and wherever relevant you link to the Wikipedia article. Or an external doc. Or nothing at all, and expect your employees to look it up on their own.
Re: (Score:2)
That's what I'd have suggested as well. Least amount of work, efficient, usable, no questionable hacks. It's common sense.
Re: (Score:2)
Sounds like someone never heard of 'target="new"' to force the external link to open in a new tab so that the user doesn't go "where did my f*ing internal wiki page go to?"
Re: (Score:2)
which explains why they never succeeded before - dumb users, dumb "implementors", and not even a basic understanding of how things work.
Welcome to corporate America. Like what we did to the economy?
Re: (Score:2)
Welcome to corporate America. Like what we did to the economy?
Should have gone with Art Deco. The whole "Early Mongolian Clusterfuck" theme clashes.
interwiki (Score:5, Interesting)
google wave (Score:1, Offtopic)
watch the 1:20-hour demo here and see for yourself [google.com]. This monster will change email, chat, wikis and forums. I'd be worried if I was a slashdot overlord. In fact, an idea for an extension to google wave would be to implement slashdot's moderation system into it.
Maybe I drank too much of the kool-aid, but I think wikis and forums will all have to rapidly adapt, or adopt the co
Re: (Score:2)
Don't (Score:5, Interesting)
Re: (Score:1, Informative)
sheer
Re: (Score:3, Interesting)
I agree (would mod up but gave up modding way back). However this is an interesting and probably recurring problem: extending the wealth of public net wisdom with precision data from local context (organisational or task-centric rather than geolocational).
A proxy adding local content into pages loaded from outside as suggested in Re:Solution by mcrbids [slashdot.org] would solve some of the problems you mention:
* The wikipedia content will always be out of date
Re: (Score:1)
Also, strictly speaking, what the poster wants to do is illegal according to the CC-BY-SA and the GFDL.
See http://en.wikipedia.org/wiki/Wikipedia:License#Re-use_of_text [wikipedia.org]
I'm not sure he's planning on modifying, but it still sounds like a pretty clear-cut copyright violation.
Re: (Score:2)
If the modified version is only used in-house, then it is not made available to the public, so clauses about redistribution do not apply.
Hyperlinks? (Score:1)
Maybe I'm missing something, but why not just have an external links section on your internal wiki, or a "Required Reading" section? Seems like the solution you're proposing is a little bit heavyweight for the described problem.
Legitimate use for this hack (Score:2, Interesting)
Am I the only one who cannot see any legitimate use for this hack?
Why lure your users into thinking the content is on Wikipedia when it is on your network?
Can't your users use Wikipedia _and_ your wiki?
Honestly, I think the goal of this hack is to lure users into thinking they're reading/editing Wikipedia, for someone's profit.
Re: (Score:2)
Obvious answer: If they're as retarded as the person posting the question ...
Seriously, if the user can't figure out how to open 2 sites in 2 tabs, a "merged wiki" should be low on your list of priorities.
your content, how proprietory is it? (Score:1)
Maybe... (Score:2)
Open a page on the intranet for... say, capacitor.
A script grabs the Wikipedia article, strips out the header, sidebar, etc., and fills in the remaining links/images with proper URLs to Wikipedia (so they work).
It stores the result in a database for diffing and updating later, dumps the remaining Wikipedia content at the bottom with a good ol' <hr>, and you're off!
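The link/image fix-up step in the middle might be sketched like this. It's regex-on-HTML, which is fragile; fine for a sketch, but use a real parser in production. The upstream host is an assumption:

```python
import re

BASE = "http://en.wikipedia.org"  # assumed upstream host

def absolutize_links(html):
    """Rewrite root-relative href/src attributes so they point back at
    Wikipedia. Protocol-relative URLs (//upload...) are left alone."""
    return re.sub(r'(href|src)="/(?!/)', r'\1="' + BASE + '/', html)
```

Run this over the stripped article body before storing it, so in-article links keep working from inside the intranet page.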
What? (Score:3, Interesting)
Why? Can't you just link to wikipedia pages where appropriate? OK, my company has an internal server we link through to sanitize referrer info so our internal wiki titles don't get all over teh interwebs. But if the wiki users can't figure out "hey, this article is too specific - maybe wikipedia has more general information that would help me," you've got bigger problems than your wiki management.
part of an Intelligent Book (Score:4, Interesting)
A very small part of My PhD [cam.ac.uk] looked at this (but with "collaborative textbooks" rather than wikis) -- see Chapter 4. Adding a very simple metadata-based navigation layer over the top of the wiki is pretty easy, clean (doesn't confuse users), and seems to do the trick. The wiki itself shows in an embedded frame. Of course, I had to go further and let students do difficult number theory proofs backed by machine reasoning systems within the book, but you won't have to solve that problem!
I'm (gradually) putting this fairly simple but useful part of the software into an online resource at www.theintelligentbook.com [theintelligentbook.com], though it's in my spare time and the system is down at the moment. I'll put my contact details back up there shortly in case the question-asker wants to discuss it technically.
Simple. Two Tabs (Score:2)
One Tab for your Internal Wiki. Another one for wikipedia.
You can also highlight a particular word in your internal wiki, right-click, and search Wikipedia (if your search is set up that way). The search term automatically opens the Wikipedia content in a new tab. Amazing, isn't it?
Is it only me wondering how this article ever made it to /.?
Learn from mistakes. (Score:1)
Not without merit. (Score:1)
This is something the Google Wave protocol and platform [youtube.com] completely anticipates.
It's based on a tree structure and source-code management. People who edit from the synergized wiki could add to either the private or public versions, and patches to public versions or additional documents could be changed and maintained internally.
Re: (Score:1)
That would essentially be the way it would happen. You would pull down the MediaWiki source, apply local changes, and locally render the pages with active diffs. You would also have pages that only exist locally. Due to limitations in the platform, you would have to custom-design a way for people's changes to go to either the public or the private system; this would be difficult under the current constraints, where the document's structure is not kept track of.
The simplest solution I can think of is.. (Score:1)
1. On your personal wiki server, have a copy of each page of the wikipedia you want to apply modifications to, and add whatever you want on those.
2. Have a modified http proxy on the intranet that detects queries to the wikipedia about items that you have on the server and re-route them.
For example, let's say you want custom information on http://en.wikipedia.org/wiki/Socks [wikipedia.org]. You copy it to http://yourintranetserver/wiki/Socks, and make your changes.
Then, if someone from inside your network tries to g
Freebase.com (Score:1)
They provide an API to obtain articles and structured data from them. They handle all of the wikipedia import.
Additionally, you can do much more with the structured data there
For instance - Olympic Cyclists and the Way They Died.
http://www.freebase.com/view/user/doconnor/default_domain/views/olympic_cyclists_and_they_way_they_died [freebase.com] Try doing that with Wikipedia.
Done this before (Score:1)
1) Install Wikipedia software locally and use this for any locally created articles
2) The web server running this simply proxies out to en.wikipedia.org for that request if not available in the local version. The easiest way to do this is with Apache + rewrite rules
This means that users can get to articles locally and on wikipedia from the same command
You then need to consider the following
1) The search request needs to go to the local version of Wikipedia first, then the external one, and concatenate the results t
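The local-first, fall-through-to-Wikipedia lookup in step 2 could be sketched like this (the function name and the dict standing in for the local database are hypothetical):

```python
def resolve(title, local_pages):
    """Return local content when the title exists locally, otherwise the
    upstream URL the web server should proxy or redirect to.
    local_pages: dict mapping titles to wiki text (stand-in for the local DB)."""
    if title in local_pages:
        return ("local", local_pages[title])
    return ("proxy", "http://en.wikipedia.org/wiki/" + title.replace(" ", "_"))
```

In the Apache + rewrite-rules setup described above, the "proxy" branch is what the rewrite rule implements; the sketch just makes the decision explicit.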
Why not (Score:1)
howabout about.com? (Score:1)
they were pretty good at page-hijacking, IIRC :-)
seriously though, perhaps i mis-read the question? are you looking for automated tools to do the hyper-links?
There's lots of value in a compound wiki (Score:4, Interesting)
Ignore the naysayers. Of course there is a lot of value in aggregating content and creating a compound page that blends your internal content with other sources.
From a usability and authority-of-source perspective, however, I think it would be best to list each source in a separate section on the page, starting with your internal content at the top. You can get to the other content either by embedding links into your internal content, or by collecting the links in a separate section.
Wikipedia itself uses the embedded technique. When composing or editing an article, the author can embed markup for external references. On display, this markup is turned into a footnote link at the point of embedding, and a footnote at the bottom of the page. I don't see why you couldn't do something similar. In this case, however, you would be embedding references to Wikipedia articles.
You could do the same here: in your internal wiki templates, have custom markup for embedding Wikipedia queries related to the article. On display, turn these markup queries into embedded links to footnotes, resolve the queries and deposit the results at the bottom of the page, or toss them into iframes and let the user sort it out.
The other technique is to have a custom form in your internal wiki template where you collect the cross-references. On display, turn these queries into links or resolve them into content.
In any event, why limit yourself to Wikipedia? Include cross-references to patent search engines and other domain-specific sources.
A big word of caution, of course, is owed to the legal angle. Make sure you follow the law whenever reusing anyone else's content, even if it's just a link. Have your legal department sign off on your reuse policy. Don't distract them with technical aspects of what you want to do. They're lawyers; they only care about the law. Ask them a specific legal question, such as, "what is our legal exposure if we republish (links to or actual content from) Wikipedia on our internal wiki?".
WikiSlurp... (Score:1)
Is this really worthwhile? (Score:1)
I don't get it. Are people in your company using Wikipedia so much in their daily work that this would really be useful? Just set up your internal wiki. It is your focal point. Why try to integrate the two beyond just linking to Wikipedia? Using MediaWiki, you can even use interwiki links to easily link outside of your internal wiki.
Try the other way round (Score:1)
Why not try the other way round:
Create your wiki, add pages, add links from your wiki pages (which you have full control over) to relevant wikipedia pages?
Much simpler, and should still produce the desired effect.
The Real issue is Social. (Score:2)
You are trying to force a technical solution on a social problem. It's probably not going to work. Your best bet for success is to install a WYSIWYG editor for MediaWiki. There are several out there. Wiki markup, underneath, is just a programming language. It requires training people, no matter how much it is designed to be "easy." Make it easier.
Consider Sharepoint. As much as /. is Anti-Microsoft, if your users are used to Exchange and Windows then Sharepoint is worth paying for.
I've worked for Larry Sa
Extensions (Score:2)
Tearline Wiki (Score:2)
The experimental Tearline Wiki [galois.com] system we've developed at Galois [galois.com] might suit your needs. Inside the firewall, you use MediaWiki with the Tearline system, and get a combined view of your internal wiki(s), possibly different wikis on different sub-nets, and you can integrate it with Wikipedia or other internet-based wikis to get the global context of the article.
As others have said, integrating your content with other people's content can be a legal issue.
Contact me if you want more information on Tearline :)
Wikipedia is X rated (Score:1)
I work for a huge corporation and we have our own thing called etipedia.
Also, don't forget, wikipedia is X rated. [wikipedia.org]
MediaWiki interwiki links (Score:2)
Use interwiki links. I use them to link our intranet, mediawiki, our external developer wiki, and our external support wiki.
You will probably be unable to use them since using them requires the ability to get off your lazy ass and read the MediaWiki documentation or google for it, which results in plenty of information.
Also the fact that you're going to have to be able to insert a row in a database is probably going to be over your head.
READ THE DOCUMENTATION YOU LAZY FUCK.
Squid Proxy (Score:1)
IFrame + JavaScript = robust and simple (Score:2)
Semantic MediaWiki and SMW+ (Score:1)
You may want to check the Semantic MediaWiki (semantic-mediawiki.org) or SMW+ (wiki.ontoprise.de).
Both are built on top of MediaWiki (which powers Wikipedia) so you can tap the very rich pools of extensions (numbering in the hundreds).
SMW+ is actually built on top of SMW; it focuses on increasing usability and preinstalls pre-configured extensions out of the box to make it easier to deploy.
With SMW/SMW+, you can put in semantic annotations for an article describing just about anything you want to ass
seems like a very easy hack (Score:2)
Unless I am completely misunderstanding you, this seems like a pretty easy hack on any wiki engine. Just query the page's title at other wikis and append the content to the bottom. For example: you have a page called Server Farm, detailing your company's server farm. Whenever that page is loaded in a browser, the dynamic content generator in the website downloads the page with the same name from Wikipedia, strips out their formatting, and sticks it at the bottom of your page. Your users can only edit y
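The "query the page's title at other wikis" step doesn't even need scraping; the MediaWiki API's action=parse returns rendered article HTML as JSON. A sketch (the response shape assumed here is the standard one, but verify against your target wiki):

```python
import json
import urllib.parse
import urllib.request

API = "http://en.wikipedia.org/w/api.php"

def extract_html(api_response):
    """Pull the rendered HTML out of an action=parse JSON response."""
    try:
        return api_response["parse"]["text"]["*"]
    except (KeyError, TypeError):
        return ""  # page missing or response malformed

def wikipedia_html(title):
    """Fetch rendered article HTML via the MediaWiki API (network call)."""
    qs = urllib.parse.urlencode({"action": "parse", "page": title,
                                 "format": "json", "prop": "text"})
    with urllib.request.urlopen(API + "?" + qs) as resp:
        return extract_html(json.load(resp))

def append_upstream(local_html, title):
    """Append the upstream article below a rule, as the parent describes."""
    upstream = wikipedia_html(title)
    return local_html + "<hr>" + upstream if upstream else local_html
```

Cache the fetched HTML per title so every page view doesn't hit Wikipedia.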