
Integrating Wikipedia With a Local Intranet Wiki (121 comments)

Posted by samzenpus on Thursday July 16 2009, @01:40AM
from the mix-and-match dept.
internet
An anonymous reader writes "I work for a large company taking a preliminary look at developing an honest-to-goodness wiki. We have tried to launch a company-wide wiki before, but with little success. The technical domains of each part of the company are different, thus each article needs a good deal of background to be useful. Of course, due to the proprietary nature of our work we cannot share our articles outside of the intranet. What we would like to do is leverage existing wikis by augmenting our internal wiki with an external wiki. When a user accesses Wikipedia from inside our intranet, they receive the Wikipedia content, plus the local domain-specific information. For example, links to company-specific wiki pages would be available in Wikipedia pages. Has anyone else tried to do something like this? I know it sounds like a logistical nightmare; are there any thoughts on how to make this successful?"

This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • URLs (Score:2, Funny)

    URLs. Look into it.

    • Re: (Score:3, Insightful)

      by smallfries (601545)

      Noise.

      It's a good place to bury the signal.

    • Re: (Score:3, Interesting)

      by S77IM (1371931)

      Said in a crude way; but to the OP: This guy is right. The most brain-dead simple way to make this work is to just set up your own wiki, and pepper it liberally with links to relevant Wikipedia pages. As someone below points out, there's even a feature in MediaWiki to make this linking easier (look up "InterWiki" in the MediaWiki help).

      You may even be able to set up #REDIRECTS using InterWiki links so that people can still see the page names you want in your search and category listing, and then be taken

      • If you want a simpler solution and have a few tens of GBs of space to spare, then you can just download a snapshot of Wikipedia and use that as the base for your wiki. You won't get any future articles, but you'll get the current ones.

        On the other hand, I don't really see the point. Is it really hard to read both the wikipedia page and the local page?

        • Every organization needs their own, up to date version of . [wikipedia.org]

          But seriously, process the SQL dump when you retrieve a monthly (quarterly?) update. Generate a set of strings that are relevant to your organization, and strip articles that don't match.

          Someone can always visit the upstream site, or you can use the interwiki facilities, as mentioned elsewhere.
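A rough sketch of the "strip articles that don't match" step suggested above. It assumes the dump has already been parsed into plain objects with `title` and `text` fields; the keyword list and the sample records are made up for illustration, and real dump parsing is not shown.

```javascript
// Hypothetical keyword list; in practice it would be generated from your own wiki.
const relevantTerms = ["capacitor", "dielectric"];

// True when the title or text mentions any relevant term (case-insensitive).
function isRelevant(article, terms) {
  const haystack = (article.title + " " + article.text).toLowerCase();
  return terms.some((term) => haystack.includes(term.toLowerCase()));
}

// Keeps only the articles worth importing into the local mirror.
function filterArticles(articles, terms) {
  return articles.filter((article) => isRelevant(article, terms));
}

// Made-up records standing in for rows parsed from a dump.
const sampleArticles = [
  { title: "Capacitor", text: "A capacitor stores energy." },
  { title: "Giraffe", text: "A tall mammal." },
];
const keptArticles = filterArticles(sampleArticles, relevantTerms);
console.log(keptArticles.map((a) => a.title));
```

Plain substring matching is crude (it will also match unrelated words that contain a term), so the keyword list probably needs some curation before each update.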

  • bad idea (Score:5, Interesting)

    by uepuejq (1095319) on Thursday July 16 2009, @01:47AM (#28713343) Homepage
    Create a Firefox add-on that downloads a master list of Wikipedia URLs, each of which should get a link to the intranet site. You can use regular expressions to parse the Wikipedia source so that your link is consistently placed. The master list can be updated at will, and could probably be filled the first time with a simple database request. Or something.
    • Re:bad idea (Score:5, Informative)

      by jayminer (692836) on Thursday July 16 2009, @01:51AM (#28713361) Homepage
      Good idea. You can even use an existing add-on, Greasemonkey, to do this.

      https://addons.mozilla.org/en-US/firefox/addon/748 [mozilla.org]
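For what it's worth, a minimal userscript sketch of the master-list idea. The intranet host `wiki.example.internal`, the hard-coded list, and the link text are placeholders; a real script would download the list from an internal server. Placing the link next to the element with id `firstHeading` is an assumption about Wikipedia's page markup.

```javascript
// ==UserScript==
// @name     Intranet wiki links on Wikipedia (sketch)
// @include  http*://en.wikipedia.org/wiki/*
// ==/UserScript==

// Hypothetical master list: Wikipedia article title -> intranet wiki URL.
const masterList = {
  Capacitor: "http://wiki.example.internal/Capacitor",
};

// Extracts the article title from a path such as /wiki/Capacitor.
function articleTitle(pathname) {
  const match = /^\/wiki\/(.+)$/.exec(pathname);
  return match ? decodeURIComponent(match[1]) : null;
}

// Looks up the intranet page for a Wikipedia path, or returns null.
function intranetLinkFor(pathname, list) {
  const title = articleTitle(pathname);
  return title !== null && Object.prototype.hasOwnProperty.call(list, title)
    ? list[title]
    : null;
}

// Browser-only part: add the link right after the article heading.
if (typeof document !== "undefined") {
  const url = intranetLinkFor(window.location.pathname, masterList);
  const heading = document.getElementById("firstHeading");
  if (url && heading) {
    const link = document.createElement("a");
    link.href = url;
    link.textContent = "Company wiki page";
    heading.parentNode.insertBefore(link, heading.nextSibling);
  }
}
```

Keeping the lookup in a plain function means the mapping can be tested outside the browser; only the last block touches the page.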
    • I wholeheartedly agree with the parent. Your best bet at doing this well is doing this as dynamically as possible. Scraping web pages is a huge pain. Building an extension to detect when you're visiting Wikipedia and inject something into the page is a hell of a lot simpler.

      Another poster suggested Greasemonkey. I haven't used it myself, but I suspect it would make sense to develop a prototype with Greasemonkey first. It might well be that a custom extension is not needed at all.

      Also, Firebug is your friend.

      • A well-written JavaScript bookmarklet will do the job too. You likely don't even need Greasemonkey, and it can be made cross-browser.

      • Scraping web pages is not so bad... I have been doing it for years. But in this case it is entirely unnecessary.

        I know of at least two ways this could be done, neither of which is nearly as much work as this would seem at first. First, did you know that the entirety of the content of Wikipedia is downloadable, in different formats? You can get everything, or just the current articles without the history (much smaller), and there are other options as well. While there is a lot of data, it is really not th
        • If we're talking about redirects, it would be quite easy to generate a 404 page that would redirect you to the Wikipedia page, either through a link or as a straight redirect. Or, if you can, use .htaccess and set up redirect rules there (it's the way wikipedia works anyway AFAIK, it just means adding more rules to your existing one)
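A small sketch of the "missing local page falls through to Wikipedia" handler described above. It assumes local pages live under `/wiki/<Title>` with the same titles as Wikipedia; the response object shape is invented for illustration and would need adapting to whatever server actually handles the 404.

```javascript
// Assumed layout: local pages live under /wiki/<Title>, mirroring Wikipedia titles.
const WIKIPEDIA_BASE = "https://en.wikipedia.org/wiki/";

// Maps a missing local path to the matching Wikipedia URL, or null.
function wikipediaFallback(localPath) {
  const match = /^\/wiki\/([^?#]+)$/.exec(localPath);
  return match ? WIKIPEDIA_BASE + match[1] : null;
}

// Builds the response for a missing page: a straight redirect when a
// Wikipedia fallback exists, a plain 404 otherwise.
function notFoundResponse(localPath) {
  const target = wikipediaFallback(localPath);
  if (target === null) {
    return { status: 404, headers: {}, body: "Not found" };
  }
  return { status: 302, headers: { Location: target }, body: "" };
}
```

The same mapping could be written as a rewrite rule in `.htaccess`; the function form just makes the rule explicit.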

  • Download it (Score:2, Informative)

    by Anonymous Coward

    http://en.wikipedia.org/wiki/Wikipedia_database

    Download their database, put it into your system, and you're set.

  • Solution (Score:5, Informative)

    by Z34107 (925136) on Thursday July 16 2009, @01:57AM (#28713399)

    Perhaps the easiest thing to do would be start with a complete dump of Wikipedia and add your own stuff to it. Their database dump page is here [wikipedia.org].

    It is 2.8TB, however. They allude to a "Wikipedia API" for working on a "random subset" of Wikipedia; maybe that would be helpful too.

    • Re:Solution (Score:5, Interesting)

      by mcrbids (148650) on Thursday July 16 2009, @02:25AM (#28713563) Journal

      Dumps go stale, Wikipedia is updated all the time. I'd suggest something a bit more dynamic.

      I did something similar (conceptually) as a dynamic help system for our web-based application, and had content in a wiki based on the URL of the page where the help message was to apply. In my case, clicking the "help" button on a page would make a proxy call to a private wiki to get the help menu content. If none was found, an email was sent to support desk and the end-user was given a web-chat prompt to tech support (with the URL prepended so that tech support could jump in, answer the questions, and write the help menu in one fell swoop)

      In your case, start with your local wiki. Presumably you have some stuff in there already. Rename the articles as necessary to match URLs from Wikipedia.

      Then, build a simple proxy server that rewrites wikipedia content to include a header of your local content. Probably 100 lines (or so) of glue code, and anywhere from a few man-hours to a few man-days coding.

      The rest is all training.
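The rewriting step of such a proxy might look roughly like this. It is only the string-manipulation part, not the proxy itself; the `local-wiki-header` id is a made-up name, and fetching both pages is left out.

```javascript
// Inserts local content right after the opening <body> tag of a fetched page.
// If no <body> tag is found, the local content is simply prepended.
function injectLocalHeader(wikipediaHtml, localHtml) {
  const banner = '<div id="local-wiki-header">' + localHtml + "</div>";
  const bodyTag = /<body[^>]*>/i;
  if (bodyTag.test(wikipediaHtml)) {
    // A replacer function avoids "$" in the banner being treated specially.
    return wikipediaHtml.replace(bodyTag, (tag) => tag + banner);
  }
  return banner + wikipediaHtml;
}
```

Wrapping the local content in its own element also helps with the "what is and is not company information" worry, since it can be styled differently from the Wikipedia text.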

      • Re: (Score:2, Informative)

        by negge (1392513)

        Why use a dump from early last year when you can have yesterday's (http://download.wikimedia.org/enwiki/latest/)?

      • That's the compressed version. The meta-history file (compressed:17GB) decompresses to 2.8TB on its own. Assuming the same compression ratio (likely not a valid assumption) the articles file would decompress to 500GB, give or take.

  • by seifried (12921) on Thursday July 16 2009, @02:05AM (#28713437)
    I assume you want up to date content and to have it clearly separated from what is yours. Why not enclose the content within an IFRAME? Seriously, it's stupid and simple but might be all you need. Alternatively you could use some form of an intelligent proxy/page modifier, either as a MediaWiki plugin or whatever floats your boat (i.e. every time a page is loaded also try to get the Wikipedia stuff).
    • If you want to get fancy, use AJAX to grab the Wikipedia content, stuff it into a hidden div, then DOM select the contents of the article and set a visible div's html to the wiki content:

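      [code]
      // Fetch the article HTML, then hand it to setContent().
      jQuery.get("http://wikipedia.com/somearticle/", function (wikiHtml) {
        setContent(wikiHtml);
      });

      function setContent(wikiHtml) {
        // Park the whole page in a hidden div so it can be queried with selectors.
        jQuery("#hiddenDiv").html(wikiHtml);
        // Pull out just the article body and show it in the visible div.
        var wikiContent = jQuery("#hiddenDiv #content").html();
        jQuery("#visibleDiv").html(wikiContent);
      }
      [/code]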

      • Except you can't currently make off-domain AJAX calls. It's blocked for security reasons. There's a proposed standard for whitelisting domains, but it doesn't appear to be implemented in any browsers yet.

        • And writing a 3-line AJAX proxy script is too difficult ?

          CURL page
          strip garbage
          output to client

          How hard was that ?
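Spelled out a little, the three steps could look like the sketch below. It assumes a runtime with a global `fetch` (for example a recent Node.js); "strip garbage" is interpreted here as dropping `<script>` and `<style>` blocks, which is only one possible reading, and the function names are made up.

```javascript
// "Strip garbage": drop <script> and <style> blocks from the fetched HTML.
function stripGarbage(html) {
  return html
    .replace(/<script\b[\s\S]*?<\/script>/gi, "")
    .replace(/<style\b[\s\S]*?<\/style>/gi, "");
}

// Fetch the page, strip it, and return the HTML to send to the client.
// fetchImpl can be swapped out, e.g. for testing without network access.
async function proxyPage(title, fetchImpl = fetch) {
  const response = await fetchImpl("https://en.wikipedia.org/wiki/" + title);
  return stripGarbage(await response.text());
}
```

Served from the intranet's own domain, the result is same-origin for the browser, so the off-domain restriction mentioned above no longer applies. Regex stripping is fragile on messy HTML; it is a stand-in for a real parser.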

  • by rm999 (775449) on Thursday July 16 2009, @02:11AM (#28713475)

    "What we would like to do is leverage existing wikis by augmenting our internal wiki with an external wiki"

    What does that even mean? If you want to design something, you'll have to use more precise language. And for god's sake, stop using the word leverage without thinking about it. You used it backwards - if you are augmenting your internal wiki with external wikis, you are leveraging your internal wiki with the external wikis. You leverage a boulder with a lever, but you don't leverage a lever with a boulder.

    • by MrMr (219533) on Thursday July 16 2009, @05:13AM (#28714379)
      As a non-native speaker I find a dictionary quite convenient in these cases, so I'll do some back and forth translation for you here:

      leverage (v.) -> opkrikken -> fuck up
      augment -> duurder maken -> make more expensive
      internal wiki -> krabbel zonder net -> off-line blurb
      external wiki -> krabbel met net -> on-line blurb
      existing -> nog bestaand -> not yet deleted

      So the English to English translation is: "What we would like to do is fuck up not yet deleted blurbs by making our off-line blurbs more expensive with on-line blurbs".
      Now that I can understand.
    • "What we would like to do is leverage existing wikis by augmenting our internal wiki with an external wiki"

      What does that even mean? If you want to design something, you'll have to use more precise language.

      His example is much clearer:
      For example, links to company-specific wiki pages would be available in Wikipedia pages.

      One solution could be a Firefox greasemonkey script, as someone above already suggested.

    • I have no idea what he means, but when he ran it up the flagpole I saluted.
    • by Lumpy (12016)

      Give him some slack, he's been in meetings all week with PHBs and executives who throw the terms around like candy and don't even know what they mean.

      "It will bring us a whole new dynamic by leveraging our skill-set when applied to the future latitude and positions."

      Everyone knows the suits in the corner offices talk only to hear themselves talk. It's either that or Business Administration degrees have a "ramble on like you are educated" class requirement.

    • You leverage a boulder with a lever, but you don't leverage a lever with a boulder.

      Actually I lever [cambridge.org] the boulder with a lever. Where I come from, "leverage" is a noun, to which the corresponding verb is "lever" :-)

  • It seems to me I've seen a browser extension somewhere that lets users add their own comments to any arbitrary web page, and those comments can be made public so anyone else running the same browser extension will see them when they load the same page. I bet you could use something like that, with all your users having a browser plugin that pulls URL-based content from an internal server.

  • Sounds like a weird setup, so you'll probably need to do most of it yourself. Perhaps the easiest way is
    1) set up a normal local wiki, with care to name pages the same as the relevant Wikipedia page [I'm guessing you know how to do this]
    2) use DNS redirects or similar tricks to get all wikipedia requests to go to a proxy
    3a) do html injection on the page and stick your stuff at the bottom [MITM attack using ettercap or something like that]. This is probably a pretty bad solution, but is going to be the easies

      • That wouldn't work so well, if it were time to update from Wikipedia. I would assume they'd update frequently from Wikipedia (say once every month or so), but is it really necessary to suck down their whole database, when in reality if it's a small network (say less than 10,000 users), there will only be a handful of pages read.

        Ah, what happened to the good ol' days, when the whole Internet fit on that one AOL disk. :)

  • Doinitwrong (Score:5, Insightful)

    by Anonymous Coward on Thursday July 16 2009, @02:35AM (#28713613)

    Agreed. Appending to wikipedia is the ass backwards way to do it. Everyone suggesting greasemonkey and other addons are just enabling your backassery.

    What you do is create an internal wiki, and wherever relevant you link to the Wikipedia article. Or an external doc. Or nothing at all and expect your employees to look it up on their own.

    • That's what I'd have suggested as well. Least amount of work, efficient, usable, no questionable hacks. It's common sense.

      • Of course it's common sense - which is why it won't be done that way. It's the "OMG you expect users to figure this out?" shit.

        Sounds like someone never heard of 'target="new"' to force the external link to open in a new tab so that the user doesn't go "where did my f*ing internal wiki page go to?"

        ... which explains why they never succeeded before - dumb users, and dumb "implementors", and not even the basic understanding of how things work.

        • by Lumpy (12016)

          which explains why they never succeeded before - dumb users, and dumb "implementors", and not even the basic understanding of how things work.

          Welcome to corporate America. Like what we did to the economy?

          • Welcome to corporate America. Like what we did to the economy?

            Should have gone with Art Deco. The whole "Early Mongolian Clusterfuck" theme clashes.

  • interwiki (Score:5, Interesting)

    by MadFarmAnimalz (460972) * on Thursday July 16 2009, @02:35AM (#28713619) Homepage
    You probably want interwiki [wikipedia.org].
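To illustrate the idea only: an interwiki setup is essentially a table of prefixes mapped to URL patterns, with `$1` standing for the page name. The sketch below is not MediaWiki code; the `intranet` prefix, the internal host, and the function name are all made up.

```javascript
// Hypothetical prefix table, modelled on the prefix -> URL pattern idea.
const interwikiTable = {
  wikipedia: "https://en.wikipedia.org/wiki/$1",
  intranet: "http://wiki.example.internal/wiki/$1",
};

// Resolves "prefix:Page title" to a URL, or null for unknown prefixes.
function resolveInterwiki(link, table) {
  const colon = link.indexOf(":");
  if (colon < 0) return null;
  const prefix = link.slice(0, colon).toLowerCase();
  const page = link.slice(colon + 1).trim().replace(/ /g, "_");
  if (!Object.prototype.hasOwnProperty.call(table, prefix)) return null;
  // Substitute the page name for the $1 placeholder.
  return table[prefix].replace("$1", () => encodeURIComponent(page));
}
```

With something like this in place, an internal article can say `wikipedia:Electric charge` and the wiki turns it into a proper link, which is the low-effort version of what the submitter is asking for.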
  • Don't (Score:5, Interesting)

    by pfafrich (647460) <rich@singsurREDHATf.org minus distro> on Thursday July 16 2009, @02:43AM (#28713675) Homepage
    Merging Wikipedia with your company wiki is a bad idea:
    • The wikipedia content will always be out of date
    • Changes made to wikipedia content don't get fed back into wikipedia
    • Creates confusion as to what is and is not company information
    • Trying to load the Wikipedia DB locally is a headache due to its sheer size
    • Re: (Score:3, Interesting)

      by korpique (807933)

      I agree (would mod up but gave up modding way back). However this is an interesting and probably reoccurring problem: extending the wealth of public net wisdom with precision data from local context (organisational or task-centric rather than geolocational).

      A proxy adding local content into pages loaded from outside as suggested in Re:Solution by mcrbids [slashdot.org] would solve some of the problems you mention:

      * The wikipedia content will always be out of date

  • Am I the only one who cannot see any legitimate use for this hack?

    Why lure your users into thinking the content is on Wikipedia if it is on your network?
    Can't your users use Wikipedia _and_ your wiki?

    Sincerely I think that the goal for this hack is luring users to think they're reading/editing wikipedia for someone's profit.

    • Why lure your users into thinking the content is on wikipedia if it is on your network?
      Can't your users use wikipedia _and_ your wiki?

      Obvious answer: If they're as retarded as the person posting the question ...

      Seriously, if the user can't figure out how to open 2 sites in 2 tabs, a "merged wiki" should be low on your list of priorities.

  • Open page in intranet for...say, capacitor.

    Script grabs Wikipedia article, strips out header, sidebar, etc. and fills in remaining links/images with proper URLs to Wikipedia (so they work)

    Stores in a database for diff'ing and updating later, dumps remaining content from Wikipedia at the bottom with a good ol' <hr> and you're off!
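The link-fixing and "dump it below an `<hr>`" steps might be sketched like this. It assumes the header and sidebar have already been stripped and that attributes use double quotes; the function names are made up, and the database storage and diffing are not shown.

```javascript
// Rewrites href/src attribute values so relative links point back at Wikipedia.
// base should be the URL of the article the HTML was fetched from.
function absolutizeLinks(html, base) {
  return html.replace(/\b(href|src)="([^"]*)"/g, (whole, attr, value) => {
    try {
      return attr + '="' + new URL(value, base).href + '"';
    } catch (err) {
      return whole; // leave anything unparseable untouched
    }
  });
}

// Local content first, then the Wikipedia extract below a horizontal rule.
function composePage(localHtml, wikipediaHtml, articleUrl) {
  return localHtml + "\n<hr>\n" + absolutizeLinks(wikipediaHtml, articleUrl);
}
```

Resolving against the article URL (rather than just the site root) keeps fragment-only links such as footnote anchors pointing at the right page.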

  • What? (Score:3, Interesting)

    by madcow_ucsb (222054) <slashdot2.sanks@net> on Thursday July 16 2009, @03:04AM (#28713787)

    Why? Can't you just link to wikipedia pages where appropriate? OK, my company has an internal server we link through to sanitize referrer info so our internal wiki titles don't get all over teh interwebs. But if the wiki users can't figure out "hey, this article is too specific - maybe wikipedia has more general information that would help me," you've got bigger problems than your wiki management.

  • by williamhb (758070) on Thursday July 16 2009, @03:16AM (#28713837) Homepage Journal

    A very small part of My PhD [cam.ac.uk] looked at this (but with "collaborative textbooks" rather than wikis) -- see Chapter 4. Adding a very simple metadata-based navigation layer over the top of the wiki is pretty easy, clean (doesn't confuse users), and seems to do the trick. The wiki itself shows in an embedded frame. Of course, I had to go further and let students do difficult number theory proofs backed by machine reasoning systems within the book, but you won't have to solve that problem!

    I'm (gradually) putting this fairly simple but useful part of the software into an online resource at www.theintelligentbook.com [theintelligentbook.com], though it's in my spare time and the system is down at the moment. I'll put my contact details back up there shortly in case the question-asker wants to discuss it technically.

  • One Tab for your Internal Wiki. Another one for wikipedia.
    You can also highlight a particular word in your internal wiki, right-click and search Wikipedia (if your search is set up that way). The search term automatically opens the Wikipedia content in a new tab. How amazing. Isn't it?

    Is it only me wondering how this article ever made it to /.?

  • Ignore the naysayers. Of course there is a lot of value in aggregating content and creating a compound page that blends your internal content with other sources.

    From a usability and authority-of-source perspective, however, I think it would be best to list each source in a separate section on the page, starting with your internal content at the top. You can get to the other content either by embedding links into your internal content, or by collecting the links in a separate section.

    Wikipedia itself uses the embedded technique. When composing or editing an article, the author can embed markup for external references. On display, this markup is turned into a footnote link at the point of embedding, and a footnote at the bottom of the page. I don't see why you couldn't do something similar. In this case, however, you would be embedding references to Wikipedia articles.

    In your internal wiki templates, have a custom markup for embedding Wikipedia queries related to the article. On display, either turn these markup queries into embedded links to footnotes, resolve the queries and deposit the results at the bottom of the page, or toss them into iframes and let the user sort it out.

    The other technique is to have a custom form in your internal wiki template where you collect the cross-references. On display, turn these queries into links or resolve them into content.
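A minimal sketch of the embedded-markup variant, assuming an invented `[[wp:Article title]]` marker (not real MediaWiki syntax) that is rendered as a numbered footnote pointing at the Wikipedia article.

```javascript
// Hypothetical markup: [[wp:Article title]] marks a Wikipedia cross-reference.
const WP_REF = /\[\[wp:([^\]]+)\]\]/g;

// Replaces each marker with a numbered footnote mark and collects the footnotes.
function renderWikipediaRefs(text) {
  const notes = [];
  const body = text.replace(WP_REF, (whole, title) => {
    const trimmed = title.trim();
    notes.push(trimmed);
    return trimmed + "[" + notes.length + "]";
  });
  const footnotes = notes.map((title, i) =>
    "[" + (i + 1) + "] https://en.wikipedia.org/wiki/" +
    encodeURIComponent(title.replace(/ /g, "_"))
  );
  return { body, footnotes };
}
```

The form-based variant would produce the same `footnotes` list from a separate field instead of scanning the article text.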

    In any event, why limit yourself to Wikipedia? Include cross-references to patent search engines and other domain-specific sources.

    A big word of caution, of course, is owed to the legal angle. Make sure you follow the law whenever reusing anyone else's content, even if it's just a link. Have your legal department sign off on your reuse policy. Don't distract them with technical aspects of what you want to do. They're lawyers; they only care about the law. Ask them a specific legal question, such as, "what is our legal exposure if we republish (links to or actual content from) Wikipedia on our internal wiki?".

  • You are trying to force a technical solution on a social problem. It's probably not going to work. Your best bet for success is to try and install a WYSIWYG editor for MediaWiki. There are several out there. Wiki markup, underneath, is just a programming language. It requires training people - no matter how much it is designed to be "easy." Make it easier.

    Consider Sharepoint. As much as /. is Anti-Microsoft, if your users are used to Exchange and Windows then Sharepoint is worth paying for.

    I've worked for Larry Sa

  • I wrote a very simple extension for my own MediaWiki site that pulled in external pages as an iframe within a wiki page. I'd imagine you can do the same: build your own wiki, with the Wikipedia pages included below your own content.
  • The experimental Tearline Wiki [galois.com] system we've developed at Galois [galois.com] might suit your needs. Inside the firewall, you use MediaWiki with the Tearline system, and get a combined view of your internal wiki(s), possibly different wikis on different sub-nets, and you can integrate it with Wikipedia or other internet-based wikis to get the global context of the article.

    As others have said, integrating your content with other people's content can be a legal issue.

    Contact me if you want more information on Tearline :)

    pe

  • Use interwiki links. I use them to link our intranet, mediawiki, our external developer wiki, and our external support wiki.

    You will probably be unable to use them since using them requires the ability to get off your lazy ass and read the MediaWiki documentation or google for it, which results in plenty of information.

    Also the fact that you're going to have to be able to insert a row in a database is probably going to be over your head.

    READ THE DOCUMENTATION YOU LAZY FUCK.
