Microsoft

Is the New Microsoft Office Really Open?

joesklein asks: "From CNET, there is an article about the new Microsoft Office 11. In summary 'Microsoft says it's opening its Office desktop software by adding support for XML--a move that should help companies free up access to shared information. But there's a catch: It has yet to disclose the underlying XML dialect.' Could this be grounds for another anti-trust suit against Microsoft?"

  • by Anonymous Coward on Thursday December 19, 2002 @05:07PM (#4925532)
    But there's a catch: It has yet to disclose the underlying XML dialect.' Could this be grounds for another anti-trust suit against Microsoft?"

    No.
  • InfoWorld articles (Score:5, Informative)

    by andynms ( 564072 ) on Thursday December 19, 2002 @05:12PM (#4925602)
    There are a couple of good articles on this at InfoWorld. Try here [infoworld.com] and here [infoworld.com].
    Good quote:
    THE GOOD NEWS is that Office 11 supports XML Schema. The bad news is that XML Schema has been described even by XML experts as "confusing," "impenetrable," "fuzzy," and "as user-friendly as a stick in the eye."
  • by timothy_m_smith ( 222047 ) on Thursday December 19, 2002 @05:14PM (#4925613)
    They haven't bastardized XML in the .NET Framework so I wouldn't expect Office to use bastardized XML.
  • Open? (Score:4, Informative)

    by Grip3n ( 470031 ) on Thursday December 19, 2002 @05:21PM (#4925696) Homepage
    I'd say the title of this article (Is the New Microsoft Office Really Open?) is extremely misleading. Microsoft isn't even trying to be open; they're just adding support for another open-source language. A truly open program would have its source code available. What this article is about has nothing to do with that. Microsoft Office is closed. Period.
  • by JebusIsLord ( 566856 ) on Thursday December 19, 2002 @05:23PM (#4925709)
    No, because the DTD and/or namespace will have to be referenced in plain text in the XML document. So even if they use absurdly complex element names, they have to use a valid DTD or namespace URI which can be easily referenced, or it just ain't XML at all. Also, you aren't allowed to put binary data in an XML document, but even if they did reference their DTD by memory address, for instance, it's an easy task to just read that address. In conclusion, they would have to break XML pretty hard-core in order to make their doc types proprietary. Besides, then what would be the point of going XML in the first place?
  • by watchful.babbler ( 621535 ) on Thursday December 19, 2002 @05:39PM (#4925835) Homepage Journal
    There was a fairly recent thread on this issue over at the XML-Dev list (see here [xml.org]). The upshot, according to W3C XMLWG member (and occasional Microsoft foe [textuality.com]) Tim Bray, is that Word is capable of saving documents in a WordML format that is parsable even without a DTD:
    I didn't see anything that I couldn't pick apart straightforwardly with Perl, and if someone asked me to write a script to pull all the paragraphs out of a Word doc that contain the word "foo" in bold, well you could do that. Which seems pretty important to me.
    So, from a technical perspective, there isn't much to worry about right now. From a legal perspective, no, there's no grounds for another antitrust suit, any more than there's grounds for suing Quark for not disclosing their file format.
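
The kind of script Bray describes could look roughly like the sketch below, written in Python rather than Perl. The namespace URI and the element names (`p`, `r`, `rPr`, `b`, `t`) are assumptions made up for illustration, since the actual Office 11 vocabulary had not been published; only the approach (walk paragraphs, check runs for a bold marker) is the point.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace and element names; NOT the real WordML schema.
NS = {"w": "urn:example:wordml"}

SAMPLE = """<w:doc xmlns:w="urn:example:wordml">
  <w:p><w:r><w:rPr><w:b/></w:rPr><w:t>foo</w:t></w:r><w:r><w:t> bar</w:t></w:r></w:p>
  <w:p><w:r><w:t>plain foo</w:t></w:r></w:p>
</w:doc>"""


def run_text(run):
    """Return the text of one run, or '' if it has no text element."""
    return run.findtext("w:t", default="", namespaces=NS)


def paragraphs_with_bold_word(xml_text, word):
    """Return the full text of every paragraph that has `word` in a bold run."""
    root = ET.fromstring(xml_text)
    hits = []
    for para in root.iter("{urn:example:wordml}p"):
        runs = para.findall("w:r", NS)
        # A run counts as bold if its (assumed) run-properties hold a <b/> flag.
        if any(run.find("w:rPr/w:b", NS) is not None and word in run_text(run)
               for run in runs):
            hits.append("".join(run_text(run) for run in runs))
    return hits


print(paragraphs_with_bold_word(SAMPLE, "foo"))
```

Only the first sample paragraph is reported, because the second contains "foo" in a run with no bold flag.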
  • by EnVisiCrypt ( 178985 ) <[groovetheorist] [at] [hotmail.com]> on Thursday December 19, 2002 @05:47PM (#4925906)
    The hell you can't put binary data in an XML document. As long as it's base64 encoded you can put anything in there.
  • by GOD_ALMIGHTY ( 17678 ) <curt DOT johnson AT gmail DOT com> on Thursday December 19, 2002 @05:50PM (#4925944) Homepage
    This is a monopoly. They have been found in violation of Anti-Trust laws and held up on appeal. The government has a legitimate reason to tell them how to conduct their business and every right to do so.

    Simply because the Anti-Trust trial focused on the OS rather than Office software does not mean that the government has no reason to impose restrictions to keep MS from shifting their monopoly power. MS's monopoly has been under government scrutiny for almost 10 years, but we still get a bunch of posts on here about how the government shouldn't be able to tell 'a company' what to do. Either the trolls are really busy or you guys decided to skip Economics 101 for Libertarian Fanaticism 101.

    In order to maintain a capitalist system, we must have competition. Without healthy competition, we don't have capitalism. The government has an obligation to step into an otherwise free market to ensure that competition stays healthy. There is no magical 'Free Market Fairy' that is going to come along and restore health to the industry.

    So yes, depending on the result of the States' AG cases and the DOJ's settlement, MS could very much be liable for making their document formats some sort of completely bastardized XML. If you want to know the probability, then you should go read the settlements, and the grievances in the new filings against MS.
  • by burgburgburg ( 574866 ) <splisken06NO@SPAMemail.com> on Thursday December 19, 2002 @05:59PM (#4926017)
    Word HTML output was always atrocious. It failed at everything from correct tag order (as is shown above) to properly quoting parameters (sometimes it uses ", sometimes it uses ', sometimes nothing). Multiple tags, all with different styles, come one after another (actual example below):
    <b style='mso-bidi-font-weight:normal'><i style='mso-bidi-font-style:normal'><span
    style='font-size:12.0pt;mso-bidi-font-size:10.0pt;font-family:Arial;mso-fareast-font-family:
    "Times New Roman";mso-bidi-font-family:"Times New Roman";color:black;
    mso-ansi-language:EN-US;mso-fareast-language:EN-US;mso-bidi-language:AR-SA'><br
    clear=all style='page-break-before:right;mso-break-type:section-break'>
    </span></i></b>

    Even with grep replace tools, cleaning up this crap takes hours.
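
As a rough illustration of how part of that cleanup can be scripted, here is a sketch that strips the `mso-*` declarations from quoted `style` attributes and drops attributes that end up empty. It is regex-based and deliberately naive (it splits declarations on `;` and ignores unquoted attributes), so treat it as a starting point rather than a robust cleaner; the function name is made up for this example.

```python
import re

# Matches a quoted style attribute; the body may contain the other quote type.
STYLE_ATTR = re.compile(r"""\s+style\s*=\s*(['"])(.*?)\1""", re.DOTALL | re.IGNORECASE)


def strip_mso_styles(html: str) -> str:
    """Remove Word-specific mso-* declarations from style attributes."""
    def clean(match):
        quote, body = match.group(1), match.group(2)
        kept = [
            decl.strip()
            for decl in body.split(";")
            if decl.strip() and not decl.strip().lower().startswith("mso-")
        ]
        if not kept:
            return ""  # nothing left, so drop the whole attribute
        return f" style={quote}{';'.join(kept)}{quote}"

    return STYLE_ATTR.sub(clean, html)


print(strip_mso_styles("<b style='mso-bidi-font-weight:normal'>x</b>"))
```

Mis-nested tags and inconsistent quoting still need a real HTML parser; this only removes the Word-specific style noise.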

  • by Anonymous Coward on Thursday December 19, 2002 @06:08PM (#4926116)
    The XML specification talks about "well-formed XML" and "valid XML", where the former means valid in all the usual senses of the word, and the latter means "can be validated by a program".
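
For reference, the XML 1.0 specification uses "well-formed" for a document that obeys the syntax rules (proper nesting, quoted attributes, a single root) and "valid" for a well-formed document that also conforms to its declared DTD. A small sketch of the first check follows; Python's standard-library parser is non-validating, so checking validity would need a separate validating tool, which is not shown here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, False on any syntax error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<a><b/></a>"))        # properly nested
print(is_well_formed("<b><i>x</b></i>"))    # overlapping tags, Word-HTML style
```

The second input mirrors the mis-ordered tags complained about elsewhere in this thread: acceptable to a forgiving browser, but not XML.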
  • by frisket ( 149522 ) <peter@sil[ ]il.ie ['mar' in gap]> on Thursday December 19, 2002 @06:12PM (#4926161) Homepage
    I was at the launch presentation of Office-11 [microsoft.com] by Jean Paoli at XML 2002 [idealliance.org] in Baltimore MD last week, and I'm also a late sign-up to MS's extended beta list for the product (now closed).

    To clear up some points people have commented on (based on a very preliminary inspection plus a lot of discussion at the conference):

    1. The default save format is still .doc (ie you have to go the extra click to save in XML format)
    2. If you pick to click it, the default XML format is MS's own office-document vocabulary, which retains all the formatting, held in attributes. Hairy but processable, and they will be shipping their schema for it so people can reprocess it externally. But this format will (of course) only represent the appearance, not any structure.
    3. It will also let you specify your own schema (or an industry standard one) and let you supply a binding of named styles to your element types, so you can edit using what look like styles but actually get represented in the saved file as XML markup. There is some debate as to whether this constitutes "being an XML editor" or just "being a wordprocessor that saves data in XML" (my money is on the latter).
    4. It will not support DTDs [www.ucc.ie], so you're stuck with W3C Schemas [www.ucc.ie] whether you like them or not* [slashdot.org]
    5. The discussion over a [more?] suitable schema/DTD for handling office documents (wordprocessing, spreadsheet, presentation) continues at the OASIS [oasis-open.org] TC on Open Office XML Formats [oasis-open.org] ** [slashdot.org]
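
The style-to-element binding in item 3 can be pictured with a small sketch. Everything in it is hypothetical (the style names, the element names, and the idea that a document is a list of styled paragraphs); it only shows how "named style in the editor" can become "element type in the saved file".

```python
import xml.etree.ElementTree as ET

# Hypothetical binding of named styles to element types in a user schema.
STYLE_TO_ELEMENT = {
    "Heading 1": "title",
    "Body Text": "para",
    "Quote": "blockquote",
}


def styled_paragraphs_to_xml(paragraphs, root_tag="article"):
    """Turn (style name, text) pairs into elements named by the binding."""
    root = ET.Element(root_tag)
    for style, text in paragraphs:
        tag = STYLE_TO_ELEMENT.get(style, "para")  # unbound styles fall back to para
        ET.SubElement(root, tag).text = text
    return ET.tostring(root, encoding="unicode")


print(styled_paragraphs_to_xml([("Heading 1", "Hi"), ("Body Text", "There")]))
```

Note that the output is flat: a mapping like this records what each paragraph is called, not how sections nest, which is one reason to see it as "a wordprocessor that saves data in XML" rather than a structured editor.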
    With Office-11, Microsoft has nearly caught up with Corel [corel.ca]'s WordPerfect [corel.com], (which has had a fully-fledged SGML and XML editor built-in for years) and XMetaL [corel.com] (which Corel took over from SoftQuad earlier this year). MS still has a long way to go to match industrial-strength applications like ArborText [arbortext.com]'s EPIC [arbortext.com] or even Emacs with psgml-mode et al [compuserve.com], but Office-11 will be a solution for the masses who believe the Word interface to be more desirable, or the Microsoft licensing régime to be more attractive, or the software to be more stable.

    * [Bias note] I think W3C schemas were a big mistake; provision for data content typing and validation, namespaces, and extended grouping could have been achieved by extending DTD syntax; and wimpy programmers who moan about having two syntaxes to handle should get a life - it's not a big deal, the code is free and has been in use for 15 years :-)

    ** Sun [sun.com] has donated the OpenOffice [openoffice.org] (aka StarOffice [sun.com]) XML file formats to the public domain. It's worth remembering that {Star|Open}Office has been saving in XML as its native format for some time now, and has a lot more experience at this than MS.

  • exactly (Score:3, Informative)

    by ink ( 4325 ) on Thursday December 19, 2002 @06:16PM (#4926183) Homepage
    I wish I had some mod points for you; that's exactly what Microsoft means when they say that their documents are saved using XML. They include Win32 class-ID objects all over the place.
  • by AnyoneEB ( 574727 ) on Thursday December 19, 2002 @06:51PM (#4926435) Homepage
    Someone will end up with a leaked alpha or beta copy of Office 11 and start working on the file format. Whether they will be able to figure it out fast enough is the question. It's possible, but if it's not done completely enough by Office 11's release, what you describe will happen. Someone else said that Microsoft won't change .doc anymore, partially because Google supports returning .doc files in search results... of course that just requires stripping all formatting, which would probably be pretty easy.
  • by sgarrity ( 262297 ) on Thursday December 19, 2002 @06:55PM (#4926459) Homepage
    I use this Word HTML cleaner [textism.com] web service. Works well. Drop a penny in the paypal bucket if you like it.
  • Re:Defaults (Score:5, Informative)

    by dillon_rinker ( 17944 ) on Thursday December 19, 2002 @07:04PM (#4926522) Homepage
    Yup. Government standards are why you can buy screws and nuts from different manufacturers and have them work together. They are why you can buy "orange juice" at the grocery store and know that it's not "juice" wrung out of a pile of autumn leaves (hey, it's juice, it's orange, what more do you want?). Government standards are why you can fly in an airplane and know it won't crash.

    Sure, all these needs could be fulfilled by voluntary industry standards, if it weren't for those pesky human beings, fallible and greedy creatures that they are.
  • Re:LOL (Score:4, Informative)

    by loconet ( 415875 ) on Thursday December 19, 2002 @07:27PM (#4926708) Homepage
    I know exactly what you mean. Word spits out complete garbage when it converts .doc => .html. Microsoft attempted to address this issue by releasing an HTML filter plugin [microsoft.com] that you can install to clean up the HTML Word spits out. It does clean up the HTML, but it's still kinda messy.
  • MIRROR: Original XML (Score:2, Informative)

    by gazbo ( 517111 ) on Thursday December 19, 2002 @07:31PM (#4926740)
    I've mirrored the actual xml file that has not been mutilated by slashcode policies.

    Look here [nildram.co.uk] using a browser that will display the raw xml nicely formatted - IE works fine, supposedly Mozilla does too but I can't seem to get it to work; it parses the file and just displays the text.

    Shame this is all so hidden away in the story.

  • by Ankh ( 19084 ) on Thursday December 19, 2002 @07:45PM (#4926837) Homepage
    Wow, what a lot of false information. Maybe this will help a little. Disclaimer: I am XML Activity Lead at W3C, so I have a bias.

    The new Visio is using SVG.

    The new Word lets you use any XML vocabulary you like. How obfuscated it is is *entirely* up to you.

    It's not using base64 to put binary proprietary data into XML documents. It's using plain XML.

    It's well-formed, and Word appears not to make up thousands of elements. The person in charge of this project is actually clueful, and was in the W3C XML Working Group (1996-1998 by the way).

    The tools all use XSLT extensively.

    It wouldn't surprise me if you could get Word to read and write the OpenOffice format just fine. There's a restriction that you can't re-order content in Word right now, I think.

    People claiming to have "insider info" and then posting blatant falsehoods, or claiming you can put binary data directly in XML, aren't helping here. Even if you get high from hating Microsoft, the open source community and Free software world need to understand that the goalposts have moved a little.

    The extent of corporate assets tied up in memos, reports and other documents is very large, massively higher than the collective value of relational databases.

    Yes, it looks as if Microsoft has suddenly discovered XML just as they suddenly discovered the Web. In fact, they were involved heavily in XML from the start, were among the first to ship commercial support for XML, and have been working on XML in Office 11 for a long time.

    --
    Liam Quin
  • Re:LOL (Score:3, Informative)

    by Mike Schiraldi ( 18296 ) on Thursday December 19, 2002 @08:57PM (#4927116) Homepage Journal
    Dude: mmencode -u
  • by kazad ( 619012 ) on Friday December 20, 2002 @12:29AM (#4927822) Homepage
    Dreamweaver has a "clean up word html" option. But then again, another proprietary solution.
  • by Puu ( 596370 ) on Friday December 20, 2002 @06:20AM (#4928602)
    The testing is sickening. But it's us or them, really.
