Sanely Moving from Word to the Web?
FooAtWFU asks: "I have a job for a web site (no link for you, Slashdot hordes!). A lot of it is systems administration and development, but I have to routinely post content which comes from a myriad of other sources. Usually they are from academic users, come in Word format, and ultimately need to be posted in HTML. The problem is that Word has all sorts of tricks up its sleeve to throw off the font, layout, size, and so forth. To achieve any sort of visual consistency on the site these various formatting tags all need to be scrubbed, but even using other office suites with better HTML export (OpenOffice.Org) to do the dirty work, it's often easier to recreate the formatting by hand from a plain-text version than it is to clean up a sea of messy tags. Does anyone have any advice (or magical tools) to help me deal with this sort of tedious cleanup?"
PDF? (Score:2, Insightful)
Just a thought
PDF? (Score:3, Insightful)
Re:Scrapping (Score:3, Insightful)
It's really better to use things that other people have made for parsing HTML. For example, if you use Perl (and you should -- it's the ideal tool for this), HTML::Parser works pretty well, though there's a significant learning curve in using it.
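For anyone not on Perl, the same "let a real parser do the work" idea can be sketched with Python's standard-library html.parser module. This is a stand-in for illustration, not HTML::Parser itself, and the names TagCounter and count_tags are made up here. It just tallies which tags show up in an exported document, which is a reasonable first look at how much junk there is to scrub:

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Hypothetical helper: count start tags with a parser instead of regexes."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling this hook,
        # so <FONT> and <font> land in the same bucket.
        self.counts[tag] += 1


def count_tags(html):
    """Return a Counter mapping tag name to number of occurrences."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts
```

Calling count_tags on a saved Word export and looking at the most common entries should show quickly whether font and span tags dominate the markup.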
Re:Resign from your executive position (Score:5, Insightful)
I use mutt and fetchmail in a company of Exchange users. Almost every email I get at work now, from everybody, is in HTML. (Unless I sent it to myself.) I don't like it, but I deal with it. It's certainly easier to deal with it than to try to change everybody else.
I could change jobs, but over something as trivial as HTML emails? No. I like my job, I like the people I work with, so I just bend like the reed in the wind ...
Still, the executives are certainly worse about email etiquette than most, and it's not just in this company -- everywhere I've worked I've found this to be the case. They don't include Subjects at all, or include useless ones like `message'. Some will type up a memo and send it as a .pdf file attachment, or worse as a .bmp file. They rarely trim anything when responding to a post -- they just top post away. (But many people do that ...)
Recreating formatting? (Score:2, Insightful)
The problem with conversion of documents to HTML in general is the expectation that the formatting needs to be preserved. There have been times where I needed to "post" a document to a web site, and I always try to get the author(s) to not worry about formatting. Formatted documents are pure evil simply because 9 times out of 10 the formatting does not affect the relevant information that you are trying to convey to your audience. Sometimes the authors give me grief about it, but I simply show them the possibilities of separating the content and presentation during the translation. I convert their documents to generic HTML (with whatever tools are available) and use CSS to apply relevant formatting for the type of document (a report, article, thesis, or whatever). No funky font tags or weird tables. Just let the HTML flow as it's meant to.
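To make "convert to generic HTML" a bit more concrete, here is a minimal sketch in Python using only the standard library. It is one possible approach, not a description of the tools I actually use: the whitelist in KEEP, the SKIP_CONTENT set, and the names Scrubber and scrub are all assumptions for illustration. It keeps a handful of structural tags, throws away every attribute, and drops everything else (font, span, Word's o:p tags, embedded style blocks) while keeping the text:

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: structural tags that survive, stripped of attributes.
KEEP = {"h1", "h2", "h3", "p", "ul", "ol", "li", "em", "strong",
        "blockquote", "table", "tr", "th", "td"}
# Elements whose text content is dropped along with the tag.
SKIP_CONTENT = {"style", "script"}


class Scrubber(HTMLParser):
    """Re-emit only whitelisted tags (no attributes) plus the text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside <style>/<script>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
        elif tag in KEEP and not self.skip_depth:
            self.out.append(f"<{tag}>")  # attrs are deliberately discarded

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in KEEP and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Entities were decoded by the parser, so re-escape the text.
            self.out.append(escape(data, quote=False))


def scrub(html):
    """Return html reduced to whitelisted, attribute-free tags and text."""
    scrubber = Scrubber()
    scrubber.feed(html)
    scrubber.close()
    return "".join(scrubber.out)
```

The output is deliberately bland. Whatever look a report or thesis is supposed to have then comes from a stylesheet for that document type rather than from anything left over in the markup.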
Re:Sounds like you should release on sourceforge (Score:3, Insightful)
Plus the templates are probably in-house templates and thus would be useless outside of the company.
Yes! (Score:3, Insightful)
For documents that are going to be viewed online, it's infinitely preferable to use a free-form format like HTML (as it was designed to be) that can adjust to varying monitor and window sizes.
Re:Dreamweaver (Score:3, Insightful)
Indeed we'd love to move to advanced CSS for page formatting but that's a big step right now - there are no professional WYSIWYG editors that have the sheer range and quality of features we need - page templates, ability for clients to update the site later in a very convenient WYSIWYG interface, high compatibility with ultra-common web media such as Flash, etc.
Trust me, we're keeping our eyes peeled for a better solution but right now Dreamweaver is the best available. Sometimes simply sticking to "standards" isn't necessarily the best idea. In fact, sticking to proper standards creates sites that differ in appearance from browser to browser. Dreamweaver has very impressive awareness of inconsistencies and standards-deviation in many browsers.
Re:PDF? (Score:3, Insightful)
Re:Resign from your executive position (Score:3, Insightful)
And as another poster has suggested, perhaps the quality department is to blame, since a memo or whatever isn't a real memo or whatever unless it has been created with the official approved template...
Re:PDF? (Score:2, Insightful)
Do you really think the extra bandwidth costs would be more than the cost of his salary while coming up with a better solution?
Re:Resign from your executive position (Score:3, Insightful)
Why do you use mutt and fetchmail? Why? Why? Why? Just about everywhere I have worked it has been easier (and often there is no choice) to just use what they use rather than trying to be clever or different. It is good to gain wide experience and it is good to have the flexibility to use the tools at hand.
Re:Actually, an NDA probably doesn't matter. (Score:3, Insightful)
Did he say the website was public?