The Internet

On Creating Multilingual Web Sites? 189

Jens asks: "I am designing an Intranet web application that needs to come out in multiple languages. I am using PHP to include common elements in include files, which makes things a lot easier. I want to avoid making each change three times (I have someone doing the translations, however). The question is: How do I tackle the multiple languages? Do I separate design from content, or content from design? Do I write "<table><tr><td>$text[$lang]</td></tr></table>", keep the international text in include files, and then call the pages with appropriate parameters; or should I write "<?php nice_table("Dies ist der deutsche Text"); ?>" and keep three different files, but one include file with all the design elements? How do I handle buttons (i.e. graphics) with text on them?"

Multi-language sites are tricky, but as long as there is some separation of page design and language elements, it shouldn't be too hard for the rest to fall into place. Whether you separate design from content or content from design depends on the plans for your implementation. What schemes work for those webmasters out there with already established multi-lingual sites?

This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward
    I thought that the newer versions of Apache could serve pages based on file extension. You could have an index.de, an index.en, an index.fi, and Apache would do the right thing.
  • by Anonymous Coward
    It's possible to look at the HTTP headers, to determine the user's preferred language(s). If you are asking them directly, you can always store their preference in a cookie so you don't have to ask next time.
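A minimal sketch of that idea, written in Python only because it is compact (the same logic ports directly to PHP or Perl). The function names and the `supported` set are made up for illustration; the only real-world piece is the `Accept-Language` header format with its optional `q` weights. A stored cookie value wins over the header.

```python
def parse_accept_language(header):
    """Return the language tags in an Accept-Language header, most preferred first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a tag without an explicit q value counts as q=1
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:  # q=0 means "not acceptable"
            weighted.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


def choose_language(accept_header, cookie_lang, supported, default="en"):
    """Prefer the cookie, then the browser's header, then the site default."""
    if cookie_lang in supported:
        return cookie_lang
    for tag in parse_accept_language(accept_header):
        if tag in supported:
            return tag
        primary = tag.split("-")[0]  # "de-ch" falls back to "de"
        if primary in supported:
            return primary
    return default
```

Calling `choose_language(header, cookie, {"en", "de", "fi"})` once per request and writing the result back into the cookie means the user is only asked (or guessed at) once.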


    Unless you have a particular reason not to use a database, use that to store your localised strings. Then, separate as much of the look-and-feel stuff as you can and put it into separate includes. This makes it much easier to maintain the site, because:

    • static strings can be edited using a straightforward web front end to the database. If you can't find something suitable, cook one up yourself. Then, if you need to add a language, simply add the strings to the database;
    • content can work on the same principle. If there's a delay between posting new content and the translation, you'll want to consider what your fallback position will be;
    • you can edit the look-and-feel templates using your favourite HTML editor; if you're working in a team, someone less technical will be able to work on them more easily.
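As a rough illustration of the database approach and of the "fallback position" mentioned in the list above, here is a small Python sketch using an in-memory SQLite table. The table layout, the key names and the sample strings are all assumptions made for the example, not anything the poster specified.

```python
import sqlite3


def open_string_table():
    """Create a throwaway table of localised strings (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE strings ("
        " msg_key TEXT, lang TEXT, body TEXT,"
        " PRIMARY KEY (msg_key, lang))"
    )
    db.executemany(
        "INSERT INTO strings VALUES (?, ?, ?)",
        [
            ("greeting", "en", "Welcome"),
            ("greeting", "de", "Willkommen"),
            ("news", "en", "Latest news"),  # not translated yet
        ],
    )
    return db


def lookup(db, key, lang, fallback="en"):
    """Return the string for `lang`, else the fallback language, else the key."""
    for candidate in (lang, fallback):
        row = db.execute(
            "SELECT body FROM strings WHERE msg_key = ? AND lang = ?",
            (key, candidate),
        ).fetchone()
        if row is not None:
            return row[0]
    return key  # makes missing strings visible instead of failing the page
```

Adding a language then means inserting rows, and untranslated content quietly shows in the fallback language until the translator catches up.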
  • by Anonymous Coward
    PHP is nice and dandy, but for some reason it is really hard to separate content and layout, or content and language support/translations for that matter.

    Not any more so than any other scripting language...

    You have http://xml.apache.org/ but I didn't really check that out, as I didn't find any literature about the xml.apache and XML combination on the PHP scene.

    PHP has XML parsing support using the expat library that comes with apache. Docs and examples can be found at http://www.php.net/manual/ref.xml.php

  • by Anonymous Coward
    "Want to support those foreigners? Just put a link to http://www.learnenglishfast.com/ on your site. Let those foreign weirdos figure your site for themselves when they're not busy invading each other." To answer your comment above. There are MORE non English speakers in the world than English speakers.

    - blackbird.

  • by Anonymous Coward
    I worked for four years on Internet content for the Canadian government. Everything had to be bilingual -- on pain of death. It took a long time, but we eventually came up with an extremely simple and effective method of tagging languages. It involves splitting up <<different languages&&andere sprache>> using tags and parsing these files server-side.

    I created a web page describing the system, and some of the problems to be avoided: http://files.moo.ca/multilingual [files.moo.ca]
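A guess at how such server-side parsing could look, based only on the inline example above (`<<first language&&second language>>`). The exact syntax on the linked page may differ, and the fixed language order in `LANG_ORDER` is an assumption made for this sketch.

```python
import re

# Assumption: every tagged span lists its variants in this fixed order.
LANG_ORDER = ("en", "de")

# Non-greedy match so that several spans on one line are handled separately.
SPAN = re.compile(r"<<(.*?)>>", re.DOTALL)


def render(source, lang):
    """Keep only the variant for `lang` in each <<a&&b>> span."""
    index = LANG_ORDER.index(lang)

    def pick(match):
        variants = match.group(1).split("&&")
        # Fall back to the first variant if a translation is missing.
        return variants[index] if index < len(variants) else variants[0]

    return SPAN.sub(pick, source)
```

With this, `render("<p><<Hello&&Hallo>></p>", "de")` keeps only the German half, so one source file serves both languages and the markup around the text is written once.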

  • by Anonymous Coward
    You may want to check out Apache's MultiViews Option [apache.org]. It's a very handy, little used feature that lets your users choose their preferred language(s) once for all websites, right in their browser. If you enable this option, you have documents with the ISO language code appended to the filename, like index.html.en and index.html.it or banner.gif.en and banner.gif.it (for English and Italian, respectively).
    The links in the web pages are then without the trailing ISO code:
    <IMG SRC=banner.gif>
    Very handy indeed. Makes it extra easy to add support for more languages, even after project launch.
  • by Anonymous Coward
    Not to nitpick, but why would you do it this way, when Apache (and HTTP) has a working, transparent, way to do what you're doing, without re-inventing the wheel...

    Instead of making your templates, you just do this:

    <A href=http://slashdot.org/><IMG src="http://images.slashdot.org/title.gif" width=275 height=72
    border=0 alt="##Language Title##"></A>

    (where ##Language Title## is set by the HTTP preferred-language header.. - this is the only part that Apache won't handle by itself..)

    You just create "title.gif.de", "title.gif.fr", etc, and let Apache handle the rest.

    You'd save yourself a whole lot of trouble setting up multiple DNS servers, and your templates would be a lot smaller..
  • by Anonymous Coward
    Look at http://www.linux.be. This site is multi-language and multi-layout, in PHP4.02b with MySQL (click on star-trek and under sea in the right menu).
  • by Anonymous Coward
    Moderate this guy up ...

    I run a few bilingual sites and know that this stuff has to be built into your software. From my experience I suggest the following ...

    - If the page is static or rarely changes (like a TOS or privacy statement) use duplicate HTML pages. It's impossible to predict how these will change, and each will have to be retranslated if they do change.

    - If the page has to be updated on a regular basis, use your favorite scripting language to insert the language accordingly. This does not mean you'll need every page on your site to be .php or .asp or .shtml - just build the .php page and cache it to an .html page. You can build a function called "update" which updates all of your static pages when changes are made (use lynx).

    - For dynamic pages / applications, it is best to create a string library / object which will store the language specific code. Then have your script check the language parameter and generate the page accordingly. The language parameter could be stored in the user database, but I would recommend just using a cookie. The opportunity to change languages is just displayed somewhere on the generated pages.

    - Try to avoid GIFs. Building them on the fly is a waste of the CPU, and maintaining a GIF library is tiresome.

    - For sites that are strictly bilingual (i.e. English and one other language) it is preferable at times to include both languages on the same page. An example would be a registration form - you don't want the user to look for the language they need.

    /*
    Regarding PHP vs. whatever - take a close look at PHP. You'll see it makes things _really_ convenient, and not only for new programmers. I know many Perl diehards who like to do things the hard way, and the great availability of Perl modules and scripts does give it an advantage, but PHP was built with web programming in mind, and it shows.
    */

  • by Anonymous Coward
    Many languages have richer grammar than English, and since the phrase/word is used as an index to the translation you often get into situations where it is impossible to do a correct translation... i.e. The word 'new' in English is used both in singular and plural, when gettext generates the .po file it will reference all places where the word 'new' is used and it doesn't allow you to split up the entry manually providing different translations for different contexts.

    X/Open catgets and the Java Locale system do not have this weakness, but require more maintenance.
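To make the complaint concrete, here is a tiny Python sketch of the difference between a catalog keyed only by the source string and one keyed by context plus source string. The dictionaries, the context labels and the French sample words are illustrative assumptions; this is not how gettext, catgets or Java store their data internally.

```python
# Keyed by the English string alone: only one translation can exist.
FLAT = {"new": "nouveau"}

# Keyed by (context, string): each usage gets its own translation.
BY_CONTEXT = {
    ("one file", "new"): "nouveau",
    ("several files", "new"): "nouveaux",
}


def translate_flat(msgid):
    """Every place that says 'new' gets the same word, right or wrong."""
    return FLAT.get(msgid, msgid)


def translate_in_context(context, msgid):
    """The caller names the usage, so the translator can tell the cases apart."""
    return BY_CONTEXT.get((context, msgid), msgid)
```

The price is the extra maintenance the poster mentions: someone has to invent and keep track of the context labels.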
  • by mosch ( 204 )
    because perl gives you more than one way to do it. and they're both wrong. (hey, you flamed first)

    PHP isn't the root of all evil, and I've actually seen quite a bit of really elegantly written large php sites (500K+ codebases) that were maintainable and easily understandable.
    ----------------------------
  • At the top of the page call the secret function:

    init_i18n();

    and then use the super-secret _( macro. How it works is this. Let's say you have a function foo that's defined to be called foo (string baz, int bar). Instead of calling foo('title', 3) you'd call foo(_('title'), 3), and echo "foo"; changes accordingly to echo _("foo");

    Now you just need to follow the instructions to set up your strings files, but you now know how to do the PHP-specific gettext wrapping.

    You'll have to have the standard files containing language-specific msgid->msgstr mappings, and it's helpful if you cook up a little script to grab every string and create a base msgid file if you're managing a large source tree. It's not the most intuitive or well documented procedure, unfortunately. At some point I'll have to write good documentation on how it all works, but hopefully this is at least somewhat useful.


    ----------------------------

  • I've been working on redoing my home page to use themes. Each section of text appears in a table, decorated with graphics to look like a window in a variety of OSes (Windows, Mac OS, Mac OS X, UNIX, and I'm working on more). On the right is a navigation bar, also designed to look like a window. The various widgets (close boxes, minimize/maximize buttons, etc.) are completely non-functional.

    For any page on the site, you can click a link and change what theme you're viewing the page in. My focus is on keeping the content the same and changing the surrounding aesthetics, while your focus is on keeping the surrounding aesthetics and changing the content, but the same concepts still apply.

    My site is based on a CGI script written in perl called page.pl, which takes two arguments (passed in the QUERY_STRING): the name of the page to be viewed, and the theme to view it in. Eventually the theme selection will be moved into a cookie; you might similarly want to move language selection into a cookie.

    Unfortunately, my site isn't ready for prime time yet, and is currently hosted only on my 56k dialup connection (definitely unable to stand up to the Slashdot Effect). However, the previous version of my site works in a similar fashion, offering different layouts depending on what browser you're using. Much simpler, but it should give you some idea as to what I'm talking about. The main home page is done with shtml, rather than CGI, and lets you use a QUERY_STRING to override the automatic browser detection. Compare:

    http://www.inficad.com/~phroggy/?4
    http://www.inficad.com/~phroggy/?3
    http://www.inficad.com/~phroggy/?2

    The three versions are designed for Netscape 4.x, Netscape 3.x and all other browsers, respectively (MSIE pretends to be Netscape). The difference between versions 3 and 4 is subtle; look for the drop-shadow - Netscape 3 doesn't support background table graphics, so I designed version 3 not to use them.

    If you're seriously interested in seeing my new site with the themes feel free to e-mail me according to the instructions in my sig, and I'll give you the URL.
  • Why not just write in English, and have a CGI that uses Babelfish to translate when a user logs in indicating some other language? :-)
  • by Matts ( 1628 ) on Monday April 17, 2000 @10:46PM (#1127125) Homepage
    Don't forget: XML was designed from the ground up for multilingual support, via the xml:lang attribute. Use that, not some invented tag!!!
  • Actually, you can cut out steps 2 and 3 in that. The browser passes the language setting to the server, which the server can handle using multiviews. That way, you can get the browser and server to take that part of the workload off your back.
  • Actually, I'd remove -BOTH- the form AND the content from the page. Fetch the outer form from one database (that saves you having multiple copies of templates, for each page), have content-specific HTML in a second database and fetch the content from a third.

    My thoughts would be to have anything not explicitly related to the actual content stored separately from content-arranging tags (such as tables, paragraphs, etc.). This lets you maximise reusability, and minimise effort.

    ie:

    Database 1 -> Outer Shell Template
    Database 2 -> Content-Specific Formatting Template
    Database 3 -> Actual Text, in 1+ records. This should contain NO tags, whatsoever. That's all done in #2.
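A minimal Python sketch of that three-layer split, with plain dictionaries standing in for the three databases. The template names, token names and sample text are invented for the example.

```python
from string import Template

# "Database 1": outer shell templates, shared by every page.
SHELLS = {"default": Template("<html><body>$body</body></html>")}

# "Database 2": content-specific formatting, still language-neutral.
FORMATS = {"article": Template("<h1>$title</h1><p>$text</p>")}

# "Database 3": the actual text, keyed by page and language, with no tags.
TEXTS = {
    ("welcome", "en"): {"title": "Welcome", "text": "Good to see you."},
    ("welcome", "de"): {"title": "Willkommen", "text": "Gut, Sie zu sehen."},
}


def build_page(page, lang, shell="default", fmt="article"):
    """Pour the text into its formatting, then the result into the shell."""
    body = FORMATS[fmt].substitute(TEXTS[(page, lang)])
    return SHELLS[shell].substitute(body=body)
```

Because the text records carry no markup, a translator can work on layer 3 alone, and a redesign touches only layers 1 and 2.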

  • Two of the simplest but most widespread problems with viewing foreign language web sites, particularly in non-Roman character sets, are when the language is not specified in a metatag or the font face *is* specified in HTML.

    The first problem has been described elsewhere on this page, but deserves reiteration: specifying the encoding language in HTML can allow the correct font, language script and character set to be used automatically by the browser without scripting on the server end. On my Mac, for example, this would allow the Haaretz newspaper site to come up automatically in Hebrew (having chosen the correct language script and font for me). I would not have to manually choose the settings. Good examples of where this is not done but should be are the Yahoo! Asian sites. The only way my browser knows I'm looking at Korean text on Yahoo! Korea is because I choose it; the HTML pages are not encoded to tell my browser this.

    Scripting to determine what preferred language is chosen in the browser is, in my opinion, the hallmark of a great multi-language site. The second hallmark is a link on every page to switch to another language at will.

    A similar item happens in bad email programs: they do not specify the character encoding in the header. One of the nicest things about a great email program is that when I get, say, a Japanese-encoded email, even if the words themselves are English (by using the Roman characters built into the Japanese encoding group), it kicks in the Japanese character entry and editing system automatically. It recognizes it, as it should.

    Intentionally *not* specifying font face is equally important. If I want to view a web site in Hebrew or Arabic, a ridiculous number of the sites require that I download the particular font they have specified in their HTML. This is preposterous. Language encoding, used properly, might mean never having to download another font again.

    As for graphics in lieu of foreign fonts: avoid it any way you can. It makes copying and pasting near-impossible. (I once had to do it this way: I saved the GIF image to my hard drive, converted it to TIFF and ran it through an optical character recognition program, then cleaned it up manually. A huge waste of my time).
  • Having worked with multi-lingual Web sites, I must clarify a few points.

    First, you need to remember that a Web site must be designed to work easily with the existing tools you use to develop/maintain Web sites. Although Javascript seems nice, the maintenance seems to me like a nightmare.

    Having one directory per language for static pages seems so much easier to maintain for content changes. For design changes it's more of a pain.

    Ask yourself: is your organisation more likely to change content or design? Generally I would think that in most organisations content is more likely to change than design. Do your planning accordingly.

    PS: For Canadian websites, the Canadian federal government has a directive that a website MUST be bilingual (English/French) or not exist at all.
  • Please ...

    PHP isn't limited to embedding in HTML. Sure, it works great WITH html, but isn't limited to it. I've written whole applications that deal with nothing but text and interface with a template system (also PHP) to insert the content into a template.

    One extremely nice feature of PHP that a lot of Perl users overlook is mod_php. As an Apache module, PHP scripts have a smaller memory footprint, a faster startup and parsing time and lower server load than Perl. One could use mod_perl, but the number of servers offering mod_php far outnumbers those offering mod_perl, due to various issues with unscoped variables.

    OTOH, Perl's text capabilities are legendary.

    Also, someone noted that you had to put PHP in every page in the site to accomplish what this guy wants. Not true. I have coded sites with only one real page and many 'virtual' pages. I use a combination of ForceType and globbing PATH_INFO to make it look as if you're navigating a bunch of different real pages, but you're not.

    The PHP does all the work of talking to the db and seeing what pages should show up where and getting the correct content.

    Now, no more FUD! :)
  • Correct. See this comment [slashdot.org] for more information.

  • by jeffg ( 2966 ) on Monday April 17, 2000 @09:07AM (#1127132)

    Those looking for multilingual solutions for sites might want to look into making some use of Apache's content negotiation. See http://www.apache.org/docs/content-negotiation.html [apache.org] for more information.


  • The approach we are using is to separate both language and style from the content by putting all the text into database and building object oriented set of libraries which automatically select ui-style and language using sessions.

    example:
    $page = new Page(); # New page object is initialized; user, language and style are identified
    $page->title('7652'); # Title phrase number 7652 is printed on users language and style to page object
    $page->paragraph('7898'); # Chunk of text is printed
    $page->showpage(); # Page is shown to user in selected style

    The phrase-table is then indexed by phrase-numbers and languages and modification dates are kept up-to-date for all the phrases. All the content on the pages can be modified on the fly by using online-editors, which gives translators and content creators access to phrase-table. Editors also make the phrase numbering totally transparent. The phrase table is distributed to multiple development machines and synchronized over CVS using special synchronization tools. The approach is very fast on Apache+DB2+Linux+embPerl platform, but caching into static perl-hashes in Apache-registry is also possible to make it even faster.
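For readers who want to see the shape of such a library, here is a stripped-down Python sketch of a page object driven by phrase numbers. The original is Perl on embPerl and is not shown in the post, so the class body, the style table and the sample phrases below are all guesses made for illustration.

```python
# Hypothetical phrase table, indexed by (phrase number, language).
PHRASES = {
    ("7652", "en"): "Welcome",
    ("7652", "fi"): "Tervetuloa",
    ("7898", "en"): "Choose a service from the menu.",
}

# Hypothetical UI styles: how each kind of element is wrapped.
STYLES = {
    "plain": {"title": "<h1>{}</h1>", "paragraph": "<p>{}</p>"},
    "fancy": {"title": "<h1 class='big'>{}</h1>", "paragraph": "<p class='lead'>{}</p>"},
}


class Page:
    def __init__(self, lang, style, fallback="en"):
        # In the described system these would come from the user's session.
        self.lang, self.style, self.fallback = lang, style, fallback
        self.parts = []

    def _phrase(self, number):
        # Use the fallback language while a translation is still missing.
        return PHRASES.get((number, self.lang)) or PHRASES[(number, self.fallback)]

    def title(self, number):
        self.parts.append(STYLES[self.style]["title"].format(self._phrase(number)))

    def paragraph(self, number):
        self.parts.append(STYLES[self.style]["paragraph"].format(self._phrase(number)))

    def showpage(self):
        return "\n".join(self.parts)
```

Page code written this way never contains visible text, only phrase numbers, which is what lets translators edit the phrase table without touching the scripts.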

    You can go to our site (Europe's first and only virtual hospital providing healthcare on the net) Atuline.COM [atuline.com] to try it out. Go to demo and inside the virtual hospital try to change language and user interface look'n'feel from settings/interface. Currently only 5 languages / 3 looks are supported but there is more to come..

    -- Joonas

  • I read the documents on the squid homepage and it says that it caches the objects as they pass through it, which speeds up static content by offloading the web server. However, squid cannot cache dynamic content (obviously -- that's why it's called *dynamic*). When a request to a cgi script is made, squid passes it on to the web server. So, I don't see how it can possibly speed up the dynamic content generation.

    Also, why would you want to run squid at all, even for static content? Would it not make more sense to just use a bigger box for Apache? Or run 2 Apache boxes clustered instead of Apache + squid?

    ___
  • Microsoft kind of had a monopoly as of late WRT performance-wise USABLE XSLT parsers. Now this is about to change with Apache's Xalan-C. It should be quite fast. See Apache-XML's site. [apache.org]

  • People are the problem, and that's precisely my point. I don't want to trust any dumb luser around. I'm just not suicidal. The same way that my servers hopefully don't go down as often as stupid lusers remove important files "by mistake". Thank you for your attention.
  • Our site defaults to English as the encoding, but we give users the option of selecting a local language version. The i18n support is pretty basic, being mostly limited to navigation and a few relatively static pages.

    I basically set a cookie that contains a variable indicating the desired language (default is EN if no cookie). My PHP3 scripts look for files named like index_$lang.php3, where $lang is a two-letter language code. If the desired language page is found, it is displayed. If not, the English version is shown.

    For menus, I abstracted the localized portion and substituted variables for all of the strings. The PHP scripts then look for a menu_$lang file to include.
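The lookup-with-fallback step could be sketched like this in Python. The function name and arguments are hypothetical; the `index_$lang.php3` naming comes from the post. The extra check on the language code is an addition of this sketch, since the value arrives in a cookie and ends up in a file path.

```python
import os
import re


def localized_path(directory, base, lang, ext=".php3", default="en"):
    """Return the file for `lang` if it exists, else the default-language file."""
    # Accept only two lowercase letters; anything else is treated as the default.
    if not re.fullmatch(r"[a-z]{2}", lang or ""):
        lang = default
    for candidate in (lang, default):
        path = os.path.join(directory, f"{base}_{candidate}{ext}")
        if os.path.isfile(path):
            return path
    return None  # not even the default exists
```

The same function covers the menu includes: call it with `base="menu"` instead of `base="index"`.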

    Also, I use the ISO entity descriptions for all of the accented characters so I don't have to worry (as much) about fonts on the client.
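Converting accented characters to entities can be automated. A small Python sketch using the standard `html.entities` table follows; the function name is made up, and it assumes the input is already HTML, so it leaves `&`, `<` and `>` alone.

```python
from html.entities import codepoint2name


def to_entities(text):
    """Replace every non-ASCII character with a named or numeric entity."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)  # plain ASCII passes through untouched
        elif code in codepoint2name:
            out.append(f"&{codepoint2name[code]};")  # e.g. &uuml;
        else:
            out.append(f"&#{code};")  # no named entity, use the numeric form
    return "".join(out)
```

Run over translated files before publishing, this keeps the served pages pure ASCII regardless of the editor or encoding the translator used.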



  • Translating from English to German, Italian or French is not a trivial task, but it is so much easier than automating and putting up websites that have to carry both English (or another romanized language) PLUS the double-byte encoded languages such as Arabic, Chinese, Korean (using Unicode or other encoding methods)...

    So... if anyone has done what I've mentioned above, I'd appreciate it if you could share a clue or two.

    Thanks in advance.

  • I have recently run across something similar. I want to convert a telnet-based bbs (called M-Net [arbornet.org]) to a multilingual site. The problem there is that we are limited to merely what VT100 will let us output. In the end, we decided to translate the documents and use basic locale and i18n features already in the OS to help. It is not quite the same but I hope it helps.
  • No Multilingual Site Is Complete Without...

    The Irish-language curse engine (An tInneal Mallachtaí)!

    Jefferson City, Missouri's Lincoln University offers this amusing little interactive time sponge here [lincolnu.edu] and The Register explains it here [theregister.co.uk].

  • Due to non-disclosure and pretty things like that, I have to be vague, but my company runs a web site with multiple language support. My php's a bit rusty but here's a basic php translation of our logic...

    <?php
    // Pull in the file that defines the text variables for the chosen language
    if ($language == "spanish") {
        include "spanish_text.php3";
    }
    elseif ($language == "german") {
        include "german_text.php3";
    }
    else {
        include "english_text.php3";
    }
    ?>

    (...HTML setting up your page and whatnot...)
    <TITLE><?php echo $page_title; ?></TITLE>
    <?php echo $welcome_text; ?>

    The file english_text.php3 would contain stuff like:

    <?php
    $page_title = "Superduper Website!";
    $welcome_text = "Welcome to Superduper Website";
    // ...etc...
    ?>

    german_text.php3 would contain the exact same variable declarations, only the values would be in German. This technique is messy but flexible. You could also apply this to images (i.e. <IMG SRC="<?php echo $submit_button_image; ?>">).
    • _____

    • ToiletDuk (58% Slashdot Pure)
  • Supposedly NS6 has "built-in" translations. If a page is in German and you are set for English, Netscape will run the webpage through a translator automagically and then spit it back to you in your native tongue. At least that's what I heard.
  • Europeans get moderation points too. I usually spend them during Swedish daytime (yes, I should be working...).

  • Something like this, I guess:

    * Keep a series of design templates. In it, have tokens for things (ie, $title, $header, $copyright, etc) that are pulled from a DB. (Or include files, or whatever.)
    * Pass a language variable (lang=fr, lang=en-us, etc.) in the URL.
    * Use this language URL when parsing the $title and $header tokens in your design templates.
    * The content, of course, is stored in language-specific files or DB entries. You could use the W3C language standards (Apache respects these) and store 'em in files, or whatever.

    Poof, it's all assembled. That should do it.
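Those steps fit in a few lines. Here is a Python sketch in which the template, the token values and the `render()` helper are all invented for the example; only the `lang=` URL parameter and the `$title` / `$header` token names come from the list above.

```python
from string import Template
from urllib.parse import parse_qs, urlparse

# One design template with tokens; real sites would keep several.
TEMPLATE = Template("<title>$title</title><h1>$header</h1>")

# Language-specific token values (these could equally live in a DB or files).
TOKENS = {
    "en-us": {"title": "Home", "header": "Welcome"},
    "fr": {"title": "Accueil", "header": "Bienvenue"},
}


def render(url, default="en-us"):
    """Read lang= from the URL and fill the template's tokens accordingly."""
    query = parse_qs(urlparse(url).query)
    lang = query.get("lang", [default])[0].lower()
    tokens = TOKENS.get(lang, TOKENS[default])  # unknown language: use default
    return TEMPLATE.substitute(tokens)
```

One thing to keep in mind with the URL approach is that every internal link has to carry the `lang` parameter along, which is why several other posters prefer a cookie.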

    -Waldo
  • Languages like perl have enough possibilities to compile the graphics from within the language. PHP can do that also (but why go dynamic and tease your little server?), and with very little difficulty.

    Actually, programming your graphics gives you great flexibility, methinks.

    Good luck,

    Jeroen

    Post Posting: Has any european poster any chance to get a rating higher than 1 (unless posting in the middle of the night?). Probably, this posting proves no again.
  • Well, that is nice to know.

    Some time ago I posted an elaborate piece about
    a neuroscience related topic, AFAIR the bandwidth of a neuron. I got one enthusiastic reaction in my mailbox, but no moderation at all. Makes one wonder if there is use in posting at all.

    However, European stuff is read also, apparently.

    Thanx,

    Jeroen
  • (If anyone is interested, we're thinking of open-sourcing the code for our site, which would make this OOD template system available for all db-backed sites. Let me know if this is something that there is an actual need for in the PHP world.)

    You might want to check out an effort to create a template engine for PHP written as PHP extension. See http://va.php.net/~andrei [php.net].

  • I'm working on a multi-lingual-capable system these days; it puts together static text from various templates (in various files) with script-generated sections, and has meta-commands for "this bit in french" "this bit in english" for cases where you want to put everything into a single file (otherwise you can just have one file for each language version). the driving ideas are:
    1. total separation of content and code (to the extent that the content files don't even have conditionals, all they have is a way to give names to different bits and specify templating relationships)
    2. dynamic "compilation" of pages into data structures, with caching of this intermediate form, and
    3. the fundamental unit is a content page that calls code, not a code page that outputs content.
    the whole thing is perl-based, on top of mod_perl and speaking directly to apache's perl API (i.e not using CGI.pm or Apache::Registry).

    send me an e-mail if you're interested; it's in a very raw format now and not near release, but i'll probably be able to release it as open source. if not, well, at least we can talk about it :)

  • Actually, there is a much easier solution.

    PHP supports the GD library, which can happily superimpose text onto a PNG (or GIF depending on the version of libgd PHP is linked with) graphic. So you can just make a few button images (selected, deselected, active) and use PHP+GD to drop the correct text for whatever language you are using on top.

    See this link [php.net] for a good example to build on.

    --

  • Separate layout from the content and layout your pages using fixed symbolic references.

    Internationalization then becomes a two phase process. You can have your visitors select from a drop-down list of languages (that you're supporting) as to what language they want to see.

    Then use JSP, ASP (boo-hiss) or mod_perl to look up each of your symbolically described tags in a dictionary containing the language-specific content.

    The script can shove out the page containing the final page content. (You'll find StyleSheets to be extremely useful for controlling positioning and typefaces.)

    Charles-A.
  • That document refers to a couple of RFCs:

    • RFC 2295: Transparent Content Negotiation in HTTP
    • RFC 2296: HTTP Remote Variant Selection Algorithm -- RVSA/1.0

    Uhm...moderators, hello?

    That post was, in fact, informative. But wasn't its parent even more so? After all, as opposed to pointing out the RFC, it pointed to practical information on an implementation of that RFC, in this case Apache, undoubtedly the Web server that Jens is using.

    The RFC is for people implementing a Web server. Apache's content negotiation [apache.org] is for people providing resources in multiple languages (among other things). Which of those do you think Jens is? Yes, that's right, the latter. Making the RFCs slightly off-topic.

  • PHP4 gettext is the way to go. IMP [horde.org] (a GPL'ed web-based mail reader) is fully internationalized and this is the direction they're going.

    The PHP4 function _(x) is a synonym for gettext(x), so the code ends up being very readable for the maintainers: _('Permission denied.')

    --

  • I had a similar problem recently, in designing a press releases site (http://press.ducati.com) in Italian and English.
    What I did was use templates, and store the language preference in a session variable (I have only registered users, so the lang pref is set according to the user preferences, and can be changed with a link on each page, overriding the session variable).
    Once you do that, you just have to check the variable when declaring templates. Using the template class from phplib (http://phplib.netuse.de) you would do:
    $tpl->set_file(array(
    "mypage" => $lang . "pagetemplate.html",
    "myblock" => $lang . "pagetemplate.html"
    ));
    That's it! Easy to do, easy to maintain.
    If you're using php, a great template class is the one above, waiting for the release of the php templating module, which is in the works.
    If you're using (ehm) asp, you can use my jscript port of the fasttemplate class (http://www.sumatrasolutions.com/asptemplate)
  • I am currently responsible for globalizing a web project at my company. This is the first time I have had to deal with this sort of thing, and I have learned much. Here are some tips:

    1) Whether you decide to dynamically fetch strings when the page is processed, or have multiple versions of the HTML isn't as important as deciding your strategy in the first place. Make the decision and stick with it long before you start work on the actual project. Having to implement a globalization strategy for a site that has already been programmed can be difficult if not impossible. Heed the warning about separating content from code, but be sure you know how that is going to play into your strategy.

    2) Language is only part of the problem. You need to consider sort order, for example, if you are presenting sorted lists. You need to consider date and time formats and also number formats. Some countries swap the comma and the decimal point, for example. If you are planning on selling something, then multiple currency support would be useful.

    3) You need to support multiple code pages. It would be neat if you could just use UTF-8, except there is no widely available Unicode font that contains all the glyphs needed for some languages. It is poor globalization design to only support the latin codepage assuming that you'll never need Korean, for example.

    4) Make sure you avoid colloquialisms and other culture-centric ideas on your page. Keep it simple and as icon-free as possible. Where you have icons that contain text, keep a copy with the text layer separate from any background elements. Gimp has some features that help when localizing these bitmaps. But it's best to just avoid them.
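Point 2 above (number and date formats) is easy to underestimate, so here is a small Python sketch. The `FORMATS` table is hand-written for two locales purely as an illustration; a real project would more likely rely on the platform's locale data than maintain such a table.

```python
from datetime import date

# Hypothetical per-locale rules; only two locales are filled in here.
FORMATS = {
    "en-US": {"thousands": ",", "decimal": ".", "date": "%m/%d/%Y"},
    "de-DE": {"thousands": ".", "decimal": ",", "date": "%d.%m.%Y"},
}


def format_number(value, locale_tag):
    """Format with two decimals using the locale's separators."""
    rules = FORMATS[locale_tag]
    text = f"{value:,.2f}"  # always "1,234,567.89" style at this point
    # Go through a placeholder so the two separators do not overwrite each other.
    return (
        text.replace(",", "\0")
        .replace(".", rules["decimal"])
        .replace("\0", rules["thousands"])
    )


def format_date(day, locale_tag):
    """Format a date in the locale's usual day/month/year order."""
    return day.strftime(FORMATS[locale_tag]["date"])
```

The same date, 17 April 2000, comes out as 04/17/2000 for en-US and 17.04.2000 for de-DE, which is exactly the kind of difference that confuses users when only the words on the page are translated.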

    The project I am heading up contains several hundred .asp files. Rather than translating every one of these files into who knows how many languages, we are creating a string resource that can be queried by a server object. Someone recommended that you look at the GNU gettext, which I second. If you can find standards that already exist, I recommend you use them.

    Someone else recommended an XML approach. Again, this is a good idea to consider.

    Don't try and re-engineer some existing code to make it global. I can't emphasize that enough. Start global from the ground up. Try to find the most intuitive means of doing so.
  • I realize that this is not completely on-topic, but here goes. What would be the reality of writing a browser add-in, or in the case of an OSS browser like mozilla, a piece of the browser, that can translate the content on the current page?

    That way it would take the work of translating and put it on the client end. The software could try to deduce the language from the URL and for images it could use the ALT tag to come up with a translation for the image or it might be possible to use a combination image processing/ocr algorithm to try to distill the text from the graphic on the button.

    It would be great if the software could, like I said, guess what the most likely language was and similarly allow the user to choose to turn the translation on or off and to what language.

    I mean, the technology is there, as evidenced by babelfish, what would it take to turn it into a client side program to transparently translate webpages into the viewer's native tongue?

    This way, not only would pages that have the budget to translate be available to everyone, but every page would be available to everyone, without the added step of going to babelfish (I realize that it's not too much extra work to do that, but hey, people are lazy...).

  • (IANA PHP Guru, but here goes anyway). Let me start by saying that IMO there's no easy answers for this because translating between different languages is inherently difficult. [I read, speak, and can write in a couple different non-English languages (Japanese and Spanish if you're curious), and because of those languages can read a bit of other related languages]. That said, the portal site I am working on is meant to be multi-lingual later on, so maybe some of my thoughts can help answer the "ask /." question:
    1. Start with the current /. codebase as a learning resource. I recommend this even though the Slash code is in Perl and somewhat complex because of how extensively /. relies on the MySQL database and user preferences in the generation of the page. Once you understand the concepts, you can implement your own user preferences related to language.
    2. Use templates stored in the database to build the page from a common code engine. For example, if my preference is "language=English" and another user's preference is "language=Espanol", then I can build a MySQL select statement that looks something like this:
      "Select page_template from pages where language = {user_preference} and page_id = {the page which the user is asking for}"
      where the items in brackets {} are replaced by the web server code.
    3. Design templates to be multi-lingual. For example, the logo for /. is a graphic containing the English words "Slashdot News for Nerds. Stuff that Matters", and the image is part of a hypertext link. So the HTML for the image is like this:
      <A href="http://slashdot.org/"><IMG src="http://images.slashdot.org/title.gif" width=275 height=72 border=0 alt="Welcome to Slashdot"></A>
      So what would a multilingual version look like? Well, for example, assume that the Spanish (Espanol) version should go to "http://www.Spanish.Slashdot.Org". Here's the same HTML except using token strings for the language specific items:
      <A href="http://www.##language##.slashdot.org/"><IMG src="http://images.slashdot.org/##language##titleimg.gif" width=275 height=72 border=0 alt="##Language Title##"></A>
      Then you write the PHP code to replace the tokens (the part between the number sign pairs, ##_____##) with the user's preferences (which were probably stored in a client side cookie). [Note: this example implies that you have images for the different languages]
    If you understand these techniques, you are well on your way to creating a multi-lingual site in my book.
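
    The token replacement in point 3 might look something like the following. This is a rough sketch in Python rather than PHP, and fill_template, the TOKEN pattern and the Spanish values are all invented for the example. It also HTML-escapes the substituted values, which the post above does not mention but which seems prudent for anything that ends up inside an attribute.

```python
import html
import re

# Matches ##token## pairs as used in the template above.
TOKEN = re.compile(r"##(.+?)##")


def fill_template(template, values):
    """Replace each ##token## with its value; unknown tokens stay visible."""
    def lookup(match):
        key = match.group(1)
        if key not in values:
            # Leave the token in place so a missing translation is easy to spot.
            return match.group(0)
        return html.escape(values[key], quote=True)

    return TOKEN.sub(lookup, template)


template = (
    '<a href="http://www.##language##.slashdot.org/">'
    '<img src="http://images.slashdot.org/##language##titleimg.gif" '
    'alt="##Language Title##"></a>'
)
# Hypothetical per-user values, e.g. loaded from preferences or a cookie.
spanish = {"language": "spanish", "Language Title": "Bienvenido a Slashdot"}
page = fill_template(template, spanish)
```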
  • The main reason to use templates is to enable a consistent look and feel across a high number of pages by changing just one HTML template.

    As far as your code example, I might be missing something (A technique I don't know?) in what Apache would do (where you mentioned "title.gif.de", etc.)

    How would the browser know that the file was a GIF, etc. with a .de ending, etc.?

  • They're all gonna learn English. Oh yes, learn English they must. Cause nearly all existing code and software is in...(drumroll)...English! And there's nothing the computer industry/world likes better than backwards compatibility and tradition.

    So......

    I believe the link was http://www.learnenglishfast.com/

  • Part of the answer depends on what you want the site to look like for the user. Most Canadian web sites start off with a splash/welcome page where the user chooses English or French (e.g. Canada Post [canadapost.ca]). Personally, I would want a bit more functionality on the first page.

    I think that, in any case, you need to make sure that any repeated design element only occurs once so you never need to make the same change in multiple places to redesign the site. This seems sort of obvious, but it's hard to be more specific without more details about your project. I have always tended (when using SSI) to have the "actual" html file contain the content and call the design up from a single template -- and have been very happy with this approach. You need to decide whether you are adding another decision layer to this (e.g. from content+design to content+design+language -- or just stay with content+design and make separate pages for separate languages).

    ========

  • by turg ( 19864 )
    For the question about buttons, your life would probably be easier if you used wordless icons accompanied by text -- I personally don't think it makes much sense to use GIF/JPEG/PNG to represent text in any case. But if you really need to use graphics to represent text, make it part of the variable substitution -- img src="$language$button_name.gif"

    ========
  • For a site where accessibility is a prime concern (a site on blindness for example)

    Accessibility should be a prime concern for every site. What on earth makes you think blind people have only a very limited range of interests? Do you think it's fine web sites use plugins that are only available for Windows users, and only a site like Slashdot should concern itself with plugins for Linux? Or would you agree people using Linux have more interests than "news for nerds"?

    -- Abigail

  • by Abigail-II ( 20195 ) on Monday April 17, 2000 @11:20AM (#1127162) Homepage
    How do I handle buttons (i.e. graphics) with text on them?

    You don't. Try to imagine you are blind and need a speech interface, or that you have bad eyesight and need 48pt fonts to read something, and are then faced with a site that uses needless graphics for navigation, when written words would have done as well, if not better.

    -- Abigail

  • Does PHP allow you to create objects like Java does with JavaBeans?

    Yes, php has a basic concept of objects, not incredibly powerful, but good for grouping together data and operations. Better yet, php can actually create and work with Java objects. Or better still, php can be embedded in a java servlet engine so that you can use php as a replacement for JSP in your servlets :)


  • Keep in mind, there are a lot of reasons to use one thing or another. If people don't want to use ASP because it's an MS product, who cares? The most vocal group are always the idiots. While it's nice to do this so easily, you need to look at other things here. For example, what would it cost for this person to switch to an NT/ASP solution? Consider a hardware upgrade, software licensing, and most importantly, the time required to learn a new platform *and* a new language. This is not an economical choice, and it may not be worth it just to solve the problem in two lines of code rather than 50.

    Also note that with php, you can do this with equally few lines of code. Using gettext support, you simply create your translations and store them elsewhere; then, by wrapping each output string in _(), it will be translated.

    There are always multiple ways to solve problems, and each one has advantages and disadvantages. Don't think that people avoid IIS/ASP just because they hate MS. Those who do don't really matter in the grand scheme of things.
  • by rm -rf /etc/* ( 20237 ) on Monday April 17, 2000 @10:39AM (#1127165) Homepage

    The thing to remember here is that PHP vs Zope is not a decision you will ever have to make. Zope is not a language, it's an application server. Python vs PHP is a comparison. If you want to talk zope, you have to look at the php equivalent, midgard (http://www.midgard-project.org/). Not that I have anything against zope or python, they are great tools, I just think for the task here php/midgard are much better suited. Part of this is because I think PHP has a quicker learning curve and let's face it, there's no sense in mastering a language just to create a multilingual site... Second, Midgard is much more suited to this type of thing. I suggest Zope to people who want to program a website, and Midgard to people who want to manage website content. Midgard is much more focused on content and using inherited styles and layouts, plus giving multi user web based access to manage content and layout. For something like this where you have the same layout and style just with different content, I think Midgard will really do this with less hassle and effort.
  • by rm -rf /etc/* ( 20237 ) on Monday April 17, 2000 @09:10AM (#1127166) Homepage

    PHP4 can be built with gettext support. gettext is a GNU library for internationalizing programs. PHP's support is undocumented currently, so you'd have to check out the code to see what it does (in ext/gettext), but it might be worth looking into.

    Gettext info and manuals can be found at http://www.gnu.org/software/gettext/
  • I have common files for page headers and footers, and separate language files for the body. The body files contain html for language-specific graphics (that contain text).

    I then use a perl script to combine the header + body + footer together.

    This is rather painful tho, as there is a lot of redundancy, and page changes are quite painful.
  • by jilles ( 20976 ) on Monday April 17, 2000 @10:44AM (#1127168) Homepage
    "First off you've got to determine what language the end user wants to view the site with. This can be done multiple ways: client hostname, browser version (language), server hostname (ie. japan.bigcorp.com), or by hitting a button or link on the page."

    Client/server hostname is a bad idea. I'm Dutch but I live in Sweden and when I visit www.lycos.com, I'm presented with the Swedish version of this site. My browser settings are ignored (they favor Dutch above English and English above Swedish). As a consequence I don't use lycos anymore. There is of course the alternative link which I can click to get the Dutch or English version but that's too much trouble for me.

    The best way I think is to just ask the user what language he/she prefers. Even if English is not your native language, it sometimes is the best option since the English version of a site is updated more often.
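
    For sites that do want to honor the browser settings mentioned above before falling back to asking, the Accept-Language header can be parsed along these lines. This is a simplified Python sketch: the function names are invented, and it ignores wildcards and other corners of the HTTP spec.

```python
def parse_accept_language(header):
    """Return the language tags from an Accept-Language header, best first."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0  # a tag without an explicit q-value counts as 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality > 0:
            # Sort by descending quality, then by original position.
            weighted.append((-quality, position, tag.strip().lower()))
    return [tag for _, _, tag in sorted(weighted)]


def choose_language(header, supported, default="en"):
    """Pick the first preferred language the site actually offers."""
    for tag in parse_accept_language(header):
        if tag in supported:
            return tag
        primary = tag.split("-")[0]  # "nl-BE" falls back to "nl"
        if primary in supported:
            return primary
    return default
```

    With a header like "nl, en;q=0.8, sv;q=0.5" and a site offering only Swedish and English, this picks English, which is what the poster above wanted from Lycos.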
  • Nav bars are fine in any language if they are horizontal. You have 10 seconds to keep someone rapt (some would argue less) and extra bars and rollovers really tick some people off. Also, if you internationalise your data, you will be better off not creating graphical buttons. What would you rather have, a site that is accessible and clean in every language, or one that is good looking? Remember, you can't search/replace text in a graphic if you make a translation mistake!

  • Perhaps the icons are logical, but when given mystery meat navigation [webpagesthatsuck.com], she is most likely confused. That's the thing with icons, they have to be targeted, and they have to mean the same thing to all people. This fellow is talking about an international effort, and therefore must not only translate, but localise his content.

    Localisation involves matching culture to the point that an ad doesn't give the wrong impression [snopes.com].

  • by angelo ( 21182 )

    It is unfortunate that they should pick this name. The name is currently used by the WAP Forum [wapforum.com] in their Wireless markup language [wapforum.org].

    I figure they are aware of this.

  • It would make a lighter light mode... I think I'll get the slash code and make up some patches. (dusts off perl books) Or maybe I'll try out the code to some of the php-based systems. I'd actually like to see this all run from XML+XSLT on the backend to generate a better experience based on client. This would rock.
  • That's css2, and it isn't really necessary. I rarely find a use for tables except for placing a table of data on a page. For layouts, I set borders in css, and keep it simple. You could, however, use a span or division to achieve a modest sidebar. The tr-td combinations in slashdot's layout are superfluous if you use a nested div tag. You define two css classes:

    div.top { margin-left: .5em; } /* this is the top layer of each thread */
    div.sub { margin-left: 1em; } /* for every nested thread */

    The result is a set of nested articles, much like the slash code.

    <div class="top">
    top thread text
    <div class="sub">
    second thread text
    </div>
    <div class="sub">
    third, parallel thread text
    </div>
    </div>

    No table/tr/td combinations, just divisions. The Slashdot front page would be defined as 4 divs: a top for the ads and logo, one for the left column, one for stories, and one for the right. Without css, everything would fall straight down the page, and as such, be available in next generation WAP devices.

  • by angelo ( 21182 ) on Monday April 17, 2000 @01:42PM (#1127174) Homepage

    Damn straight! that may very well be out of my personal design theory [lm.com]! Tables are logical elements for organising data, not a tool to lay out your webpage! I couldn't put this better myself! Maybe if people paid attention to content, the internet would be a far better place. I suppose I'll go back to dreaming.

    PS: I tried to come up with an alternate layout for /., and made it in css. Not only did it look sweet, but it was changeable from a stylesheet! whoot!

  • Just being picky: I would recommend image file names as button_name-language.gif. This would put NextPage-Eng.gif and NextPage-Ger.gif next to each other alphabetically.

    -B
  • Wo xue ci zhong wen duo tien. wo zhen xi huan. Ni bu xi huan zong wen? ching si, geng men!!
  • a lot of the problems mentioned are easily solved with the Roxen WebServer [roxen.com].
    Roxen has an extensive SSI language (RXML) that allows you to build powerful sites.
    eg. buttons can be created by the server, with the text you specify (like <gtext>click here</gtext>); you can also use a background image and thus put any kind of text on top of any image.

    of course separation of content and design is key.
    in order to achieve 100% separation i wrote an xml template [iaeste.or.at] module for roxen, that allows me to specify the content in XML, and then apply a template to show it.
    roxen also makes it easy to check for the client's default language, so you could easily select the language based on that.
    the most important thing is, it all looks like html to the webauthor. there is no programming involved.

    sure, php is nice, but if you don't know about programming languages, then it will look very confusing, also it's very easy to screw up, because the non-programmer doesn't see that there is a ; or , missing, that creates a syntax-error in the php code.

    greetings, eMBee.
    --

  • Not only when it comes to translations, but context-layout in general. PHP is nice and dandy, but somehow it is really hard to separate content and layout, or content and language support/translations for that matter.

    You might want to check out Midgard [midgard-project.org] then. It is a Web application server that uses PHP as its scripting language. While it isn't that much better in internationalization, it at least has good support for separating layout, structure and content into different components.

    /Bergie

    --

  • I suggest Zope to people who want to program a website, and Midgard to people who want to manage website content.

    It sounds like you think Midgard is better ;-)

    Well, the whole point of all these systems is to extend HTML with some kind of dynamic behavior; if you define this as "programming", then they all require some kind of programming. The basic scripting language of Zope is DTML; it works like practically every other system like this -- you write HTML and decorate it with script directives -- just like PHP. If anything, DTML is simpler than PHP, albeit less powerful. You do have the power of Python behind the system if you need it, but many if not most people will use Zope's capabilities blissfully unaware that Python even exists.

    This midgard project looks interesting, but the screenshots were not very informative (except to say the developers perhaps have a little more aesthetic sense than most). I'd be interested in your unvarnished opinions on midgard, and I'll give you my own unvarnished opinion of Zope.

    First the good stuff.

    Zope's first great strength is that it is an object publishing system. It allows you to reuse, not only display templates, but database queries, and useful bits of logic (e.g. converting lists of URLs into horizontal or vertical link menus). These things have clear usefulness for database oriented applications, but they also have a great deal of usefulness for things which are normally done through static inclusion. For example, I can define a table of links, and then define logic to transform that table into horizontal or vertical menus. Of course this is easy to do in PHP, but the neat thing about Zope is that it is simple and natural for a document to inherit the list of links from a higher folder; to override the list of links but use them in the same way as the document template; to reuse the same transformation logic for different purpose. Of course, you can do all this stuff using low level scripting systems, but its a lot of work to make it happen, whereas in Zope it is simple, natural and automatic.

    For database work, it manages persistent connections very nicely, and by providing objects for database connections, SQL statements, and transformation into various presentations, it allows you to plug all these things together tinker-toy fashion.

    Because of this reusability, there's a lot of terrific stuff that's already been done that you can simply grab and plug into your website, such as a slashdot like discussion forum (minus moderation, alas).

    Now the bad stuff.

    The documentation is pathetic. It is obviously written by people who know the internals of the system and have since the beginning of time -- they don't really remember what it is like not to know. In fact, the user guides sometimes make use of undocumented internals. Of course you have the source code, but that's really a last resort.

    I often say there are two kinds of people who read documentation. One are hands on people who like to see examples and generalize from them, and others are abstract folks who like to see principles and specialize from them. The documentation doesn't really serve either. The step by step examples are somewhat obsolete and if you follow them they don't always work with the latest stuff, sometimes because the method documented has been somewhat superseded by newer methods which are explained in a vague, handwaving manner which will infuriate abstract thinkers and do nothing for hands on people.

    This truly awful documentation will be a show-stopper for many people. You have to beat your head against the system for a while.

    Speed is not exactly stellar. Python has time and time again shown its ability to handle incredibly complex logic; but it is SLOOOW. Zope is surprisingly fast considering that it is written in Python and practically every page is parsed with some dynamic behavior. There are some moderate volume techie sites like Bruce Perens' technocrat site running Zope, but you aren't going to scale to Slashdot or Amazon's scale.

    That said, I chose Zope because it works very well, once you learn your way around it. I don't know half of it, and there's no way of even learning half of what it can do without going to the source code, unfortunately, but what is documented is great.
  • Apache can be used in reverse proxy mode with Zope, too!

    Any pointers to docs on this?
  • Whoever moderated this up should know that zope is a python application and php3 is a language just like python. There are similar applications to zope that use php3. But comparing an application to a language is just not done.

    Well, OK, what if I were to compare Zope Document Template Markup Language (DTML) to PHP3; would that take the knot out of your knickers?

    Seriously, if you'd like to share your knowledge of PHP based application servers, I'm all ears. There's something new every day, I'm just sharing what I learned last year when a lot of alternatives weren't there yet.

  • by hey! ( 33014 ) on Monday April 17, 2000 @09:22AM (#1127188) Homepage Journal
    Don't get me wrong, I like PHP a lot, but its greatest strength is for people who like to work in HTML but make it smart. You can separate content and presentation in PHP, because it is a reasonably powerful language, but doing so throughout your site means turning every page in your site into a program to emit the content in the desired language. Once you go there, you may as well consider other options for transforming content.

    There are lots of ways people have come up with for doing this kind of thing. You could do each translation as an XML document and use XSLT to convert it into a fully decorated HTML document. You could use java servlet chains to take a simple HTML translation and add the usual banners, links and formatting. This would be nicer than transparently dynamic content because it could be cached by the user and downstream proxies.

    My current favorite method is Zope. In Zope, nested folders inherit all the characteristics of their parent folder, but allow you to override them. I use this to enforce stylistic uniformity between people maintaining content on my site, which is just a generalization of your problem.

    With zope, you could do your original site in German, and put it in a folder called "/foo"; then create an empty folder called "/foo/EN" which by "acquisition" starts by looking identical to "/foo" even though it's blank. Then you start overriding the various text bits into English, and gradually, an English translation emerges. You could even write a little method to iterate over a bunch of links and append "/EN" to them.
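
    The acquisition idea can be illustrated with a toy model. To be clear, this is not Zope's API: the Folder class and its acquire method are invented here purely to show how an empty child folder can start out looking like its parent and then be overridden piece by piece.

```python
class Folder:
    """Toy folder that looks up missing items in its parent folder."""

    def __init__(self, parent=None, **items):
        self.parent = parent
        self.items = dict(items)

    def acquire(self, name):
        # Walk up the containment chain until some folder defines the item.
        folder = self
        while folder is not None:
            if name in folder.items:
                return folder.items[name]
            folder = folder.parent
        raise KeyError(name)


# "/foo" holds the original German site.
foo = Folder(title="Willkommen", footer="Kontakt")

# "/foo/EN" starts empty, so at first it looks identical to "/foo".
foo_en = Folder(parent=foo)

# Override one text bit at a time as the translation progresses.
foo_en.items["title"] = "Welcome"
```

    At this point the English folder shows the translated title but still shows the German footer, which is the "gradually, an English translation emerges" effect described above.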

    The main issue with zope is scalability, since everything is dynamically generated. I use squid as a reverse proxy to cut down on dynamic page generation overhead. This also turns out to be an easy method for multihoming zope, which is a requirement that I have.

  • you probably want to read this article on the real complexity involved in this sort of localization, and the pitfalls inherent in the sort of substitution you suggest.

    suffice it to say, it's a nightmare for anything beyond nice and friendly iso8859-1. the author uses an amusing anecdote about a simple localization of a simple feedback message into Chinese, Arabic, Russian and Italian...

    - mark
  • How you do it is going to depend a great deal on how you get your content.

    For example, is your content mainly user-contributed, or does it come from professional writers? Where the stuff comes from is going to make a big difference as to how it gets translated. User submitted stuff could probably be run through the babelfish [altavista.com] in pseudo-real time (a daemon which queues requests, uses wget or lwp-request to have them translated, and then sticks them into a database). On the other hand, professionally submitted content should probably be treated a little better than simply babelfish; hire some native speakers as translators.

    The way I would do it is something like this (keep in mind that I am a Perl/C programmer specializing in Apache/mod_perl; some of the things I mention are inaccessible to PHP, like translation handlers):

    • Back end daemon that takes queued requests and passes them to babelfish to be translated, then they get stuck into the database (separate thread, running as a cron or daemon). This daemon could be fed through a web-based form, for example, for a site with user-contributed content, and the data would be put into a staging area, where the daemon would read it, translate it, and then enter it into the live database.
    • Web pages would specify an initial two character language code (e.g., /en/foo/bar.html). When a page gets requested, a custom URI translation handler would strip out the initial two character name and keep that around (via r->notes) for future use. This could be done via a Perl module as a PerlTransHandler or a custom C translation handler, or, if you are not using or cannot use perl or C, you could use mod_rewrite to splice off the initial 3 characters, strip off the leading '/', and stick the last 2 into the environment as, e.g., LANGUAGE (fetch them like you would an environment variable).
    • Alternatively, if you are using user authentication, you could fetch the preferred language from the user's profile after the Authentication stage.
    • When the time comes to actually produce content, I would retrieve the language code from the notes table/environment and use it as part of a custom SQL statement. The database would be set up in such a way that content is broken up into as many different tables as possible, so that as much common (i.e., non language-specific) content could be used as possible. Using some sort of a multi-table join, I would bring the content all together before the templates are actually filled in, so that when the time comes to fill them in, you don't need to switch on the language; it's all already taken care of by the database.
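
    The URI translation step in the second bullet boils down to something like this, shown here as a plain Python function rather than a PerlTransHandler or a mod_rewrite rule. The function name, the SUPPORTED set and the default language are assumptions made for the sketch.

```python
import re

# Hypothetical set of languages the site offers.
SUPPORTED = {"en", "de", "fr"}

# A leading two-letter path segment, e.g. "/en/foo/bar.html".
LANG_PREFIX = re.compile(r"^/([a-z]{2})(/.*)?$")


def split_language(path, default="en"):
    """Split '/en/foo/bar.html' into ('en', '/foo/bar.html')."""
    match = LANG_PREFIX.match(path)
    if not match or match.group(1) not in SUPPORTED:
        # No recognised prefix: serve the path as-is in the default language.
        return default, path
    return match.group(1), match.group(2) or "/"
```

    Checking the prefix against a list of supported languages avoids treating an ordinary two-letter directory such as /js/ as a language code.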

    Retrieving the data and putting it onto the page would be the easy part... once you have the information stored in your database in the appropriate languages, that is. Designing the database so that you have as little unnecessary redundancy as you can while still ensuring that all of your content is available in all the required languages will definitely be a challenge, but it's an architecture problem, not a programming problem.

    Good luck. I, for one, would be interested in hearing how you make out and what track you decide to take.

    darren


    Cthulhu for President! [cthulhu.org]
  • by thogard ( 43403 ) on Monday April 17, 2000 @01:18PM (#1127193) Homepage
    I run one site that is accessed from all over the world and I can tell you that 90% aren't using a 4.0 browser (it's only about 1/3, and only those from well-off countries), and looking at how long it takes to transfer the pages, the connection between here and there is often way slower than 33k.
  • If each page is a collection of bits and tags from three separate databases you'd have to write tons of fancy administration code, and it sort of limits the ability to use other web technologies in tandem with your site, as well as to have anyone who isn't an experienced programmer modify the content layout, since everything has to fit into a very restrictive and complicated architecture.
  • by zzzeek ( 43830 ) on Monday April 17, 2000 @09:12AM (#1127195)

    I haven't used PHP before, but here's the basic idea, not specific to any programming language..

    build your site in English (or your default language), assuming we are talking about regular static HTML files or some kind of .shtml perhaps. Write a filter for your webserver (I use java servlets mapped to *.html myself) that parses the path info of the request for something like "/german/foo/bar.html", i.e. the language identified in the beginning of the URI. Then within your HTML files, anywhere you want translated text, do it like this:

    <translation default="english" modulename="/foo/bar/mytext.html">

    This is my english text.
    </translation>

    Then within your filter servlet or apachemod or whatever, parse the file for these tags (caching schemes can be utilized for speed), and then based on the language encoded in the URL, dynamically replace the body text (if the URI-specified language is not the default) with the contents of a file <languagebase>/foo/bar/mytext.html. So you could have something like /web/translations/german/foo/bar/mytext.html, /web/translations/spanish/foo/bar/mytext.html, etc. If a translation file is not present then you just use the default text already in the document, so you can still launch new HTML pages even if a translation is not available yet.
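
    A stripped-down version of such a filter could look like this. It is a Python sketch rather than a servlet, and to keep it self-contained the translations live in a dictionary keyed by (language, modulename) instead of files under a language base directory. The render function and the pattern are made up for the example, and a regex like this assumes the two attributes always appear in the order shown in the tag above.

```python
import re

# Matches the <translation default="..." modulename="..."> ... </translation>
# blocks described above (attributes assumed to be in this order).
TRANSLATION_TAG = re.compile(
    r'<translation\s+default="(?P<default>[^"]*)"\s+'
    r'modulename="(?P<module>[^"]*)"\s*>'
    r"(?P<body>.*?)</translation>",
    re.DOTALL,
)


def render(page, language, translations):
    """Swap each tagged block for its translation, or keep the default text."""
    def replace(match):
        body = match.group("body").strip()
        if language == match.group("default"):
            return body
        # No translation yet: fall back to the text already in the page.
        return translations.get((language, match.group("module")), body)

    return TRANSLATION_TAG.sub(replace, page)
```

    Because a missing translation falls back to the default text, new pages can go live before the translator has caught up, as the post points out.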

    If you want to raise the bar of speed, don't use a dynamic filter, just write a perl script to regenerate an entire static site underneath "/german" "/spanish", etc. using the same scheme. Or you could even mix up the two approaches.

    If your site is not static HTML but some kind of database driven thing, you can still use a similar approach, it just means the filtering program has to be molded to fit your content-delivery environment.

  • Thanks to built in support in NT for unicode, and heavy support in IIS/ASP for multi-language applications, it's cake.

    Basically, you set a "region code" at the beginning of each page in ASP, and then simply supply your multi language content from a separate location. Each page's content is determined by this region code.

    Wow... that takes about 2 lines of code, and works around the world perfectly. I bet I will get nothing but flame for this post because it uses an MS technology. Funny... I remember the days where people used things not because of who made it, but because of how well it did its job.

    Memories...
  • I've developed software that is to be used with multiple languages.

    You can see the software running at http://beta.infopop.net [infopop.net].

    It uses XSL and XML to render each page. Users are able to upload their own XSL with different languages if they wish. This is the 'template' approach mentioned by many other posters. Contrary to what has been said about XSL performance however, I've found it works well.

    Also, we use a database to store keyed messages in different languages. Each message is requested by key and looked up by the language of the site its being used on.

    The only problem? Getting Oracle to swallow UTF-8 characters. We're having a daemon of a time. If anyone has worked with Oracle, Java and Unicode I'd love to hear from you! peter @ infopop . com
  • Since I'm in my local university's Comp Sci Databases class, and just having tackled this problem in the workplace, I can tell you the most poetic way is to use a database.
    You will need 3 simple tables:
    Languages: A one-field table of languages: "English", "Deutsch", etc...

    Strings: IdNum (as text, like a #define constant), and a field for your native language, i.e. "Click Here"

    Finally a 3rd table, Translations, containing the IdNum (references Strings.IdNum), the language (references Languages.language) and finally the translated text.

    Of course, you may want to use integers instead of strings but I like them this way. They are easier to read, etc. Your image problem is the same as the last table, but you store blobs instead of strings.

    And one function translate($id, $language)
    where a typical call would look like:
    translate ("Click Here", "English")
    would execute:
    select phrase from translations where language='$language' and
    idnum='$id';
    and return that.

    then you can make cool online forms for the translator to use =) Please let me know if you use this. (Just curious)(Comments also welcome)

    There are reasons for 3 tables, but I will not go into my exact mental model or DB theory here.
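
    A minimal sketch of the three-table scheme described above, written in Python with an in-memory SQLite database rather than the poster's PHP. The column names, the sample German phrase, and the fallback to the key when no translation exists are assumptions made for illustration. The query uses bound parameters instead of pasting $id and $language into the SQL string.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with the three tables described above."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE languages (language TEXT PRIMARY KEY);
        CREATE TABLE strings (idnum TEXT PRIMARY KEY, native_text TEXT NOT NULL);
        CREATE TABLE translations (
            idnum    TEXT NOT NULL REFERENCES strings(idnum),
            language TEXT NOT NULL REFERENCES languages(language),
            phrase   TEXT NOT NULL,
            PRIMARY KEY (idnum, language)
        );
    """)
    db.executemany("INSERT INTO languages VALUES (?)", [("English",), ("Deutsch",)])
    db.execute("INSERT INTO strings VALUES (?, ?)", ("Click Here", "Click Here"))
    db.executemany(
        "INSERT INTO translations VALUES (?, ?, ?)",
        [("Click Here", "English", "Click Here"),
         ("Click Here", "Deutsch", "Hier klicken")],  # sample data, not from the post
    )
    return db

def translate(db, idnum, language):
    """Return the phrase for idnum in language, or idnum itself if none is stored."""
    row = db.execute(
        "SELECT phrase FROM translations WHERE idnum = ? AND language = ?",
        (idnum, language),
    ).fetchone()
    return row[0] if row else idnum

db = build_demo_db()
print(translate(db, "Click Here", "Deutsch"))
```

    Falling back to the readable key means an untranslated string still shows something sensible, which is one argument for the text keys the poster prefers.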
  • For my site, we use content "templates" in PHP. Ours is a bit more complicated because we are database-backed, but the concept is the same. I just teach our designers to use a syntax like this:

    <? $this->show( "foo" ) ?>

    which gets include()ed in the context of an object that holds data associated with "foo."

    For your application, if you don't want to go with an OO design (which you should, IMHO), you could just do this. Define a variable like $lang which can be one of "en", "de", "fr" and then every time you have language-dependent content, just do:

    <? include( "foo-$lang.ihtml" ) ?>

    Just make sure every PHP file shares a common header that sets $lang appropriately. To do this the even easier way, just name that header file in the auto_prepend_file setting in php3.ini.

    (If anyone is interested, we're thinking of open-sourcing the code for our site, which would make this OOD template system available for all db-backed sites. Let me know if this is something that there is an actual need for in the PHP world.)
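
    The per-language include idea above can be sketched as follows, in Python rather than PHP. The list of supported languages, the default language, and the fallback to the default fragment when a translation file is missing are all assumptions added for illustration. Checking the language code against a known list matters because the code ends up inside a file name.

```python
import os
import tempfile

SUPPORTED = ("en", "de", "fr")   # hypothetical set of site languages
DEFAULT_LANG = "en"              # hypothetical default

def fragment_path(base_dir, name, lang):
    """Return the path of name-<lang>.ihtml, falling back to the default language."""
    if lang not in SUPPORTED:    # never build a path from an unchecked value
        lang = DEFAULT_LANG
    candidate = os.path.join(base_dir, f"{name}-{lang}.ihtml")
    if os.path.exists(candidate):
        return candidate
    return os.path.join(base_dir, f"{name}-{DEFAULT_LANG}.ihtml")

def render_fragment(base_dir, name, lang):
    """Read the language-specific fragment, the equivalent of the include() call."""
    with open(fragment_path(base_dir, name, lang), encoding="utf-8") as fh:
        return fh.read()

def make_demo_dir():
    """Create a throwaway directory holding an English and a German fragment."""
    base_dir = tempfile.mkdtemp()
    for lang, body in (("en", "Welcome"), ("de", "Willkommen")):
        path = os.path.join(base_dir, f"foo-{lang}.ihtml")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(body)
    return base_dir

demo_dir = make_demo_dir()
print(render_fragment(demo_dir, "foo", "de"))   # German fragment exists
print(render_fragment(demo_dir, "foo", "fr"))   # no French file, English is served
```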
  • Better yet, have it be something like:

    <? // do nothing ?>
    welcome to my site
    <? // still do nothing ?>

    since PHP does not require the <? tags, and include()ed files get printed out by default (one of PHP's strongest features, imho)

  • You're starting to get into web app server areas. Vignette and Zope support things of this nature. (Although Zope has a context-sensitive subclassing feature I like a lot (Vignette doesn't))

    Zope [zope.org] Vignette [vignette.com]

    There are others (possibly ASP, ColdFusion, JSP, etc...), but these are the ones I know a bit about.

  • Let's go under a few assumptions:
    1. you don't want to skimp on the site design/graphic look
    2. 90% of the web is browsing on a 4.0 browser that supports CSS
    3. 90% is using a minimum of a 33.6 dialup.

    Under these assumptions, I would like to say that you wouldn't have to skimp on anything to create a site that caters to a variety of languages with only 1 version of the code.

    CSS tied into something like PHP would be your answer plain and simple.

    Text could be dynamically swapped in depending on the selected language, keep your image buttons plain and layer the text above the image, etc and you're all good.

    Make a site in which you could easily substitute different "palettes" into the design... not designed differently, just coded differently. It's not as difficult as many people make it out to be. CSS is a much more powerful tool in web design than many people give it credit for (most use it just for text decoration). I suggest picking up a book on it.
  • Post Posting: Has any european poster any chance to get a rating higher than 1

    Yes. Repeatedly

  • I have experience with two different approaches for creating and maintaining multilingual web sites:

    • All-in-one:

      We use this approach for the SSLUG web site, and except for the lack of time, I see it as a success.

      The language choice is based on (in order of priority): direct choice, the Accept-Language header, and client domain.

      The actual implementation is done with SSI and plenty of conditional blocks. This makes the raw files look rather messy, but it is definitely beneficial to have the different language versions just next to each other.

    • One file per language:

      I have used this approach for my own web site, but I wouldn't call it a great success. It is difficult to keep track of translations when they are stored in different files and the result will typically be pages that are severely out of sync. The main benefit of using one file per language is that you can leave the language choice to the Content Negotiation/MultiViews feature in Apache.

    My advice for people starting on a multilingual web site is:

    • Edit the pages as multilingual files.
    • Publish the pages with one language per file.

    The multilingual files should be as close to plain HTML as possible, so something like

    ...
    <p>
    <en>Hello world
    <fr>Salut le monde
    <da>Hej verden
    </p>

    where you simply allow a sequence of language coded tags followed by a text in that language would probably be a good solution.

    /Jacob (who is going to change his own site soon)
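
    The edit-multilingual, publish-per-language advice above could be sketched like this in Python. The rule that a language line starts with a two-letter tag such as <en> is taken from the example; the regular expression, the function name and the handling of indentation are assumptions. Lines without a known language tag are copied into every output.

```python
import re

# A language line is optional indentation, a two-letter tag, then the text.
LANG_LINE = re.compile(r"^(\s*)<([a-z]{2})>(.*)$")

def split_languages(source, languages):
    """Split one multilingual source into one plain document per language."""
    outputs = {lang: [] for lang in languages}
    for line in source.splitlines():
        match = LANG_LINE.match(line)
        if match and match.group(2) in outputs:
            indent, lang, text = match.groups()
            outputs[lang].append(indent + text)
        else:
            # Shared markup such as <p> or <td> goes into every language version.
            for lines in outputs.values():
                lines.append(line)
    return {lang: "\n".join(lines) for lang, lines in outputs.items()}

source = "<p>\n<en>Hello world\n<fr>Salut le monde\n<da>Hej verden\n</p>"
pages = split_languages(source, ["en", "fr", "da"])
print(pages["fr"])
```

    One caveat of this tag style: a language code that is also an HTML element name (Turkish "tr", for instance) would collide with real markup, so a real tool would need a less ambiguous marker.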

  • When I read the story, I saw the "$LANG" tag as a "SLANG" tag.... which actually might not be altogether useless in a multi-lingual site, if you use a Babel-Fish-esque algorithm to translate parts of it.

  • I've got a bilingual site, 100% PHP; admittedly, what I'm going to say applies only to languages using the Latin alphabet, but the simple fact of having one directory per language makes maintaining the site in both languages as easy as falling off a log, and nothing is duplicated. And I didn't even consider whether this was design or content. Buttons and everything are completely bilingual except where I chose deliberately not to make them so. The site's called www.mrquiz.org. If you take a look and you're still interested, I'll gladly send you the source. My mail address is on the site.
  • Basically what you're asking for is templates. PHP is designed to be an HTML-embedded language, but for people who want to separate the PHP from the HTML and text, you can use templates. Once you're using templates, it's easy to create scripts that choose a template for whichever language is currently selected.

    There are two routes you can go for using templates with PHP, FastTemplates [thewebmasters.net] and the PHP Base Library [netuse.de]'s ("PHPLIB") Template.

    So how are they different? FastTemplates was originally a Perl library that was ported to PHP. FastTemplates works well for Perl programs, but it's not ideal for PHP. Kristian Koehntopp wrote PHPLIB Template from the ground up as a pure PHP library to better take advantage of the capabilities of PHP. One advantage to Kristian's design is that it parses templates with preg_replace(), which is said to be faster than FastTemplate's reliance on ereg_replace(). Another advantage of PHPLIB Template is it allows dynamic blocks to be nested, unlike FastTemplates.

    For those reasons I prefer to use PHPLIB Template, but you do have a choice of the two libraries.

    It may also be worth mentioning the XML approach. XLT is an XML-based format for templates, so you might want to look into that. PHP4 can parse XML, but there isn't code to specifically parse XLT as far as I know. XML or XLT are options if you need them, but they're probably more involved than you would need for most PHP projects that really just need templates.

    And for a nice tutorial on PHPLIB Template, look for my article on phpbuilder.net [phpbuilder.net] sometime soon (assuming the editor over there decides he wants to publish it). But even if my article doesn't get put online there, it is a very nice site for PHP info.
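
    To make the template idea above concrete, here is a small Python sketch of placeholder substitution with one template per language. It is not the API of FastTemplates or PHPLIB Template; the {NAME} placeholder syntax, the template strings and the function names are assumptions for illustration, and Python's re.sub stands in for the regular-expression replacement mentioned above.

```python
import re

PLACEHOLDER = re.compile(r"\{(\w+)\}")

# Hypothetical per-language templates; a real site would load these from files.
TEMPLATES = {
    "en": "<h1>{TITLE}</h1><p>Welcome, {NAME}!</p>",
    "de": "<h1>{TITLE}</h1><p>Willkommen, {NAME}!</p>",
}

def fill(template, values):
    """Replace each {KEY} with values[KEY]; unknown placeholders are left as-is."""
    return PLACEHOLDER.sub(
        lambda m: str(values.get(m.group(1), m.group(0))), template
    )

def render(lang, values):
    """Pick the template for lang (English if unknown) and fill it in."""
    return fill(TEMPLATES.get(lang, TEMPLATES["en"]), values)

print(render("de", {"TITLE": "Quiz", "NAME": "Jens"}))
```

    The script logic never changes between languages; only the template chosen does, which is the separation the question asks about.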

  • Everyone is looking at this as some kind of database or server side scripting problem, which is IMHO overdoing it a bit, and missing the point

    Firstly, every webserver and browser worth using already handles default language handling. It's an integral part of HTTP. A French user can be given the French version of the page transparently by the server (as their browser already knows what languages they want). This part of the specifications is there for a reason, use it! People don't want cookies, logins and unnecessary choices.

    If you use Apache (the only server I have more than half a day's experience with) search for content negotiation in the docs. It's actually set up by default (download the tarball and look at the "It worked" page to see what I mean). In a nutshell, instead of a file foo.html, you have several files for each language, foo.html.en, foo.html.fr etc, and the server works out which one the user wants.

    From what I understand, IIS also handles this kind of thing well, if you want to use it.

    Getting the webserver to do this for you will almost certainly be quicker than anything you can write, as it is better integrated into the server. By serving static pages things get even quicker still. If you need dynamic content, PHP, Perl, ASP or CGI scripts can all be programmed to use the default-language headers. If you want to generate dynamic content, choose the language this way rather than trying to create your own system (unless you already have a login system à la Slash, in which case it wouldn't hurt to add it as an option).

    None of this information is hard to find. In fact, it's pretty hard to avoid it when looking at the Apache config files, so I don't know why everyone else has missed it.

    hth
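
    For the dynamic-content case mentioned above, a script has to read the Accept-Language header itself. A minimal Python sketch, assuming the header value has already been pulled from the request; the function names and the rule of falling back from a regional tag such as "en-gb" to its primary language "en" are illustrative choices, not a full implementation of HTTP language negotiation.

```python
def parse_accept_language(header):
    """Return the language tags from an Accept-Language value, best first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0                      # a missing q parameter means q=1
        for param in parts[1:]:
            key, _, value = param.strip().partition("=")
            if key.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0          # unreadable weight: treat as refused
        if quality > 0:
            # Sort by descending quality, then by original position.
            weighted.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(weighted)]

def choose_language(header, available, default="en"):
    """Pick the first acceptable language the site actually offers."""
    for tag in parse_accept_language(header):
        if tag in available:
            return tag
        primary = tag.split("-")[0]
        if primary in available:
            return primary
    return default

print(choose_language("fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7", ["en", "de"]))
```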

  • I've evaluated methods for implementing multi-lingual sites.

    First off you've got to determine what language the end user wants to view the site in. This can be done multiple ways: client hostname, browser version (language), server hostname (i.e. japan.bigcorp.com), or by hitting a button or link on the page. Typically, I lean towards using the hostname (if ($HTTP_HOST == "deutsch.bigcorp.com") { $lang = "de"; }) in combination with a link that sets a cookie for the language choice. The 'guessing' of the desired language based on the browser or the client hostname doesn't work all that well because there are a lot of foreign nationals in the States who may want to view in their native language, and vice versa for overseas.

    Once we've determined the language choice, typically I have multiple tables in the database for the language options. I.e. headlines.en or headlines.de and you just append the language choice:
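    <?
    // Default to English, and accept only known language codes,
    // because $lang becomes part of the table name below.
    if (!isset($lang) || ($lang != "en" && $lang != "de")) {
    $lang="en";
    }
    $query = "SELECT headline, link FROM headlines.$lang";
    ?>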

    So for pulling from a db that's pretty easy. When we want to present separate pages or page layouts for the different languages (i.e. localized data, or product offerings, etc.), it's not too hard to do either. You can do it with HTTP header redirects:
    <?
    // Pick the page for the chosen language and redirect to it.
    if ($lang == "de") {
    $location = "page.de.php3";
    } else {
    $location = "page.en.php3";
    }
    header("Location: $location");
    exit;
    ?>

    Or (this is what we usually do), when building a frameset, point to the proper language page:

    <?
    if ($lang == "de") {
    $location = "page.de.php3";
    } else {
    $location = "page.en.php3";
    }?>
    <frame src="<? echo $location;?>">

    Images are the same sort of thing. We would just append the language code to the image name (or, easier, set the $image_path to something like "/images.de", etc.). Then you just pull from the selected images path.

    If you are building images dynamically, and they have text embedded into them, you're going to have to hax0r it to output the language of your choice. However, I think that many of the Internet users from abroad understand that it's really an English/American dominated network, and if everything isn't offered in their native language, they aren't going to get super pissed off. If you make an effort to get the key content in as many languages as you can, you'll be in good (better) shape.

    What becomes tricky (and what I don't have much experience with) is the non-Roman based languages (i.e. Asian languages). We typically have to outsource this work to a translation company, and they tend to provide us with rasterized and vector-based files that we can then embed into our site. If you find a good translation company, they should have experience doing this sort of thing and probably can help you figure out the best methods to employ. They do this stuff for a living, and many of them are top notch.

    -k
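
    The selection order described above (an explicit choice stored in a cookie first, then the server hostname) can be sketched in Python as follows. The cookie name "lang", the host table and the helper names are assumptions for illustration; only deutsch.bigcorp.com and the page.<lang>.php3 naming come from the post.

```python
SUPPORTED = {"en", "de"}

# Hypothetical mapping from server hostname to site language.
HOST_LANGUAGES = {
    "deutsch.bigcorp.com": "de",
    "www.bigcorp.com": "en",
}

def pick_language(host, cookies, default="en"):
    """Prefer the user's saved choice, then the hostname, then the default."""
    choice = cookies.get("lang")
    if choice in SUPPORTED:            # ignore anything that is not a known code
        return choice
    hostname = host.split(":")[0].lower()   # drop an optional :port suffix
    return HOST_LANGUAGES.get(hostname, default)

def page_for(lang):
    """Name of the language-specific page, following the post's naming scheme."""
    if lang not in SUPPORTED:
        raise ValueError(f"unsupported language: {lang!r}")
    return f"page.{lang}.php3"

lang = pick_language("deutsch.bigcorp.com:80", {})
print(lang, page_for(lang))
```

    Validating the code once, in one place, keeps it safe to reuse in table names, image paths and redirect targets later on.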
  • WML has the structure in place for producing web documents in several languages. It's slow, but if you don't have much dynamic content, or you just want to know how it's implemented and maybe adapt it for your PHP needs...

    http://www.engelschall.com/sw/wml/ [engelschall.com]

  • How do I handle buttons (i.e. graphics) with text on them?

    You can use the background= feature of cells to place a background 'button' graphic in each table cell. As long as the cell is bigger than the graphic, the 'button' will float to the top left corner. Then use align (likely top and left) to get the text in the right spot. Be sure to use contrasting colors properly. Make the text part of a link, and away you go.

    Fit and finish -- use CSS to suppress the decorations on the text, since it already looks like a button. Don't forget the ALT tags! Conveniently, you will also be text-enabling your pages.

    All this applies whether you use an EN#FR#DE style inline in the page, or an external approach. The external approach works better if the text is unlikely to change often - things like a database access site, for example.

  • by elegant7x ( 142766 ) on Monday April 17, 2000 @09:23AM (#1127270)
    Don't use tables at all, use CSS layout. You'll find it makes it a lot easier to separate content: people with 4.0+ browsers will see the cool stuff, and people with older browsers will see some old-school, highly readable HTML. Table layout is dead, long live CSS...

    Most HTML+CSS pages are readable right from the source, and would be easy to translate the file in whole.

    If I were in your place, I'd put every paragraph in a database table, with each row having the text in each language. That way, the translators could work on one paragraph at a time, and it would still be easy to update.

    Amber Yuan 2k A.D
  • I've done this on a recent project by storing all text in a MySQL database and writing a simple Perl script to merge the text from the DB into the HTML files.

    It goes like this:

    1. Make a language resource table. Call it "RESOURCES". The columns are TEXT_ID, LANGUAGE, and TEXT_DATA.

    2. Make an html template directory. You will store all "raw" html files here. Beneath this directory, make subdirectories for each language (eng, frc, jpn, etc.)

    3. In the HTML, make references to the database values by some easily identifiable token string, and wrap this token string around the TEXT_ID value from the database for this text resource. If you want, you can put the english equivalent inside the token string, so you can read the templates in their raw form. E.g.:

      <p>##38471::Welcome to my multilingual website!##</p>

    4. Write a script (I chose Perl) to:

      - Read the templates
      - For each language to translate:
        - Look for the existence of the token string (## in this case)
        - Take the resource ID, do a database lookup based on the language
        - Substitute the resource text for ##ID::string##
        - Save the modified html to the language subdirectory for this language.
      - End

    That should be it. Now, when an English-speaking person comes to your site (you'll have to ask them somehow of course), you can just redirect them to /path/templates/eng/file.html, and everything will work.

    This doesn't address the images, however. If you're using languages that use the Western European character set (French, Spanish, English, Portuguese, German, Italian, etc.), it will be easy. You'll be able to type your text directly into Photoshop or the GIMP or whatever and make your graphics. The next thing is to put a language token in these HTML templates that you've made for all images. Something like this:

    <img src="/images/##LANGUAGE##/button.gif">

    And in your language parser, write a one-line substitution that will substitute all instances of ##LANGUAGE## to the current language you're iterating through.

    If you're translating to languages with different character sets (double-byte languages such as Chinese, Korean, etc.), you'll need to create your graphics differently, but once they're created, the storage of them is the same. One way to create them is to write a CGI that will run through the DB and print, in HTML, all the text resources of a given language. If you have your browser set to the correct character set, you will see the foreign language characters correctly. You can then take a screenshot, and paste it into your graphics editor to make buttons or whatever out of.

    This approach has worked really well for us on two projects so far, and it looks like there will be more projects soon. The advantage of making these HTML templates is that it greatly reduces the load and time it would take to build the pages if they were dynamically created from database lookups upon request. You just run the template generating script every time a change is made to the template, and voila.
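
    The token-substitution step of the build script described above can be sketched in Python (the poster used Perl). The in-memory RESOURCES dictionary stands in for the RESOURCES table, the French sample text is made up for the example, and falling back to the English text inside the token when no translation is stored is an added assumption. Writing the result to the language subdirectories is left out.

```python
import re

# ##ID::English text## on a single line; the ID is numeric as in the example.
TOKEN = re.compile(r"##(\d+)::(.*?)##")

# Stand-in for the RESOURCES table: (TEXT_ID, LANGUAGE) -> TEXT_DATA.
RESOURCES = {
    ("38471", "eng"): "Welcome to my multilingual website!",
    ("38471", "frc"): "Bienvenue sur mon site web multilingue !",  # sample data
}

def localize(template, language, resources):
    """Return the template with image paths and text tokens filled in."""
    # Replace the language token first so it cannot interfere with text tokens.
    page = template.replace("##LANGUAGE##", language)

    def lookup(match):
        text_id, english = match.groups()
        # Fall back to the inline English text if no translation is stored.
        return resources.get((text_id, language), english)

    return TOKEN.sub(lookup, page)

template = (
    '<img src="/images/##LANGUAGE##/button.gif">\n'
    "<p>##38471::Welcome to my multilingual website!##</p>"
)
print(localize(template, "frc", RESOURCES))
```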

  • I am told that Oracle's WebDB [oracle.com] translates on the fly to something like 11 languages.

    _________________________________________

  • What the hell do you expect?
    Do I separate design from content, or content from design?

    YOU ARE USING PHP!!!

    The reason that many of us would never use a
    language such as PHP is that PHP is based on
    being embedded in a .HTML file. ASP is the
    same design, same problem.

    If you are one dude, maintaining a few pages
    of one site, and you don't have any knowledge
    of any other tools, then PHP might be the only
    tool you use for any job. If, on the other
    hand, you are maintaining many pages, or many
    sites, or have knowledge of other tools, this
    is a good time to start using those other
    tools.
    Perl with XML::XSL would be a good tool for a
    job like this, but don't expect to be
    separating content from design in a language
    designed to be embedded into the content!

    <CHANT>
    The right tool for the Right Job
    The right tool for the Right Job
    The right tool for the Right Job
    The right tool for the Right Job
    </CHANT>

    When in doubt, use perl 8-}


    ---
    This is your life, good to the last drop.
    Doesn't get any better than this.
  • If you are using mod_perl, you might realize that the problem was already addressed and solved (well, partly) by Apache::Language, available on a CPAN mirror near you. It is HTTP/1.1 compliant, works around buggy browsers and tries very hard to do all that transparently. Language storage can be in flat files, SQL databases and any other storage method you can access through Perl. Just my 0.02$ advice
  • I see a lot of people go off into the deep end with all kinds of complicated databases and transformation tools. Maybe this works for very large projects with lots of people working on them.

    My experience is:

    If you want to be able to translate texts in a reasonably efficient manner, you should keep small texts for multiple languages together and separate them for large texts. For instance, I use a lot of scripts that generate forms. So I start every page with an array that contains words and phrases:

    if ($lang == "nl")
    $texts = array("name"=>"Naam", "age"=>"Leeftijd");
    else
    $texts = array("name"=>"Name", "age"=>"Age");

    (I include a header that figures out the user's language by the http accept language, user and site domains (none of which are foolproof) or authentication/cookie data for registered users.)

    Translating this is very simple: copy the array definition and change the phrases. You don't want to use a database for this, because you need to be able to look at the from and to languages at the same time. For large texts I include html files. Translating them isn't much of a problem, keeping several versions up to date is harder.

    Don't forget that many users speak more than one language. For instance, many users I talk to in Dutch on my site want to see links to content in both Dutch and English, so when they sign up they can choose between Dutch, English, Dutch + English and English + Dutch.
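
    A small Python sketch of the phrase-array idea above, extended to the ordered language preferences mentioned in the last paragraph (for example Dutch first, then English). The "email" entry, which exists only in English here, and the fallback to the key itself are assumptions added to show the lookup order.

```python
# Short phrases for all languages kept side by side, as suggested above.
TEXTS = {
    "nl": {"name": "Naam", "age": "Leeftijd"},
    "en": {"name": "Name", "age": "Age", "email": "E-mail address"},
}

def text(key, preferences):
    """Return the phrase for key in the first preferred language that has it."""
    for lang in preferences:
        phrase = TEXTS.get(lang, {}).get(key)
        if phrase is not None:
            return phrase
    return key   # nothing translated yet: show the key rather than fail

# A user who signed up for "Dutch + English":
print(text("name", ["nl", "en"]))
print(text("email", ["nl", "en"]))
```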
