Data Storage Software

Changing Your Filesystem's Locale?

dybdahl asks: "Now that Red Hat has changed the default character set to UTF-8, none of the existing filenames that include local characters like æ, ø, å (Danish) are handled correctly by Konqueror or displayed correctly by "ls" in a shell. Is there a tool out there that can convert an ISO8859-1 ext3 filesystem to UTF-8?"
This discussion has been archived. No new comments can be posted.

  • Convert what? (Score:3, Informative)

    by Sam Lowry ( 254040 ) on Wednesday May 14, 2003 @05:24AM (#5952903)

    The filesystem has been storing filenames in UTF-8 for ages. What you have to do is make sure there is iocharset=utf-8 in the mount options in /etc/fstab.

    In general, man mount helps a lot.

    • Re:Convert what? (Score:3, Informative)

      According to "man mount", "iocharset" is an option available to filesystems (v)fat, iso9660, and ntfs only. It's also available for smbfs.
    • Re:Convert what? (Score:5, Informative)

      by amorsen ( 7485 ) <benny+slashdot@amorsen.dk> on Wednesday May 14, 2003 @05:44AM (#5952965)
      The filesystem thought it was using UTF-8 filenames. That is what the specification says it should use. However the unfortunate poster has used ISO-8859-1 (or -15) file names. Therefore he now has a file system that does not conform to the standard, and of course he wants to do something about it.
      • when moving files from a filesystem on the ISO-8859-15 charset to one on the UTF-8 charset - say vfat to ext3.

        I know.

        Luckily there were only about 12 files (courtesy of a recent trip to Sweden) and mv-ing them wasn't too tricky.

        Any more and I would have got seriously frustrated, and probably ended up writing convmv myself.
  • by cyberkreiger ( 463962 ) on Wednesday May 14, 2003 @05:35AM (#5952937) Homepage
    I think convmv [freshmeat.net] may be what you're looking for.
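For readers who want to see what this kind of conversion involves, here is a minimal Python sketch of the idea, assuming the old names are ISO-8859-1. It is not how convmv itself is implemented, and the names `utf8_name` and `plan_renames` are made up for illustration. It defaults to a dry run that only reports what it would rename.

```python
import os

def utf8_name(name: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Return the UTF-8 form of a raw filename (hypothetical helper)."""
    try:
        # Pure ASCII and already-valid UTF-8 names are left untouched.
        name.decode("utf-8")
        return name
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is in source_encoding.
        return name.decode(source_encoding).encode("utf-8")

def plan_renames(root: bytes, dry_run: bool = True) -> list:
    """Walk root bottom-up and collect (old, new) paths; rename if not dry_run."""
    renames = []
    # A bytes root makes os.walk yield raw byte names, so nothing is decoded
    # behind our back. Bottom-up order renames children before their parents.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            new = utf8_name(name)
            if new == name:
                continue
            src = os.path.join(dirpath, name)
            dst = os.path.join(dirpath, new)
            renames.append((src, dst))
            # Never overwrite an existing file with the converted name.
            if not dry_run and not os.path.exists(dst):
                os.rename(src, dst)
    return renames

if __name__ == "__main__":
    for src, dst in plan_renames(b".", dry_run=True):
        print(src, "->", dst)
```

Run it as a dry run first and check the list. A name that happens to be valid UTF-8 already is skipped, which is also why running it twice does not double-encode anything.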
  • during install (Score:3, Interesting)

    by Apreche ( 239272 ) on Wednesday May 14, 2003 @08:28AM (#5953537) Homepage Journal
    I've been trying shitloads of distros lately (journal has more info). Despite other problems, all of them have asked me what my locale is, what character sets I want to support, and all that kind of stuff. I must say, if there is one thing that is more trouble in Windows than in *nix, it's internationalization. As with everything, though, there is a config file somewhere and a package to install.
  • by SpaFF ( 18764 ) on Wednesday May 14, 2003 @02:23PM (#5956654) Homepage
    Ok, so Red Hat makes the default charset UTF-8. Just change the default to ISO8859-1. It's like a 2-line change in /etc/sysconfig/i18n. I had to make a similar change when we switched our mailserver to RH8, because early versions of SpamAssassin (more specifically Perl, though, I think) didn't like playing with UTF-8.

    -Lee
  • by spitzak ( 4019 ) on Wednesday May 14, 2003 @05:28PM (#5958500) Homepage
    Forget all this nonsense about "locales". It is obvious there are exactly 2 "locales" of interest, UTF-8 and ISO-8859-1. Now surprisingly enough these can co-exist almost perfectly, so there can be *one* "locale" and we can be rid of all these brain-dead attempts at i18n.

    What systems should do is treat all streams of bytes as UTF-8, with the additional rule that all sequences of bytes that are not legal UTF-8 (including a Unicode value encoded with more bytes than necessary) should be treated as individual bytes in ISO-8859-1. It turns out that you need three accented characters in a row, or a capitalized accented character followed by a foreign punctuation mark, for ISO-8859-1 text to be confused with UTF-8.

    I very much believe this works, although I think a search should be done through lots of ISO-8859-1 text to find out if there are any common sequences that are confused with UTF-8.

    Even if this is not a perfect solution, it certainly is better than the current scheme. Most filenames will be readable. More importantly it gets rid of the idea of an "error" in a character string, significantly simplifying the interfaces.
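The rule described above can be tried out with a short Python sketch. This is only an illustration under stated assumptions: the handler name `latin1_fallback` and the function `decode_mixed` are invented here, and the sketch leans on Python's strict UTF-8 decoder (which already rejects overlong encodings) to decide what counts as "not legal UTF-8".

```python
import codecs

def _latin1_fallback(err):
    """Error handler: reinterpret undecodable bytes as ISO-8859-1 characters."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    # Each offending byte becomes the ISO-8859-1 character with the same value;
    # decoding resumes right after the bad bytes.
    return bad.decode("iso-8859-1"), err.end

# Register under a made-up name so bytes.decode() can refer to it.
codecs.register_error("latin1_fallback", _latin1_fallback)

def decode_mixed(raw: bytes) -> str:
    """Decode as UTF-8 where legal, otherwise byte-by-byte as ISO-8859-1."""
    return raw.decode("utf-8", errors="latin1_fallback")

if __name__ == "__main__":
    print(decode_mixed(b"bl\xe5b\xe6r"))              # ISO-8859-1 bytes
    print(decode_mixed("blåbær".encode("utf-8")))     # UTF-8 bytes
    print(decode_mixed("Ã¦".encode("iso-8859-1")))    # the ambiguous case
```

The last call shows the kind of collision the comment mentions: the ISO-8859-1 bytes for a capital accented letter followed by a punctuation-range character (here "Ã¦") are also a legal UTF-8 sequence, so they come out as a single "æ".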
