Getting Unicode Character Codes in JavaScript?

Posted by Cliff
from the tricky-problems dept.
jargonCCNA asks: "I've searched high and low across the web, but I can't seem to be able to find any code snippets or even anything that'll help me out here. I'm trying to get a Unicode character code from a data stream in JavaScript and there doesn't seem to be anything out there to help me; JavaScript itself only has onboard support for ISO-Latin_1, or something. I tried hacking my own converter code, but it's rife with errors. Anybody know of some code that I can include in a GPL project?"

"Here's the buggy code, if you're interested:

function unicode2hex( unicode )

{
var hexString = "";

for( var i = 0x0000; i <= 0xFFFF; i++ )
{
test = eval( "\\u" + i );

if ( unicode == test )
{
hexString += i / 4096;

hexString += i / 256;
hexString += i / 16;
hexString += i % 16;
hexString += "";

return hexString;
}
}

return false;
}
"Mozilla's JavaScript console lets me know that '\u0' is an illegal character. I think this would work if I could make it use the string "0000" instead of the number 0 for i.

Just for reference -- I've seen a lot of people get nailed on Ask /. because they didn't do the proper research before asking their question. Google has failed me; I've been trying to figure this out on my own for about a month. I hope someone can shed some light on my situation."
  • by Henry V .009 (518000) on Thursday August 01, 2002 @06:27PM (#3995289) Journal
    How did this story get past the lameness filter?
    • Desperation for Quality content.
    • How did this story get past the lameness filter?

      Stories are probably not subject to the lameness filter (or at least they have looser filters) because an editor must approve each story by hand.

      That said, I have a possible (untested) solution: Try changing each += in the inner loop to a +=""+ to force the strings to be concatenated rather than treated as numbers.

  • Ask the Experts at http://selfforum.teamone.de [teamone.de]. It's a German forum, but most people there can read and write English as well. The SelfForum is related to the famous SelfHTML (at least here in Germany, it is famous). Just copy and paste your question there.
  • What's the deal? Cliff must have hit the "Accept" instead of the "Reject" button by accident.

    Try asking your question in IRC before hitting up "Ask Slashdot."

    A search on google for unicode and javascript brings back a lot of positive looking results without actually delving into them. It seems like JS1.5 has support for this (from the Google summaries).
    • A search on google for unicode and javascript brings back a lot of positive looking results without actually delving into them.

      Yeah, positive looking. That's the thing. Looks are exceedingly deceiving on a search engine. Try actually delving in; I can almost guarantee that it won't convert Unicode characters to their character codes.
  • Ok, I got my "Second Post" in.. Now here's the good answer.

    document.write("\u00A9 Netscape Communications" );

    I just did that in Galeon and it works fine...

    See - http://developer.netscape.com/docs/manuals/js/core/jsguide15/ident.html#1009690
    • That's great, except that it does the opposite of what he wants. He seems to want a function that'll turn the copyright sign to "00A9".
      • Ahhh.. You're right, I'm wrong... But I'll repeat the truly correct answer as I have already lured someone down the wrong path:

        document.write("\u00A9".charCodeAt(0));

        That provides the decimal, then you just have to convert to hex.

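        var hexChars = "0123456789ABCDEF"; // lookup table the function below relies on
        function Dec2Hex (Dec) { var a=Dec % 16; var b=(Dec - a)/16; var hex="" + hexChars.charAt(b) + hexChars.charAt(a); return hex; } // two hex digits only, so 0-255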

        Blatantly ripped off from here [internet.com]
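
The two steps described above (charCodeAt() for the decimal code, then a conversion to hex) can be combined into one function. This is only a sketch, not code from any poster: the name charToHex is hypothetical, it uses the built-in Number.prototype.toString(16) instead of a lookup table, and it pads to four digits so that codes above 255 are covered.

```javascript
// Hypothetical helper (not from the thread): returns the four-digit
// uppercase hex code of the UTF-16 code unit at `index` (default 0).
function charToHex(str, index) {
  var code = str.charCodeAt(index || 0); // 0..65535, or NaN if out of range
  if (isNaN(code)) {
    return ""; // empty string or index past the end
  }
  var hex = code.toString(16).toUpperCase();
  // Left-pad with zeros; a code unit never needs more than four hex digits.
  return ("0000" + hex).slice(-4);
}
```

With this sketch, charToHex("\u00A9") gives "00A9", which is the form the original question asked for.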

  • Why don't you ask the Mozilla developers that are working on JavaScript 2.0?
  • by Lazarus Short (248042) on Thursday August 01, 2002 @07:31PM (#3995620) Homepage
    No offense, but I haven't used JS in years, and I found this in a matter of minutes.

    document.write("\u00A9 is ");
    document.write("\u00A9".charCodeAt(0));

    That will give you the answer in decimal. I trust you can convert to hex yourself.

    (Note: Requires Javascript 1.3; previous versions used ISO-Latin-1 rather than unicode, and I don't know what they'd do with a character higher than 255.)
    • All right, you're officially The Most Helpful Person On Slashdot now.

      I looked through all the documentation I could find; the only thing I found about charCodeAt() was that it uses ISO-Latin. But I think they also said it was JavaScript 1.2-specific.

      Any idea what version of JavaScript IE6 emulates, and Mozilla actually uses?
      • Well, the example I used works as expected in IE 5.0, NS 4.7, and Moz 1.1a.

        (Similar code with characters outside the range of Latin-1 also works on both, though the browsers sometimes display the "no glyph for that" glyph (open box for IE, "?" for NS/Moz).)

        Couldn't tell you what JS versions each browser actually uses, though.
      • I have no idea who decides what is officially JavaScript. I'm imagining an oracle sitting on a subway platform somewhere, eating a corndog and spouting off ziggyisms to anyone who will listen.

        But, I'm assuming that IE will just use whatever version of JScript you happen to have installed on your machine. And, as far as I know, JScript really does follow the ECMAScript specification, which is a real spec, with standards bodies and the whole works, unlike "JavaScript", whatever that is, exactly.

        Anyhow, take a look here [microsoft.com] to get a look at some of the features of the JScript interpreter hosted in some of your favorite applications.
      • Any idea what version of JavaScript IE6 emulates, and Mozilla actually uses?

        IE6 doesn't emulate JavaScript. It uses JScript, which is Microsoft's implementation of the ECMA-262 Edition 3 language standard (ECMAScript). Similarly, JavaScript is Netscape's implementation of the same standard. Neither is "emulating" anything.

        You can find the ECMAScript standard here: ECMA-262v3 [www.ecma.ch]. You can discover what your favorite vendor has actually implemented by visiting the mozilla [mozilla.com] or microsoft [microsoft.com] documentation for each vendor's implementation.
    • Here is something that will convert:
      function tounicode(instr) {
      len = instr.length;
      switch (len) {
      case 1:
      return instr.charCodeAt(0);
      case 2:
      return new String(instr.charCodeAt(1)) + new String(instr.charCodeAt(0));
      case 3:
      return instr.charCodeAt(2) + instr.charCodeAt(1) + instr.charCodeAt(0);
      case 4:
      return instr.charCodeAt(3) + instr.charCodeAt(2) + instr.charCodeAt(1) + instr.charCodeAt(0);
      }
      return "";
      }

      document.write(tounicode("\u002d") + " " + tounicode("-") + "<br>");

      With this you can take a string like "fooo" and get a unicode equivalent.
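
For strings of any length, one alternative to switching on the length is to walk the string one character at a time. The following is a rough sketch under a few assumptions: the name stringToHexCodes is hypothetical, the space-separated output format is just one choice, and the function reports UTF-16 code units as returned by charCodeAt().

```javascript
// Hypothetical helper (not from the thread): lists the four-digit hex
// code of every UTF-16 code unit in `str`, separated by spaces.
function stringToHexCodes(str) {
  var codes = [];
  for (var i = 0; i < str.length; i++) {
    var hex = str.charCodeAt(i).toString(16).toUpperCase();
    codes.push(("0000" + hex).slice(-4)); // left-pad to four digits
  }
  return codes.join(" ");
}
```

For example, stringToHexCodes("fooo") gives "0066 006F 006F 006F", and an empty string gives "".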
