
Programmer's Language-Aware Spell Checker?

Posted by kdawson
from the hands-off-my-camelcase dept.
Jerry Asher writes "Not all of my coworkers are careful about spelling errors. Sometimes this causes real embarrassment as spelling errors creep into software interfaces. Does anyone know of spell checkers for programming languages? I don't want a text spell checker, I want a programming-language-aware spell checker. A spell checker that I can pass all of my code through and will flag spelling errors in function names, variable names, and comments, but will ignore language keywords, language constructs and expressions, and various programming styles (camel code, or underscores, or...). I want a spell checker that knows that void *functionSigniture(char *myRoutine) contains one spelling error. Does anyone have such a thing for Java or C++? Are there any Eclipse plugins that do this?"
  • by pringlis (867347) on Tuesday September 04, 2007 @05:16AM (#20461575)
    The version of Eclipse I run, Eclipse WTP 3.3, does spell checking on comments as standard. Not for variable, function names and the like though. It's a decent first attempt though. In truth, I turned it off within the first few hours. It underlines any mistakes in red which I find really annoying when scanning code as I keep thinking I've seen syntax errors. More often than not my eyes are drawn to a spelling mistake, which in many cases isn't even really a mistake, which distracts me from what I'm actually trying to look at.
    • by Anonymous Coward on Tuesday September 04, 2007 @08:03AM (#20462579)
      Man Dies Waiting for Eclipse to Launch

      A software engineer in San Jose, CA was found dead at his desk yesterday, apparently having died while waiting for his Java editing program, Eclipse, to finish its boot process. Coworkers say the engineer came in that morning vowing to "get Eclipse working on his box or die trying." The last thing anyone heard him say aloud was the cryptic comment: "I see the splash screen is appropriately blue." Nobody knows what he meant. The man was then thought to have fallen asleep, but hours later it was discovered that the engineer had died suddenly of apparent natural causes. The forensics team's investigation that evening was reportedly interrupted unexpectedly when the dead man's Eclipse program suddenly finished launching. The team tried to interact with it to see if they could find clues about the man's death, but the program was unresponsive and the machine ultimately had to be rebooted. At this time, the police commissioner says there is no evidence of foul play, and they currently believe the man simply died of either boredom or frustration.
    • Spell checkers are fine but they make mistakes as well. The best thing I have found, and this goes for any project, software or printed word, is to have someone who is not connected to the project or better yet not even connected with the subject proofread what the public sees. They will often catch mistakes that jump off the page but people working on the project just don't notice. I have made some really stupid mistakes that I never saw but were on the cover of a book I was publishing. I am SO glad it was
  • Visual Assist (Score:3, Informative)

    by Anonymous Coward on Tuesday September 04, 2007 @05:16AM (#20461583)
    Visual Assist for Visual Studio does this.

    Next silly question, please.
    • by nietsch (112711) on Tuesday September 04, 2007 @06:23AM (#20461947) Homepage Journal
      Please don't use the names of the tools the beast of Redmond uses to stupefy the world. This is /. after all; if you have to code on/for windos, then please be humble and shy about it.
    • Re: (Score:3, Informative)

      by lanthar (962279)
      It is important to note that with a large code base, Visual Assist is noticing any time you have a variable or function name that it can't find anywhere else, and highlighting it with the red underline. This is in addition to turning the various keywords, macros, #defines, class names, function names, member variable names, nonmember variable names, and other such things all into their own colors. Granted, if you misspell something everywhere, then it will highlight correctly, and not indicate a problem.
  • by uucp2 (731567) on Tuesday September 04, 2007 @05:18AM (#20461589)
    Some people call using it a "code review". If you are really serious about it, post the code to /. - plenty of people here seem to have time to point out any spelling errors.
  • .... that if you want your code to read like English, you consider a language like COBOL? Not that it would help you with spell checking, per se... but if one is going to be so pedantic about making sure that their procedure names can be found in an actual English dictionary, why not go the whole 9 yards and write the whole program that way?
    • by DarkSkiesAhead (562955) on Tuesday September 04, 2007 @05:41AM (#20461719)

      if you want your code to read like english, you consider a language like COBOL? Not that it would help you with spell checking, per se...

      Responses like this entirely miss the point of the question. Same with the "just review your code" responses. It's not a matter of making the language more readable. It's a matter of making the code more usable. Certainly, correct spelling is pointless without other elements of good code practice. However, bad spelling can add a lot of frustration.

      I joined a project which already had a few misspelled class names. I'm a fast typer and often I've typed out more of a filename than is spelled correctly before hitting tab to complete the name. Needless to say, I've been trained to hit tab earlier for a few choice files. But it's certainly been an irritation. Similarly, I've been confounded more than once when a function or variable couldn't be found by the compiler, only to realize that I'd spelled a word correctly rather than how the actual name was spelled.

      We choose to use English words for our class, function, and variable names for a reason. That reason is mostly defeated by misspelling the English word. A dictionary is a great idea, even for coding languages that don't "read like English".

      • by Splab (574204) on Tuesday September 04, 2007 @05:49AM (#20461759)
        Also, people tend to miss that our brains are very good at correcting spelling mistakes as we read, so doing code review to catch spelling mistakes can be very tough.
        • our brains are very good at correcting spelling mistakes as we read
          But they're not so strong at spotting jokes, it seems.
        • Quite Ture^h^h^h^h^h True. Most of the time when I cam coding and I need to copy a variable name or a quite I just copy and paste it. Making all the spelling misstates consistent throughout the code.

          eror: 312 varible naim mispelled
      • by MythMoth (73648)
        Hear hear.

        Incorrect spelling in code causes all sorts of minor confusion - I'd love an Eclipse plugin to address this.
      • by evanbd (210358) on Tuesday September 04, 2007 @06:21AM (#20461935)

        It strikes me that the problem is that most spell checkers try to check everything, and that a lot of code has things that really shouldn't be spell checked at all, mixed with things that should. I imagine that one way to start would be to only alert on those errors that are almost correct -- if it looks like garbage, ignore it, but if it's close, assume it should be right. Perhaps ignore prefixes / suffixes as well -- pSomething is fine, pSometihng isn't. Also, CamelCase ought to be easy enough to detect -- treat it as word boundaries, and spell check the individual words. Again, egregious misspellings probably aren't -- nextObjFoo is ok, even though Obj isn't a word -- it's so far from being a word that we assume the programmer meant it that way.

        Similarly, there should probably be a set of words added that aren't "English" but are used often enough to be worth adding to the dictionary. Things like Obj, Int, and Ptr.

        I think the reason such spell checkers don't exist already is fairly simple -- everyone just assumes they're impossible, and doesn't try. Couple that with the fact that a mediocre quality one would be so annoying as to be worse than useless, and you have a recipe for a program that won't get written. I don't think either of those would have to be the case if someone sufficiently clever decided to attack the problem, though.
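The "only alert on near misses" heuristic described above can be sketched in Python. This is an illustration only, not an existing tool: the tiny `DICTIONARY` and `JARGON` sets, the `near_misses` name and the 0.8 cutoff are all assumptions, and `difflib`'s similarity ratio stands in for whatever distance measure a real checker would use.

```python
import difflib

# Tiny stand-in word list (assumption); a real checker would load a full dictionary.
DICTIONARY = sorted({"next", "something", "foo", "signature", "function", "my", "routine"})
# Programmer abbreviations treated as correct, as the comment suggests.
JARGON = {"obj", "int", "ptr"}


def near_misses(words, cutoff=0.8):
    """Return {word: suggestion} only for words that are close to a known word."""
    flagged = {}
    for word in words:
        lowered = word.lower()
        if lowered in DICTIONARY or lowered in JARGON:
            continue
        close = difflib.get_close_matches(lowered, DICTIONARY, n=1, cutoff=cutoff)
        if close:
            # Close to a real word, so it is probably a typo.
            flagged[word] = close[0]
        # Otherwise it is far from every word: assume the programmer meant it.
    return flagged


print(near_misses(["next", "Obj", "Foo"]))       # nothing to report
print(near_misses(["function", "Signiture"]))    # flags Signiture
```

Raising the cutoff makes the checker quieter at the cost of missing sloppier typos; where to set it is exactly the "mediocre ones are worse than useless" trade-off raised above.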

      • Similarly, I've been confounded more than once when a function or variable couldn't be found by the compiler

        Why don't you right click on said function and choose Refactor -> Rename, or whatever the equivalent is in your IDE?
  • by PhrostyMcByte (589271) <phrosty@gmail.com> on Tuesday September 04, 2007 @05:25AM (#20461629) Homepage
    And not too hard to implement - all you need is a lexer and a few functions to classify different naming styles. lexertl [benhanson.net] even comes ready with a full example for C++, so get to it ;)
    • by Xiph (723935)
      This could possibly make it a lot easier to change between different naming conventions.
      possibly even do cross convention linking.

      I now have a new uni-project i care about :) Though it would be a shame to end a perfectly good flamewar...

      maybe i should implement it in emacs.
    • by Gordonjcp (186804)
      Isn't that basically what syntax highlighting does anyway? A simple heuristic for a programming spellchecker would be "Did I highlight it? No? Well I'll spellcheck it, then" with a suitable syntax dictionary and language dictionary. Initially certain words wouldn't show up in the language dictionary, but you'd add "setDictPath" when you define it. A quick bit of code review to make sure that no mis-spelt words get into the dictionary, and you're done.
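A rough sketch of that "spell check whatever the lexer did not classify as language syntax" idea, using Python's own `tokenize` and `keyword` modules as a stand-in lexer. The original question is about Java and C++, so treat this as an assumption-laden illustration: `checkable_tokens` is a hypothetical name, and a real tool would plug in a lexer for the target language.

```python
import io
import keyword
import tokenize


def checkable_tokens(source):
    """Yield (line, kind, text) for identifiers and comments, skipping keywords."""
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            yield tok.start[0], "identifier", tok.string
        elif tok.type == tokenize.COMMENT:
            # Drop the leading comment marker before handing text to a speller.
            yield tok.start[0], "comment", tok.string.lstrip("# ")


sample = "def function_signiture(my_routine):  # retuns nothing\n    return None\n"
for line, kind, text in checkable_tokens(sample):
    print(line, kind, text)
```

Keywords such as `def` and `return` never reach the spell checker; library names would still show up as identifiers, which is where the per-project dictionary mentioned above comes in.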
  • Anything that may appear in a user interface should be kept in dedicated files. Use a standard format such as CSV or XML. It can then be reviewed by non-technical people using software with a built-in spell checker, such as Excel. This trick is mainly used for multilanguage projects, but it really helps.

    • by Kjella (173770)
      ...which is an interesting problem, but really a different one. If you look at his example, he's talking about botched function names etc. which really is code. User-visible text is usually spellcheckable in whatever format is being used to do translations. And if you hardcode English everywhere, then well... you're probably not interested in this kind of quality-enhancing tools to begin with.

      That said, I rarely see this as a big problem unless it's a very static internal interface. Pull the whole code tree
      • Re: (Score:3, Informative)

        by Roofus (15591)
        If you maintain a library that is used by customers, that would be a *very* big issue. You just broke backwards compatibility for a spelling fix.

        Overall, the answers to the submitter's question are absolutely horrible so far. If the tool he's searching for doesn't exist, it damn well should.
        • Re: (Score:2, Redundant)

          by pasamio (737659)
          The solution to this is simple, you use the Microsoft approach which is to keep old names of the functions around and wrapper them to the new names in the hope that people will start using the new name instead of using the old name. That said it shouldn't have been a problem anyway. I have translators who email me reminding me my English has mistakes from time to time (last one was writeable/writable, writable is supposed to be the right one but both come up as spelling errors in Safari anyway) so having a
  • vim 7.0 anyone? (Score:3, Insightful)

    by Janek Kozicki (722688) on Tuesday September 04, 2007 @05:28AM (#20461647) Journal
    I particularly like the spelling feature in new vim, right-click menu (:set mousemodel=popup) to select a corrected word or remember current word as correct. Perhaps writing a vim plugin as you explain could be possible? I'd be very glad to use it too ;)
    • Yep, me too. Especially I like that it is aware of comments and text zones and spell checks them as you write the code. It can also check the spelling of function names, if you use underscore_to_separate_words. As for a code spell checker, it is pretty good.
  • Write a small script to split up camelCase into separate words, then feed the result through a normal spell checker. After that, just whitelist certain words, like maybe "m" as found in "mSomeVariable".
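A minimal Python sketch of such a splitting script. The function name `split_identifier` and the `IGNORED_PREFIXES` set are made up for illustration; the output is meant to be piped into whatever ordinary spell checker is at hand.

```python
import re

# Hypothetical whitelist of one-letter prefixes that are conventions, not words.
IGNORED_PREFIXES = {"m", "p", "g", "s"}


def split_identifier(name):
    """Split camelCase, PascalCase and snake_case names into lowercase words."""
    words = []
    for chunk in re.split(r"[_\W\d]+", name):
        # Either an acronym run (XML) or an optionally capitalised word (Parser, some).
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", chunk))
    words = [word.lower() for word in words]
    if len(words) > 1 and words[0] in IGNORED_PREFIXES:
        words = words[1:]
    return words


print(split_identifier("mSomeVariable"))
print(split_identifier("XMLParser"))
print(split_identifier("function_signiture"))
```

The prefix is only dropped when something follows it, so a variable that really is called `m` is still passed through for checking.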
  • by YeeHaW_Jelte (451855) on Tuesday September 04, 2007 @05:33AM (#20461667) Homepage
    We've got code here that refers to 'insurrances', 'insurances', 'insurrences' and 'insurences', I'm not kidding.

    People here making fun of his request and saying that this should be set in stone in design documents, or be checked in peer code reviews, are obviously not working in a run-of-the-mill software company where there's neither the inclination nor the time to do everything the formal way. Also, I have yet to see the first design document that correctly enumerates all the requirements for the software, let alone all the names for the variables to be used.
    • by Corporate Troll (537873) on Tuesday September 04, 2007 @05:48AM (#20461755) Homepage Journal

      As a non-native English speaker, working in a non-native English-speaking team (mainly French-speaking people), it is a real problem. The biggest problem happens when you search for something and don't find it because you wrote it right and your coworker wrote it wrong. (Or the inverse, I don't claim to be perfect in English.)

      Sure, you might say, "Write your code in French", but that's not a solution. My mother tongue is Dutch, we have a German coworker, and you never know if the next guy will be Italian. There is also this team that has to maintain code written by Spanish people.... in Spanish.... and they don't know Spanish. Fun times, if you like to hear them curse....

      In multilingual environments this problem increases drastically.

  • I am currently working on a java-based universal spell checker (the kind that can do a decent job without involving knowledge of that language). By language, I mean, English, Hindi etc.
    I am amused by the idea of being able to extend that to programming languages.

    The most significant problem that I am facing has nothing to do with coding the spell-checker. It's about getting a sizable dictionary of words (finding one, converting to UTF-8 etc.)

    The trouble is that programming comes with a very different set of
    • by mwvdlee (775178)
      In a normal development shop, you could probably make such word lists on the fly.
      Just appoint one "spelling guru" who is the only one allowed to edit the list and add new words to it as you go. It's probably better to strictly manage the list anyway, considering the various number of ways some words can be written, i.e. color vs. colour. You'd probably want only one of those variants allowed.

      You could probably use a simple lexer to detect function and variable names and (customizable) regex to extract the c
  • FxCop (Score:2, Informative)

    by W3bbo (727049)
    Okay, so it's only for Managed Assemblies (C#, VB.NET, J#, etc), but it does spell-checking, acronym-checking, and case-checking, which is nifty. Along with the other slew of introspection rules (some of which are a PITA to implement, even if it does increase the quality of the finished product).

    The $$$ version of Visual Studio (the Team Suite version) comes with an introspection engine for VC++ though, it's not as flexible as FxCop but does the basics.

    Then there's the countless "Spellchecker" plugins avail
  • by NNKK (218503) <nknight@runawaynet.com> on Tuesday September 04, 2007 @05:44AM (#20461735) Homepage
    TextMate on OS X has spell checking functionality that is semi-useful, but it's not really "aggressive" enough, and there doesn't seem to be a way to make it such with prefs/configuration.

    You can right-click on any "word" (variable name, subroutine name, whatever, just generally a whitespace-delimited group of characters) and it will check the spelling and present alternatives in the context menu. It also recognizes things like perl's sigils so correcting '$teh' turns into '$the', not 'the'.

    It _won't_ automatically check spelling except in strings (so e.g. if I have '$teh = "This is a tset.";', 'tset' will be underlined, '$teh' won't). It doesn't include comments in its automatic checking either, which is probably the most annoying part about it.

    Overall I typically just don't bother with it, but someone _has_ thought along these lines, at least.
  • Cos I don't know what a Space compnay is.
  • aspell? (Score:4, Insightful)

    by atlep (36041) on Tuesday September 04, 2007 @06:22AM (#20461941)
    I remember from spellchecking some HTML documents a while back that aspell is at least aware of HTML. I do not know how well it works with other kinds of documents.
  • The idea is nice and I think the problem is really prevalent. I have seen large portions of source code, much of it commercial, containing not one or two but hundreds of spelling mistakes. I also believe the problem must be more prevalent in closed source and in small businesses than open source and Free software. Another thing is that developers from countries with non-English languages often mix English with their first language in code, making it hard to maintain by other nationalities.
  • Well, I'm a total newbie in terms of compiler architectures and such, but throwing it out there for the purpose of discussion...

    I assume a compiler will parse the source and in the process identify which tokens are key words and literals, and which are programmer-defined identifiers in the code. The spell checker would either use the same algorithm, or latch into that part of the algorithm to get at all of the identifiers. There are two possible word separators in typical code--either capital letters or u
  • by starwed (735423)
    Wow, I'm glad there's such agreement between those who read the article and left comments, and those who just tagged it. :)
    I tagged this article badlytagged
  • How about this (Score:5, Interesting)

    by Ed Avis (5917) <ed@membled.com> on Tuesday September 04, 2007 @06:48AM (#20462093) Homepage
    Yes, this is a legitimate problem. I work on code that has spelling mistakes embedded into interfaces and it's very annoying. The fashionable use of StudlyCaps in programming (why? who decided that TextLikeThis is more readable than text_like_this?) makes the job a little harder but not impossible, as long as you follow the sane rule of making each word start with capital and continue lowercase, even if an acronym (so XmlParser not XMLParser or, God forbid, XMLparser - though of course XML_parser would be better than any of those).

    Enough rant. How about this:

    perl -ne 's/([a-z])([A-Z])/$1 $2/g; tr/A-Za-z/ /c; foreach (split) { print qq{$_\n} unless $seen{lc $_}++ }' source_file...

    That will give a list of unique words in your source code (use find and xargs to scan the whole source tree). Then you can run that list of words through an ordinary spellchecker such as ispell. Unfortunately when you find a mistake you have to go back and grep for it to find where it occurs. You would also need a personal dictionary for things that are not English words but nonetheless appear in code.

    I would probably keep the private word list containing things like 'foreach' and 'const' with the program source code, and have a makefile target 'make spellcheck' that runs a command like the above and then prints out all words found that are not in /usr/share/dict/words or in the private word list. Indeed, why not this:

    find . -type f -name '*.c' | xargs perl -ne 's/([a-z])([A-Z])/$1 $2/g; tr/A-Za-z/ /c; foreach (split) { print lc($_), qq{\n} unless $seen{lc $_}++ }' | sort -u >found_words
    cat private_word_list /usr/share/dict/words | tr A-Z a-z | sort -u >allowed_words
    comm -23 found_words allowed_words

    The private word list can be kept under version control and checked in whenever you add a new non-English word like 'Frobule' to your source code.

    Adding filenames and line numbers to the output is left as an exercise for the reader. You might also want to change the perl command to ignore words with length < 5.
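One way the exercise might be done, sketched in Python rather than Perl. The names `load_words`, `unknown_words` and `report` are invented for this example, and the word lists are whatever one-word-per-line files you have (for instance /usr/share/dict/words plus the private list); words shorter than five letters are ignored, as suggested.

```python
import re

# An optionally capitalised word, or an acronym run such as XML.
WORD = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])")


def load_words(paths):
    """Read one word per line from each file into a lowercase set."""
    allowed = set()
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            allowed.update(line.strip().lower() for line in handle)
    return allowed


def unknown_words(lines, allowed, min_length=5):
    """Yield (line_number, word) for each long-enough word not in the allowed set."""
    for number, line in enumerate(lines, start=1):
        for word in WORD.findall(line):
            if len(word) >= min_length and word.lower() not in allowed:
                yield number, word


def report(source_paths, dictionary_paths):
    """Print compiler-style 'file:line: word' diagnostics for every unknown word."""
    allowed = load_words(dictionary_paths)
    for path in source_paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for number, word in unknown_words(handle, allowed):
                print(f"{path}:{number}: {word}")
```

Calling something like `report(["main.c"], ["/usr/share/dict/words", "private_word_list"])` (file names here are placeholders) prints diagnostics in the `file:line:` form that most editors can jump to.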
    • who decided that TextLikeThis is more readable than text_like_this?


      I suspect that they actually decided that TextLikeThis was easier to type, and sufficiently readable that the typing ease benefit was worth the switch. Of course that's because no one thought of making shift-<space> map to _.
  • FxCop (Score:5, Informative)

    by Koyaanisqatsi (581196) on Tuesday September 04, 2007 @06:58AM (#20462157)
    For .net languages, FxCop does some of this checking, even understanding camel casing and underscores in tokens. And a bunch more, since it is a static code analysis tool.

    http://www.gotdotnet.com/Team/FxCop/ [gotdotnet.com]
  • Visual Assist (Score:3, Interesting)

    by soundman32 (147936) on Tuesday September 04, 2007 @07:02AM (#20462179) Homepage
    Doesn't Visual Assist from Whole Tomato do this? I've used it in the past and I'm sure spelling mistakes (and a whole host of other things) were pointed out.

    I'm not associated with Whole Tomato, but if anyone from WT sees this, can I have a free subscription :-)

    • by shird (566377)
      It points out spelling mistakes in "strings" but not variable names. ie, it won't point out that the variable lAnsIdx is spelt incorretly, like the submitter is asking for, that would be just stupid.
      • Re: (Score:3, Interesting)

        by gnasher719 (869701)
        '' It points out spelling mistakes in "strings" but not variable names. ie, it won't point out that the variable lAnsIdx is spelt incorretly, like the submitter is asking for, that would be just stupid. ''

        Comments like this make me wonder. Is it so hard to imagine a spelling checker for say the C language that finds words that were not written the way they were intended? Limiting yourself to correct English words for identifiers is stupid. Assuming that a spelling checker for a programming language would do
  • Nothing personal, but it's not actually a programmer's job to make sure everything is speelled correktly. This is part of the QA process before a product rolls out the door. Sure, you should do your best, but you need another pair of eyes (or several pairs of eyes) looking at the UI in addition to your own. You can easily miss the forest for the trees.

  • FxCop (Score:2, Redundant)

    by soundman32 (147936)
    I'm sure you aren't using .NET, but if you were, FxCop will check for spelling mistakes in code and comments and strings, along with 1M other coding issues (like malformed variable names, parameters).
  • I had your problem once because I was working with people whose first language was not English. I don't write US English either and I always left English spellings in by mistake.

    I used aspell and went through huge parts of the source, telling it what wasn't misspelled. It was an incredible pain in the neck because it got confused over all the variable names, bits of C syntax etc etc.

    Once I had a dictionary, though, I could recheck the source periodically and although there were a lot of false warnings, we
  • Annoying perhaps but (Score:5, Interesting)

    by Taagehornet (984739) on Tuesday September 04, 2007 @07:47AM (#20462451)
    True, identifier names containing spelling errors can be a real annoyance, but I somehow doubt you'll ever find a usable solution, at least not as long as you'll need to interface to code beyond your control. What spell checker wouldn't choke on regular C++? Just picking a random declaration from MSDN (feel free to choose any other API, it won't change anything):

    HRESULT MFGetService(
    IUnknown* punkObject,
    REFGUID guidService,
    REFIID riid,
    LPVOID* ppvObject
    );


    You'll probably just end up spending all your day removing false positives.
    • Re: (Score:3, Insightful)

      by Fnkmaster (89084)
      Just looking at that declaration makes me want to claw my eyes out. Bad C++ is almost (but not quite) as bad as Perl.
  • by Maximum Prophet (716608) on Tuesday September 04, 2007 @11:38AM (#20464801)
    Wow, 240 comments about spelling and programming and no-one's mentioned the famous Ken Thompson quote:

    "If I had to do it over again? Hmm... I guess I'd spell 'creat' with an 'e'."
    • Re: (Score:3, Interesting)

      by 808140 (808140)
      This is completely off the top of my head, but do you remember how early C compilers used to only recognize the first six characters of a function name? So, for example, create_foo() and create_bar() were recognized the same way.

      Now, in essentially every program in the world there is a function named 'create_something' or alternatively 'createSomething'. Had Ken Thompson's creat() function been spelled create(), early C compilers would have treated them the same way, thus making any function starting with
  • by SuperKendall (25149) on Tuesday September 04, 2007 @11:42AM (#20464871)
    Still a great IDE after all these years...
  • I'm sorry... (Score:3, Interesting)

    by DragonTHC (208439) <Dragon@gamerslaC ... minus physicist> on Tuesday September 04, 2007 @04:12PM (#20468967) Homepage Journal
    If you are too damn lazy or too stupid to type your language properly, then you shouldn't be a programmer. Become an insurance adjuster or something less demanding.

    I don't think I'd like to hire someone who can't spell. It shows volumes about you.

    Intelligence starts with a keen understanding and application of your language.

    If you simply must have it, EditPlus has syntax highlighting and offers spellchecking dictionaries.
  • by sohp (22984) <snewton AT io DOT com> on Wednesday September 05, 2007 @12:44AM (#20474821) Homepage
    Yep. Programmers should know how to spell correctly in their native language. But hey, all through school those technonerds were likely the same ones who never missed a chance to whine about how they hated their English (or whatever) classes and thought that learning grammar and spelling was a waste of time when they could be doing cool geek stuff. The rise of 1337-speak and txtspeak hasn't helped.

    At least in the real writing business there are editors trained and paid to catch these errors.

    Being unable to spell correctly makes you look really stupid to most people.

    Just FYI, if you have a decent programming environment, it should at least flag cases where you've mistyped an existing identifier. If there's an ImmediateFlag in your code, you'd get a warning if you typed ImediateFlag or ImmediateFalg or whatever. Not much help when the programmer is creating new identifiers, of course. Although I've seen cases where the programmer in question for whatever reason decided that because ImediateFlag was undefined then they would just define it, even though ImmediateFlag existed and was what they meant. That ought to get you fired in my book.
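The "you probably meant an identifier that already exists" check can be approximated in a few lines of Python. This is a sketch under assumptions, not how any particular IDE does it: `probable_typos` and the 0.8 cutoff are invented here, and `difflib` is used as a convenient similarity measure.

```python
import difflib


def probable_typos(new_names, existing_names, cutoff=0.8):
    """Map each new identifier to an existing one it closely resembles."""
    existing = sorted(set(existing_names))
    suspects = {}
    for name in new_names:
        if name in existing:
            continue
        close = difflib.get_close_matches(name, existing, n=1, cutoff=cutoff)
        if close:
            suspects[name] = close[0]
    return suspects


print(probable_typos(["ImediateFlag", "ImmediateFalg", "RetryCount"], ["ImmediateFlag"]))
```

Run against newly introduced names in a commit, a check like this would catch the case described above, where someone defines ImediateFlag instead of noticing that ImmediateFlag already exists.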

    Hey by the way, pair programming is a great way to have continuous code reviews and a check on some of the more typical fumble-finger errors.
