Become a fan of Slashdot on Facebook

 



Forgot your password?
typodupeerror
×
Programming IT Technology

Reversing a Checksum Algorithm? 35

Todd asks: "Does anyone have good suggestions or advice regarding techniques for determining the checksum of a serial data protocol? I am currently working on a fun little project using a digital display sign which uses an unknown serial protocol. I cannot move on with the project until I figure this dreaded checksum out."
This discussion has been archived. No new comments can be posted.

Reversing a Checksum Algorithm?

Comments Filter:
  • did a search.


    it's not these guys right? (you didn't say what phone # you tried so...)
    CYBERNETIC DATA PRODUCTS
    144 NASSAU BLVD
    WEST HEMPSTEAD, NY 11552-2216
    516-489-2800
  • by psavo ( 162634 ) <psavo@iki.fi> on Saturday July 20, 2002 @07:20AM (#3921856) Homepage
    really.
    I don't think you have much choice but brute-force it.
    Collect as many as possible checksum -generators, and try them on portions of message.
    Some more-or less trivial are 'LRC', CRC, CRC16, and CRC32 ones. Of course XOR too. difficulty is to determine which characters go to checksum, as it's not always that 'STX'/'ETX' are included.
  • If you also have the product whose protocol you are
    trying to emulate, let someone in a free country trace
    it with a debugger (gdb, WinICE, Turbo Debugger...)
    in order to find at least the assembly code out.

    Then you can hire an assembly guru (like me :)
    to write a routine which does the same.
  • Need more info (Score:4, Interesting)

    by heikkile ( 111814 ) on Saturday July 20, 2002 @09:42AM (#3922110)
    It all depends what you have access to. Can you capture a number of packets with a valid checksum? Can you feed your own guesses to the box, and get a decent feedback? Do you know the rest of the protocol? Do you have access to the drivers or other software (in either end)?

    Is it a very simple machine with a very simple processor and limited memory? If so, they may well have written their own with shifts and xors. On the other hand, if it is a modern processor with megabytes of memory, they probably have used a library function. Can you get access to the standard libraries for that chip?

    How long is the checksum? What happens if you feed a wrong one? If it is just a single byte, and the box doesn't lock up completely on error, try all possible values for every packet!

    Why do you think they put the checksum in the protocol? To catch a few dropped bits over a bad line, or to prevent tampering? If the later, watch out for some real cryptography. If only the former, assume they took the path of least resistance.

    • By reading the page in question, here are the answers to a bunch of them:
      • Yes, he or anyone can capture a number of packets as they travel to the serial port, and there's a big list on his site.
      • The protocol is pretty much figured out, atleast the dynamic parts.
      • There is access to the software on the PC end and the signs ROM image
      • It is a simple 6502, so very limited indeed.
      • The rom image is 32k with about 16k used data, so there are probably no libraries in there.
      • The checksum is two bytes.
  • by oliverthered ( 187439 ) <oliverthered@hotmail. c o m> on Saturday July 20, 2002 @09:54AM (#3922146) Journal
    you said you have a DOS application that generates the data.

    Well run it in a debugger and locate the part the does generates the checksum, you could rip the code out directly(or load the application up into memory and call the checksum bit), this will probably violate a few laws though.

    So the best way is use there code to work out the algorythm and then write you own code(even better get someone else to do it) based on the algorythm
  • While looking at the test values that you sent to the sign, I noticed something about the 13-14 bits. Without more test cases, you can never be sure.

    The 13-14 bits look like they change with the length of your message. With messages of 'TEST' or 'WICK' the value in 13-14 is 'DA BC'. With a message of 'T', the value in 13-14 is '5A BA'.

    Neat project - keep us posted on the progress!
    Hope this helps.
  • Though it was years and years ago, I'd done some 6502 assembler programming. In fact, I just pulled out my copy of "Programming the 6502" by Rodnay Zaks (Sybex Inc.). I was never that good at it, but I had a lot of fun trying!

    Unfortunately, when I tried to download the binary ROM image [dickwick.com] of the 6502 code on his web page I got an error: Cannot find the file specified.

    If/when that gets fixed, the next step would be to get a computer do the work. A quick check of google [google.com] using the search string: "6502 disassembler" [google.com] brought up about 337 matches. Granted, some of the disassemblers are designed to run on older hardware (e.g. Atari, Commodore 64), but I saw links for unix, windows, and DOS, too.

  • My process (Score:2, Interesting)

    by hackwrench ( 573697 )
    looking at your page http://www.dickwick.com/sign/sign.htm [dickwick.com] The first three values differ from each other by 15 and the next differs from the others by a multiple of 15, the only number from this set that is missing is 61, which would appear to might have been generated by 0 if that was a valid value. The ones with bit seven set are also 15 apart but offset 1 from the first group. If you keep repeating the process for other bytes you should find a connection with every bit postition/group and the resulting values of the checksum.
    • It seams that your best bets are (a) disassembly of the relevant code, and/or (b) application of differential cryptography.

      If you can sniff good packets, you can compare slightly different ones. If you can't sniff good packets, you can try generating packets exhaustively to find good ones, and then attack those differentially.

  • Use a lot of test cases. In fact, try everything. Start with a message length one, and send all possible messages. Then try similar, with a length of two. You won't need to try literally every combination, but assuming it's not super complicated, you ought to be able to get it from this.
  • First, get a good debugger. An old DOS version of SoftICE would do the trick.

    Next, set it up to send a test message, like "test 12345". Then set a breakpoint in the code that starts transmitting to the serial port. That shouldn't be hard to figure out with some old DOS references.

    Then run the program, and when the breakpoint hits search memory for your test string. Then you find the memory location of the buffer being sent by the application. Then set a memory breakpoint so the debugger will break when that buffer is read, and continue the program and have it send the message again. Now you should be able to zero in on the crc code, as it reads the buffer to compute the checksum.

    Once you find the loop of asm code that's computing it, you should be able to understand the algorithm pretty easy.
    • Old DOS (few programs, anyway) used to use INT 14h to transmit bytes over the serial port. You can search for this opcode in your debugger (look for the bytes CD 14).

      Set up a breakpoint there or something; the register AH is 01; AL is the character that's being written; and DX={0=COM1, 1=COM2, ...}

      (hope you like assembly!)

      • Good software NEVER used that function as it was next to useless.

        ah, but those were the days, defining what io port and irq a serialport was on to every application that used it...
        • But did he explicitly say this software is "good"? ;)

          I scanned thru the EXE file myself for an occurence of this function call; I did find one pattern, but couldn't determine what program segment it was is, if it's actual code, etc. Still trying to get dos's old DEBUG.EXE working under wine (and don't want to install windows again)


  • Since most of the good advice I can think of for reverse engineering the check sum has already been posted, I'll suggest something orthogonal:

    You might consider black-boxing the original app using DosEmu and glue code of your own devising. Stuff goes in, stuff comes out, and you're done.

    -- MarkusQ

    P.S. I'm assuming, of course, that this is just a one-off project, not something you plan to replicate.

  • You can run the DOS app emulated (using dosemu) or interpreted (using Bochs). You can then automate this by using something like "expect" to simulate typing at the simulated DOS window.
  • Input! More Input! (Score:5, Insightful)

    by cookd ( 72933 ) <.moc.onuj. .ta. .koocsalguod.> on Saturday July 20, 2002 @08:20PM (#3924505) Journal
    I played around with the sample data you provided. I came across some funny business which I am almost certain is an error in your spreadsheet, which throws the whole thing into doubt. Did you mean to repeat the slot encoding 0x05 for both slots 5 and 6? I'm going to assume it was a typo. If so, I learned the following about the impact the slot number has on the checksum:

    8-bit-number
    Bits == 76543210

    Csum(x) is the value of the checksum with slot #x and all else held constant.

    Csum(x) = Csum(0) +
    (Bit0(x) ? -0x0F : 0) +
    (Bit1(x) ? -0x1E : 0) +
    (Bit2(x) ? +0x3C : 0) +
    (Bit3(x) ? -0x78 : 0) +
    (Bit4(x) ? -0xF0 : 0)

    Note that the values added/subtracted for each bit follow a pattern:
    0x0F = 00001111
    0x1E = 00011110
    0x3C = 00111100
    0x78 = 01111000
    0xF0 = 11110000

    More data might shed some light on the pattern. Whatever the case, I think this is reminiscent of a CRC16, since I don't think a checksum would have this kind of behavior -- a standard addition checksum or a XOR-based checksum (even with bit rotation) would make a bit always add/subtract the same amount, but it would be a power of 2 (I think).

    So now you need to find the CRC's polynomial, and I don't know enough about that kind of thing to help you. (And there is a chance that everything I've said here is wrong, since this is not my specialty!)

  • Most of the checksum algorithms employed in that era were extremely light weight and not at all good by crytographic measures. The 6502 has a register set that makes x86 look positively beefy by comparison.

    This could work against a simple reverse analysis. The algorithm might be table based to alleviate register pressure, and it's clear to me that the topic author is not up to the task of fleshing out such a table by differential techniques. If he had that kind of talent, he wouldn't be asking us.

    On the other hand, working with a tool such as SoftIce, it's not at all difficult to capture the checksum code red handed. I used to play Falcon 3.0 under SoftIce so I could repair the system crashes on the fly (especially the guaranteed hang when you hit a ground target with one specific type of munition). "R2, see what you can to about the power."

    And yes, I bypassed many kinds of crashes and checks using SoftIce. It was far easier than you would think and I enjoyed the small challenges as much as any game I played. Differential analysis is on a par with learning to play Go because you enjoy of the humility of probing your boundless incompetence.

    At the other extreme, the 6502 instruction set can be learned in a couple of hours. People forget how simple things used to be. The 6502 is a four function calculator on steroids. The average configuration file these days (Apache, SMB) takes longer to master.
  • Don't start with messages like "test". Send things like AAAAA and see what actually comes out. Find the lowest char it accepts, probably space (somewhat useless), maybe bang, maybe digits. But figure out the lowest it accepts, and preferably as few bits as possible, such as @ 0x40. Reset it and send strings of this, see how they show. You do need to try lots of variations. Send zillions of AAAAA and see how many other chars show before it repeats.
  • Ask the VP? (Score:4, Insightful)

    by Dahan ( 130247 ) <khym@azeotrope.org> on Sunday July 21, 2002 @05:02AM (#3925432)
    Have you tried emailing this guy [boshear.org]? According to his resume, he worked at SilentRadio from 1980-1997, eventually ending up as vice president. And if he doesn't know the answer, maybe he can get you in touch with Mike Levin, the guy who started the company.
  • Solved it (Score:5, Interesting)

    by sepulcrum ( 161180 ) on Sunday July 21, 2002 @08:52AM (#3925677)
    I put sign.exe thru IDA [datarescue.com] and identified the checksum algorithm. I found out that the only thing that goes thru the checksum is 35 35 00 then 0E DA is skipped and then the rest is put thru. The algorithm is a simple crc alike algo that adds the chars xors with the length and rotates some bits. You can find a perl program i wrote to calculate the checksum for a given range at: this location [gewis.nl].

    Good luck with your project,
    Gijs
    • by fizbin ( 2046 )
      Heh - I was just working on this exact same thing after finding someone's post on piclist looking to reverse-engineer the same program. And to think, you only beat me to the solution by a few hours...

      Except that you didn't, completely. The trickiest part of this checksum is that it shows up TWICE in the packet format. As you state, the checksum is initialized after the ff attention byte, and then most of the packet is run through it. However, looking at the initial packet:

      FF 35 35 00 0E DA BC 01 00 00 08 00 19 02 54 45 53 54 1F 8C 52
      ^^^^^ ^^^^^

      both of the indicated portions are checksums. The first checksum covers the string "35 35 00 0E" and the second covers the string "35 35 00 0E 01 00 00 08 00 19 02 54 45 53 54 1F". The checksum is not re-initialized after it's output. Incidentally, here's my perl version - it uses bit operations in preference to the %-heavy code that you use, but it's the same algorithm:

      #!perl
      $check = 0xa55a;
      $sum=0;
      @bytes = (@ARGV);

      for $pos (0..$#bytes) {

      $sum += (hex($bytes[$i]) ^ ($pos & 0xff));
      $sum = $sum & 0xffff;
      $check += $sum;

      $check = (($check >> 1) & 0x7fff) | (($check & 1) << 15);
      }
      printf "%04x\n", $check;
      __END__
  • by vidnet ( 580068 )
    ..must probably be to change the sign's rom image. Flip the compare instructions and hope for the best.

    I'm no asm programmer (esp not 6502), but I'm guessing it's (counting from zero at start of rom)
    (02A1) BEQ 02B0 and
    (02AE) BEQ 02C5
    that should be changed from BEQ to BNE (branch equal/not equal).

    I spent some time in dosemu debug, but there didn't seem to be any simple algorithm (again, I don't know asm).

    Brute forcing a checksum algorithm is pretty much impossible unless they've used a standard one. The best bet imho is to either grab it out of the sign.exe or disable it in the sign.

  • Have you checked... (Score:2, Interesting)

    by arb ( 452787 )
    I assume you've found the following resources, but just in case...

    There's a thread the 'The LED Forum' about this sign. The people there might be able to help... http://www.eio.com/public/led/0219.html [eio.com]

    Also, one message in that thread is for a message at SignIndustry.com http://www.signindustry.com/mb/read.php?f=19&i=138 &t=69 [signindustry.com] which might also provide some people who could help...

    While the information you are after is not on these boards, at least you will be able to contact some people who have the boards - maybe they've found some more information.

    Other than that, I can only repeat what a couple of other people have said here - do some very methodical testing, starting with all single character messages from a-z, A-Z, 0-9 and punctuation. Then try a fair range of two character messages with the same options as the first run. (aa-az, aA-aZ, a0-a9, ba-bz, etc, but there's no real need to try ALL combinations.) Then try a fair sampling of 3 char messages. Next try some longer strings, modifying one character at a time and the same string modifying each property. After that, try a range of single characters (say 10-15 different characters) with each of the options. Finally, do some 2 char, 3 char and a longer message with a reasonable set of different options.

    If you gather a truck-load of data (several hundred sets, not just 100) you stand a better chance of figuring it out. You really need the trivial cases (single and two character strings) and some non-trivial cases (a phrase, not just 'TEST') to make it easier to reverse-engineer the algorithm.

    As others have mentioned, given the time frame and the fact that it is 6502-based, the algorithm is not likely to be particularly complex, but working it out can take a lot of time!

    Mean-while, I might have a look at the 6502 code and your test data to see if anything jumps out at me...

    Good luck!
  • I've already emailed you, but for the rest of the slashdot crew.... A friend and I solved this for our sign about 1.5 years ago.

    http://www.kuci.org/~jburley/LED/ [kuci.org]

    Hope that helps!
  • google said there is a crc reverse engineering tutorial at http://www1.lunarpages.com/biw/tuts/crctut1.htm [lunarpages.com] (oh look, it's written by the "c00l guy" "by anarchriz "... ;). didn't look at it, but it at least should give some insight into general CRC strategies.
  • First of all, don't be listening to the guys that are telling you to try a variety of input. They've obviously never done this before.

    If the sign doesn't use one of the standard polynomial CRC algorithms, you're only real option is to disassemble the binary you have to dig out the algorithm. Even if it does uses a standard algorithm (It's probably CRC16), you'll still have to figure out what data it's using. You can guess and check, and you may get lucky. Disassembling is a sure thing, and probably the quickest way to go, but check your licensing agreement.

    I recently reverse engineered the CRC for my LG TP-5250 cell phone which, as it turns out, uses the standard CRC16 algorithm, but is creative about which data is uses to compute the checksum. Without a dissassembler it would have taken months to figure out what bytes were being used and in what order. You can read about how/what I did here [spineless.org].
    • >First of all, don't be listening to the guys that are telling you to try a variety of input. They've obviously never done this before. Bah! I have reverse engineered a few (weak) checksum / password encryption schemes simply by charting a sequential pattern of input (generally start with AAA, AAB, AAC, AAD ... it may take a dozen and it may take a thousand or so) and just eyeballing the output. Sometimes it helped to chart the output but most the stuff today's VB script kiddies put together is painfully obvious - linear processions, character switching, 'code-wheeling' and progressive 'code-wheeling'. Well anything that doesn't involve bit switching ... once they start bit smashing it takes more than just eyeballing it to figure it out. Just because PGP level encryption exists doesn't mean everybody bothers to use it - programmers come in all shapes and flavors. Try brute forcing it, sometimes you get lucky =)
      • First of all, CRC is hardly encryption. Second of all, any sort of hardware you're likely to be communicating with was probably designed by somebody with a little more math background then a VB script kiddie. We're not talking about some lameo warez serial number database, we're talking about actual professionally designed RS232 hardware. Electrical engineers understand that CRC is for detecting a variety of errors, and they don't put the checksum in there for fun or obfsucation. It really is designed to catch certain errors. It's nothing complex, either. These checksums have to be computed by almost no hardware. There's usually an adder (not even a complete adder because there's no overflow bit), a single register, and a lookup table or two. If you're lucky, the table is computed in a straightforward manner using polynomial arithmetic. If not, you have no real choice but to dig the table(s) out of somewhere. We're not talking PGP here, but the table is calculated in such a way as to avoid patterns. More importantly, the algorithm should take into account not only the values of the data, but the position of each value. Maybe one device in a hundred is designed by some idiot that uses a simple checksum.

Our OS who art in CPU, UNIX be thy name. Thy programs run, thy syscalls done, In kernel as it is in user!

Working...