Forgot your password?
typodupeerror
Hardware

Ask Slashdot: How Many (Electronics) Gates Is That Software Algorithm? 365

Posted by Soulskill
from the somewhere-between-zero-and-lots dept.
dryriver writes "We have developed a graphics algorithm that got an electronics manufacturer interested in turning it into hardware. Here comes the problematic bit... The electronics manufacturer asked us to describe how complex the algorithm is. More specifically, we were asked 'How many (logic) gates would be needed to turn your software algorithm into hardware?' This threw us a bit, since none of us have done electronics design before. So here is the question: Is there a piece of software or another tool that can analyze an algorithm written in C/C++ and estimate how many gates would be needed to turn it into hardware? Or, perhaps, there is a more manual method of converting code lines to gates? Maybe an operation like 'Add' would require 3 gates while an operation like 'Divide' would need 6 gates? Something along those lines, anyway. To state the question one more time: How do we get from a software algorithm that is N lines long and executes X number of total operations overall, to a rough estimate of how many gates this algorithm would use when translated into electronic hardware?"
This discussion has been archived. No new comments can be posted.

Ask Slashdot: How Many (Electronics) Gates Is That Software Algorithm?

Comments Filter:
  • Verilog (Score:5, Informative)

    by tepples (727027) <tepples AT gmail DOT com> on Wednesday January 08, 2014 @03:35PM (#45901099) Homepage Journal
    If you learn to program in Verilog, you could try synthesizing for some FPGA and see how much space it takes up on the FPGA. But then programming for an FPGA differs from programming for a serial computer in that each line of code runs essentially as a separate thread, usually triggered on another signal (such as a clock) having a positive or negative edge.
  • C to HDL to netlist (Score:2, Informative)

    by Anonymous Coward on Wednesday January 08, 2014 @03:38PM (#45901131)

    As a first-order approximation, you can translate your C/C++ code to a hardware description language (HDL) such as VHDL or Verilog. Tools exist for this process. The result won't be optimized for the peculiarities of HDL, but it will provide a good start. From there, you can port the HDL to a Xilinx or Altera FPGA netlist using vendor-specific tool chains. The porting effort will summarize the logic and memory resources of your implementation. Any digital hardware engineer worth their salt should be able to translate FPGA utilization metrics into whatever platform they are interested in.

  • by Animats (122034) on Wednesday January 08, 2014 @03:42PM (#45901177) Homepage

    You need a C to VHDL translator. Here's a tutorial for one. [xilinx.com]

    Only the parts of the algorithm that have to go really fast need to be fully translated into hardware. Control, startup, debugging, and rarely used functions can be done in some minimal CPU on or off the chip. So, for sizing purposes, extract the core part of the code that uses most of the time and work only on that.

  • VHDL (Score:3, Informative)

    by Anonymous Coward on Wednesday January 08, 2014 @03:42PM (#45901181)

    Implement the algorithm in VHDL and test it on an FPGA. I would imagine you'll need to pay someone $$$$$ to do that for you...

  • by neutrino38 (1037806) on Wednesday January 08, 2014 @03:43PM (#45901197) Homepage Journal

    Hello,

    It is probable that you can break down your algorithm -(I do not mean code) into a pipeline of elementary processing and find implementations (IP) for each of them.

    to give out an estimate:
    - subdivise your algorithm into simpler pieces
    - find for each simple piece how it can or could be implemented in hardware and the complexity of each piece.
    - do the sum.

      Indeed an hardware designer or consultant would be of a great help here.

  • HLS (Score:4, Informative)

    by orledrat (3490981) on Wednesday January 08, 2014 @03:45PM (#45901213)
    What you want to do is called high-level synthesis (going from C to hardware description language (HDL) to generating gate-lists from that HDL) and there's plenty of software to do that with. A neat open-source package for HLS is LegUp (http://legup.eecg.utoronto.ca/), check it out to get an idea of what the process consists of.
  • by cjonslashdot (904508) on Wednesday January 08, 2014 @03:46PM (#45901225)
    It's about more than gates. It is about registers, ALUs, gates, and how they are all connected. There are many different possible architectures, so it depends on the design: some designs are faster but take more real estate. There are algorithm-to-silicon compilers (I know: I wrote one for a product company during the '80s and it is apparently still in use today) but each compiler will assume a certain architecture. I would recommend one but I have been out of that field for decades.
  • Accurate answer (Score:4, Informative)

    by Sarten-X (1102295) on Wednesday January 08, 2014 @04:04PM (#45901383) Homepage

    Write out the truth table for each output as a Karnaugh map [wikipedia.org] incorporating every input. Count the number of gates needed to solve the map, and that's your answer for that output bit. Repeat for every other output bit. Add all those numbers together, and that's a fair estimate of how many gates you'll need.

    Of course, this method requires that your number of input bits must be fairly small. Don't forget that memory counts as both input (when read) and output (when written). For nontrivial applications, you'll find that the number of gates quickly approaches "a lot".

  • by Trepidity (597) <delirium-slashdot@@@hackish...org> on Wednesday January 08, 2014 @04:08PM (#45901403)

    One caveat to going this route: if the algorithm contains well-known operations as building blocks, you probably don't want to synthesize your own VHDL versions of those standard operations, since they already have highly optimized hardware implementations. For example, if one step of the algorithm is "compute an FFT", you probably want to use an existing FFT IP core [ipcores.com] to implements it, rather than translating some FFT C code to new VHDL.

    At one extreme, where the algorithm is nothing but a chain of such cores (common in DSP applications), you could get a rough estimate just by looking up the gate counts for each operation and adding them up.

  • Re:Verilog (Score:5, Informative)

    by ranulf (182665) on Wednesday January 08, 2014 @04:23PM (#45901541)

    The number of slices or logic cells or whatever else a particular synthesis program for a particular chip generates doesn't exactly correspond to a number of gates either. For instance, a single 4-in 1-out LUT on a Xilinx can be used for 1 gate or 6.

    I wouldn't have much confidence in automatic C to HDL conversion either. Good HDL design is about understanding the problem in terms of gates and parallelism. FPGAs and ASICs in general aren't particularly good at things that CPUs are good for, and inversely CPUs aren't especially good for things that FPGAs and ASICs can do well.

    The OP shows such a lack of understanding of hardware design that it's not funny! "Add = 3 gates, Divide = 6 gates" is quite comical to anyone who actually knows these things. A more ball park is that an n-bit add can be done with 2n LUTs, in terms of gates it's about 5n gates, but really that depends what gates you have available. A multiplier is massively more, dividing is even more complicated still. Fortunately, many FPGAs come with a few dedicator multipliers... Unless your algorithm requires only as many multipliers as you have available, you're probably best building a state machine and multiplexing a single multiplier unit, in much the same way as a CPU multiplexes the ALU at its core.

    The whole thing is massively dependent on algorithm and experience of the person doing the porting. The best advice is to say "I don't know" or to hire someone who does or suggest them running the algorithm on an embedded CPU.

  • by Asmodae (1155077) on Wednesday January 08, 2014 @04:24PM (#45901555)

    There's been several people who suggested using a high-level synthesis tool to convert your software (c/c++) directly to HDL (verilog/VHDL) of some kind. This can work and I've been on this task and seen it's output before. The catch is; unless that software was expressly and purpose written to describe hardware (by someone who understands that hardware and it's limitations and how that particular converter works), it almost always makes awful and extraordinarily inefficient hardware.

    Case in point - we had one algorithm developed in Simulink/Matlab that needed to end up in an FPGA. After 'pushing the button' and letting the tool generate the HDL, it consumed not just 1 but about 4 FPGAs worth of logic gates, RAMs, and registers. Needless to say the hardware platform only had one FPGA and a good portion of it was already dedicated to platform tasks so only about 20% was available for the algorithm. We got it working after basically re-implementing the algorithm with the goal of hardware in mind. The generation tool's output was 20 times worse than what was even feasible. If you're doing an ASIC you can just throw a crap-load of extra silicon at it, but that gets expensive very quickly. Plus closing timing on that will be a nightmare.

    My job recently has been to go through and take algorithms written by very smart people (but oriented to software) and re-implement them so they can fit on reasonably sized FPGAs. It can be a long task sometimes and there's no push-button solution for getting something good, fast, cheap. Techies usually say you can pick two during the design process, but when converting from software to hardware you usually only get one.

    Granted this all varies a lot and depends heavily on the specifics of the algorithm in question. But the most likely way to get a reasonable estimate is going to be to explain the algorithm in detail to an ASIC/FPGA engineer and let them work up a prelim architecture and estimate. The high-level synthesis push-button tools will give you a number but it probably won't be something people actually want to build/sell or buy.

  • by Megane (129182) on Wednesday January 08, 2014 @04:28PM (#45901599) Homepage
    A more accurate car analogy would be GM wanting to build a car using your technology and asking you how many assembly line workers it would take.
  • Re:Verilog (Score:3, Informative)

    by Jane Q. Public (1010737) on Wednesday January 08, 2014 @04:48PM (#45901751)

    "Some C algorithms may never transfer well into a hardware implementation."

    This is a fundamentally silly thing to say.

    Hardware can be made to implement ANY functioning software. It might not be easy, but it is pretty much by definition possible. It's already running on hardware... it would be very rare indeed for it to not be possible to translate it into even more-efficient hardware, since the hardware it's running on now is general-purpose.

  • Re:Verilog (Score:5, Informative)

    by harrkev (623093) <kfmsd.harrelsonfamily@org> on Wednesday January 08, 2014 @04:50PM (#45901761) Homepage

    Seriously???? Asking a C++ programmer to begin to use Verilog is simply not practical. There is a VERY STEEP learning curve in trying to target real hardware. There is even a very different frame of mind that has to be learned in order to target gates.

    I speak from experience. I program Verilog and SystemVerilog for a living doing ASIC design.

    Now, to answer the OP:

    The answer is very strongly: it depends. The most optimistic answer is a couple hundred thousand. Implement an 8-bit CPU and write the thing in under 32K of code.

    On the other end of the spectrum is "many billions." Design your own x86 multi-core CPU, throw a couple of gigs of SRAM on the ASIC, tons of flash for a solid-state disc drive, and you will have a complete high-end PC on a chip. Then add your software.

    Of course, these are both ridiculous extremes. Everything depends on the TYPE of operations being done. In a CPU a simple 32-bit multiply can be done with one character ("*"). In gates, if you need the answer in a single clock cycle, it can take an EXTREME amount of logic. However, if you are willing to wait 32 clock cycles for the answer, the amount of logic is reduced to a very manageable level. This is why C++ is a bad choice of input. How time-sensitive is it? Hardware is also very parallel in nature. Different parts of the chip can indeed be working on different things at the same time. You can go for a strictly pipelined architecture where each block does one little bit of the job and passes it off to the next block. High throughput, but lots of gates. Or you could design a general-purpose block and have it to everything slowly (the most extreme example of this approach is a common CPU).

    While I have heard of magic "C to gates" compilers, after almost 15 years in the business, I have never actually seen one. The closest that I have seen are tools that can turn Matlab code into (messy-looking) gates. If your algorithm is DSP in nature, this is a very viable alternative. Otherwise, the only advice that I can give you is to consult somebody who does hardware design for a living (like me).

    Otherwise, you really need to look at where the input comes from, where the output goes, and how fast you need to do the work.

  • Re:Verilog (Score:4, Informative)

    by SecurityTheatre (2427858) on Wednesday January 08, 2014 @05:01PM (#45901861)

    "Add = 3 gates, Divide = 6 gates" is quite comical to anyone who actually knows these things.

    Looking at an old reference I have, a 16-bit ripple-carry style adder requires 576 transistors, and a 16-bit carry-lookahead style adder (faster) requires 784 transistors.

    This is not including ANY control circuitry, nor a subtract feature.

    A pure-hardware 16-bit integer DIVIDE is between 15-30 times more complicated. To do it in pure hardware, would require on the order of 23,000 transistors.

    Unless you need your division to happen wicked fast with low latency and you don't care about transistor count, it's better to build add/shift hardware and simply perform a division operation using those bits of hardware repeatedly.

    Also, we're only doing 16-bit. If you need 64-bit, multiple all of those numbers by about 50 (spitballing).

    And converting from C into VHDL is probably not going to be the best way to go about this. Hire a decent hardware engineer.

  • Re: Verilog (Score:4, Informative)

    by harrkev (623093) <kfmsd.harrelsonfamily@org> on Wednesday January 08, 2014 @05:49PM (#45902229) Homepage

    Gaaa. On the blocking vs. non-blocking, Slashdot swallowed the "less than" sign. Apologies.

  • Re:Holy crap (Score:5, Informative)

    by Austerity Empowers (669817) on Wednesday January 08, 2014 @11:10PM (#45904257)

    To give a more helpful, unhelpful answer, it's an ill-formed question. "How many gates" depends on the target on which you synthesize the hardware: a PCB, an FPGA, actual silicon (which fab? Which process? whose std cell library? what clock frequency?).

    If somehow the above could be narrowed down by asking the customer, then the next thing I'd advise is contracting someone who can write RTL using an HDL (verilog is most popular). The synthesizeable subset of HDL is tricky to learn for non-HW people, so unless you understand digital logic well I'd suggest finding someone else to do it for you. They can then synthesize it to the targeted device/platform. If you can do this, you should charge quite a lot of money since this form of IP is expensive, and they know it. If they're ok with that, you may also want to have this contractor also write the design verification suite, since this company will certainly want that to integrate into their own testing. Lots of contractors are out there for this due to the cyclic nature of this job, make sure you also have some support feature in place if you need them to fix/update the code later.

    Even simple software algorithms can be very big in HW, but some surpisingly complex SW algorithms are next to 1 liners in HW (like any form of bit masking or bit swizzling is free!). But generally if there are a lot of sequential steps, and those steps are different...it gets big. Also assume that for every 1 SW guy that wrote the code, you will need 1 RTL designer. If you take the verification step, it may be 1-2 verification engineers for 1 RTL, depending on your timeline.

  • Re:Holy crap (Score:4, Informative)

    by Goaway (82658) on Thursday January 09, 2014 @06:26AM (#45905395) Homepage

    I work with plenty of people with that kind of degree or higher, and I doubt any of them could. Very few CS educations would teach you that. That is highly specialist knowledge, in an usual field.

    I really don't know why you would ever think that would be a common skill.

For every bloke who makes his mark, there's half a dozen waiting to rub it out. -- Andy Capp

Working...