Ask Slashdot: What Is the Most Painless Intro To GPU Programming?

Slashdot is powered by your submissions, so send in your scoop

Ask Slashdot: What Is the Most Painless Intro To GPU Programming? 198

Posted by Soulskill on Friday July 19, 2013 @04:31PM from the large-reference-books-and-opiates dept.

dryriver writes "I am an intermediate-level programmer who works mostly in C# NET. I have a couple of image/video processing algorithms that are highly parallelizable — running them on a GPU instead of a CPU should result in a considerable speedup (anywhere from 10x times to perhaps 30x or 40x times speedup, depending on the quality of the implementation). Now here is my question: What, currently, is the most painless way to start playing with GPU programming? Do I have to learn CUDA/OpenCL — which seems a daunting task to me — or is there a simpler way? Perhaps a Visual Programming Language or 'VPL' that lets you connect boxes/nodes and access the GPU very simply? I should mention that I am on Windows, and that the GPU computing prototypes I want to build should be able to run on Windows. Surely there must a be a 'relatively painless' way out there, with which one can begin to learn how to harness the GPU?"

This discussion has been archived. No new comments can be posted.

Ask Slashdot: What Is the Most Painless Intro To GPU Programming?

Load All Comments

Search 198 Comments Log In/Create an Account

Comments Filter:

Check out MC# (Score:1)

by Anonymous Coward writes:

I tried it out once a while ago just to see what it does. It looks 'dead' from a support POV, but it is still out there;
Release notes for MC# 3.0:
a) GPU support both for Windows and Linux,
b) integration with Microsoft Visual Studio 2010,
c) bunch of sample programs for running on GPU (including multi-GPU versions),
d) "GPU programming with MC#" tutorial.
GPU programming is pain (Score:5, Funny)

by Anonymous Coward writes: on Friday July 19, 2013 @04:35PM (#44331929)

GPU programming is painful. A painless introduction doesn't capture the flavor of it.

Share
twitter facebook
- Re:GPU programming is pain (Score:5, Funny)
  
  by PolygamousRanchKid ( 1290638 ) writes: on Friday July 19, 2013 @05:07PM (#44332317)
  
  Yeah, it would be like S&M without the pain . . . cute, but something essential is missing from the experience.
  Heidi Klum has a TV show call "Germany's Next Top Model". She basically gets all "Ilsa, She-Wolf of the SS" on a bunch of neurotic, anorexic, pubescent girls, teaching them how a top model needs to suffer.
  Heidi Klum would make a good GPU programming instructor.
  . . . and even non-geeks would watch the show. A win-win for everyone.
  
  Parent Share
  twitter facebook
  - Re:GPU programming is pain (Score:4, Funny)
    
    by Anonymous Coward writes: on Friday July 19, 2013 @05:45PM (#44332767)
    
    Yeah, that's what we need! More neurotic, anorexic, pubescent girls who know how to do GPU programming!
    
    Parent Share
    twitter facebook
    - Re:GPU programming is pain (Score:5, Funny)
      
      by Darinbob ( 1142669 ) writes: on Friday July 19, 2013 @08:15PM (#44333957)
      
      I thought we needed more "Ilsa, She-Wolf" programming instructors.
      
      Parent Share
      twitter facebook
      - Re: (Score:2)
        
        by tigersha ( 151319 ) writes:
        
        No, we need more instructors who look like Heidi Klum!
- - Re: (Score:2)
    
    by Ken_g6 ( 775014 ) writes:
    
    Only if the language you're using is pain. In other words: If you're trying to use C/C++/C#/Java/Pascal/â¦ to write highly parallel code... YOU'RE DOING IT WRONG.
    Those languages are not made for that. Don't try to shoehorn parallel programming onto them.
    This is a far more elegant task in functional languages like Haskell, which are from ground up designed for parallel processing.
    But GPU programming isn't just about parallel programming. It's also about low register availability, high memory latency, complicated memory access patterns, and just-plain-strange inter-process communication. The GPU has many more parts than a CPU, and you need to learn to use most or all of them effectively.
  - Re: (Score:2)
    
    by mc6809e ( 214243 ) writes:
    
    In other words: If you're trying to use C/C++/C#/Java/Pascal/ÃÂ¦ to write highly parallel code... YOU'RE DOING IT WRONG.
    You don't use those languages to write highly parallel code. You use those languages to sequentially control a GPU to get it to execute programs in parallel.
    Big difference (really).
Learn OpenCL (Score:5, Insightful)

by Tough Love ( 215404 ) writes: on Friday July 19, 2013 @04:37PM (#44331939)

Since the whole point of GPU programming is efficiency, don't even think about VBing it. Or Pythoning it. Or whatever layer of a shiny crap might seem superficially appealing to you.
Learn OpenCL and do the job properly.

Share
twitter facebook
- Re:Learn OpenCL (Score:5, Interesting)
  
  by Tr3vin ( 1220548 ) writes: on Friday July 19, 2013 @04:41PM (#44331987)
  
  Learn OpenCL and do the job properly.
  This. OpenCL is C based so it shouldn't be that hard to pick up. The efficient algorithms will be basically the same no matter what language or bindings you use.
  
  Parent Share
  twitter facebook
  - Re: (Score:3)
    
    by Midnight Thunder ( 17205 ) writes:
    
    Learn OpenCL and do the job properly.
    This. OpenCL is C based so it shouldn't be that hard to pick up. The efficient algorithms will be basically the same no matter what language or bindings you use.
    Well, the first thing is to understand parallel programming and what sort of things work well in a GPU. With that basic understanding, then OpenCL becomes a tool for doing that work. Starting with an OpenCL based "hello world" type application would then be the next step.
    - - Re: (Score:2)
        
        by Midnight Thunder ( 17205 ) writes:
        
        I can remember a demonstration that showed that below a certain amount of "work" it was better to use the CPU and then above that to use the GPU. I can't remember the details, but from what I remember it was because of the overhead of setting up the cores?
        I agree with the OpenCL route, simply because of previous history of 3DFX vs OpenGL. Short term CUDA has the advantage.
        
        Re: (Score:2)
        
        by SuperTechnoNerd ( 964528 ) writes:
        
        It's also the kind of work. Some things, like fluid dynamics and discreet element analysis lend themselves to very parallel computation.
        Playing solitaire not so much..
        Use the right tool for the right job.
  - Re: (Score:2)
    
    by gregor-e ( 136142 ) writes:
    
    If you're learning this for your job, maybe you can persuade your boss to pay for an OpenCL course like this one [acceleware.com].
  - - Re:Learn OpenCL (Score:4, Funny)
      
      by Chaos Incarnate ( 772793 ) writes: on Friday July 19, 2013 @10:23PM (#44334571) Homepage
      
      Just because we C# programmers can't do memory management worth a damn doesn't mean we're no better than VB programmers. We at least know what case sensitivity means. ;)
      
      Parent Share
      twitter facebook
    - Re: Learn OpenCL (Score:3, Informative)
      
      by guruevi ( 827432 ) writes:
      
      I have written code for computational biology - CUDA is a lot easier to pick up if you're just converting from C. They have great examples and documentation, great plugins but you're stuck on a single hardware platform. OpenCL on the other hand is a lot less 'nice' to begin with (pouring over 250 page PDFs with minimal explanation) but allows you to leverage both CPU and GPU efficiently and a lot less hardware independent although these days it's just nVidia for serious GPU computing and maybe Intel is star
      - Re: (Score:2)
        
        by jkflying ( 2190798 ) writes:
        
        Troll much? All the bitcoin miners use AMD OpenCL for their GPGPU properties. Why? Because they outperform NVidia. You are making sweeping statements with nothing to back you up.
        
        Re: (Score:2)
        
        by mathimus1863 ( 1120437 ) writes:
        
        That's not because AMD is better. It's because AMD/ATI has some obscure instructions heavily optimized that happen to be the same instructions useful for Bitcoin mining. And they also tend to focus on a plethora of less-powerful cores, while NVIDIA focuses on using fewer-but-more-powerful cores. Again, that benefits SHA256 operations, but doesn't necessarily benefit other applications.
        
        In a way, AMD got lucky with this one. It's the same reason the AMD CPUs outperform Intel on hashing -- because their
        
        Re: (Score:2)
        
        by jkflying ( 2190798 ) writes:
        
        The GP was claiming that AMD stuff sucks in every situation. I was just pointing out that they were wrong.
        And hashing is really important, particularly for 'new-age' languages like Python and JavaScript.
        
        Re: (Score:2)
        
        by guruevi ( 827432 ) writes:
        
        Experience, I was talking about serious computations, not something you use gaming GPU's for and not a single optimized task either. You could probably take any random task and optimize the shit out of it on either platform, doesn't mean said platform is good for a "general purpose" (the GP in GPGPU).
        Also, even for Bitcoin mining, a lot of rigs require Windows, not Linux, a lot of cards in the Bitcoin guides have disclaimers such as "don't use x with this AMD card" where x is some type of OpenCL instruction
        
        Re: (Score:2)
        
        by slashdot_commentator ( 444053 ) writes:
        
        No, AMD hardware outperforms Nvidia hardware, computational units/$1, for bitcoin mining. That's why bitcoin miners use OpenCL; its about the card. (When one cares about protein folding, programmers that need speed, are going to CUDA and nvidia cards.) Pot, meet Kettle.
- Re: (Score:3)
  
  by Required Snark ( 1702878 ) writes:
  
  Yep. Some things are intrinsically hard. GPU programming is SIMD programming [wikipedia.org], so you have to work with data parallelism. It helps a lot if you understand how the hardware works. This is where assembly language experience can be a big plus.
  There's no substitute for detailed knowledge. Outside of instruction level parallelism, there is no "magic bullet" for parallel programming. Your have to learn things.
  - - Re:Learn OpenCL (Score:5, Insightful)
      
      by HaZardman27 ( 1521119 ) writes: on Friday July 19, 2013 @05:13PM (#44332393)
      
      That's because the closest analogy to a software engineer using a more abstracted language in the hardware world is the packaging of common circuitry. Or when hardware engineers design chips, do they actually model out the components of every single transistor?
      
      Parent Share
      twitter facebook
      - Re:Learn OpenCL (Score:5, Informative)
        
        by AdamHaun ( 43173 ) writes: on Friday July 19, 2013 @06:23PM (#44333133) Journal
        
        Or when hardware engineers design chips, do they actually model out the components of every single transistor?
        Chip design is absurdly complicated (even on the digital side), and involves several layers of abstraction. In roughly increasing level of detail:
        * Spec level: high-level behavioral description of the functionality of a digital system, something like "8-bit 115.2kbps UART" or "2MHz PWM with 0-100% duty cycle in 0.1% increments".
        * HDL/RTL level: software-like description of the complete system design. Can range from higher-level (describing behavior) to lower-level (describing specific logic). When people talk about buying, selling, or creating "IP" in the chip design world, they're usually talking about RTL for a single functional unit.
        * Gate level: Logic gates and flip-flops and their connections.
        * Transistor level: The transistors that make up the gates, and their connections.
        * Device level: The behavior of an individual transistor.
        * Physical layout: Just what it sounds like; the actual arrangements of metal and silicon.
        There are some more in between, but that should give you an idea. HDLs are not necessarily low-level. For large designs (like modern SoCs), it takes some *very* expensive and complex software to go deeper into the list, and the process is not entirely automated. So I wouldn't say hardware design can't be high-level. The difference is that in hardware, you always have to care about the lowest level when you're doing your high-level design, while in software you can take more things for granted. So even though a board-level design might just be a bunch of off-the-shelf chips hooked together, it still takes a lot of work to make sure everything comes out right.
        
        Parent Share
        twitter facebook
        
        Re: (Score:3)
        
        by Darinbob ( 1142669 ) writes:
        
        In software when you take the low level for granted you end up with a typical bloated Windows application. Of course people get away with it because you just mock people who don't have enough RAM or CPU power until they upgrade in shame.
    - Re: (Score:2)
      
      by viperidaenz ( 2515578 ) writes:
      
      If you had something higher level than VHDL/verilog, then instead of your compile for your ASIC taking 6 hours, it'll take 6 days
- Re: (Score:1)
  
  by sl4shd0rk ( 755837 ) writes:
  
  don't even think about VBing it. Or Pythoning it.
  
  Awwwwww yisssssss... mothoafokin Assembly!
  - - Re: (Score:1)
      
      by Anonymous Coward writes:
      
      Real programmers use patch cables. [columbia.edu]
      - Re: (Score:2)
        
        by mwvdlee ( 775178 ) writes:
        
        Patch cables? Are those the playfully colored, safety-blanket covered, plug-and-play things you kids use these days? Real programmers use a soldering iron and bare metal only.
      - Re: (Score:2)
        
        by dbIII ( 701233 ) writes:
        
        There was still some stuff you could program with patch cables in use in 1990, back before "neural networks" replaced a lot of the functions of analog computers.
- Re:Learn OpenCL (Score:5, Informative)
  
  by CadentOrange ( 2429626 ) writes: on Friday July 19, 2013 @05:03PM (#44332277)
  
  What's wrong with a higher level language that interfaces with OpenCL? You're still writing OpenCL, you're just using Python for loading/storing datasets and initialisation. If you're starting out, something like PyOpenCL might be better as it'll allow you to focus on writing stuff in OpenCL.
  
  Parent Share
  twitter facebook
  - Re: (Score:2, Informative)
    
    by Anonymous Coward writes:
    
    The thing that is hard about gpu programming isn't getting code that works, its getting code that is fast. One of the most significant issues is how the data is arranged and accessed on the GPU. A big portion of this is going to be related to how the data is setup/transfered/accessed over PCIe from/to main memory.
    Basically, your going to want to access that data in a manner that is fairly low level on the cpu side as well. So, the advantages of phython/etc are nullified when you have some binary blob like f
- Re: (Score:2)
  
  by mcmonkey ( 96054 ) writes:
  
  Since the whole point of GPU programming is efficiency, don't even think about VBing it. Or Pythoning it. Or whatever layer of a shiny crap might seem superficially appealing to you.
  Learn OpenCL and do the job properly.
  "VBing?" "Pythoning?"
  Learn English and answer the question properly.
- - Re: (Score:1)
    
    by Anonymous Coward writes:
    
    Considering that GPU programming is intrinsically parallel in nature and pretty much none of those "easier" means really have the concept in question in their worldview, I call BULLSHIT on your line of reasoning.
  - Re: (Score:2, Informative)
    
    by Anonymous Coward writes:
    
    If you can get the job done quicker in something along the lines of VB or Python and the speed up compared to using the CPU alone is good enough, I don't see why you shouldn't do it the easy way. Sure, if you're going to be doing this kind of coding a lot then you should invest time in learning the "best" way to do it, but if its something you'll seldom be doing then it may be more efficient for you just to take the easy option.
    Ordinarily I'd agree with you (programmer's time is worth more than anyone else'
- - Re: (Score:2)
    
    by burisch_research ( 1095299 ) writes:
    
    While you are completely correct in what you say, you are not answering the question. The question is, how do I get in to massively parallel GPU processing easily -- not, 'is my application suited to this?'. It's assumed that the OP might be an idiot, however he is asking a completely valid question, and you should be responding accordingly.
    Actually, this post has prompted me to re-evaluate my methodologies. I have an enormous image-processing project to complete in short order. I was previously intending t
CUDA (Score:1)

by Anonymous Coward writes:

CUDA is extremely easy to learn and use (if you know C and of course have an NVidia card) and is well worth the effort for some projects. Alternatively you could try skipping GPU programming and just using OpenMP which would still greatly improve performance if your not already multithreading.
- Re: (Score:2, Insightful)
  
  by Anonymous Coward writes:
  
  Never under any circumstances use cuda. We don't need anymore proprietary garbage floating around. Use opencl only.
  - Re: (Score:2)
    
    by slashdot_commentator ( 444053 ) writes:
    
    If you don't care how long your programs take to solve a problem, avoid coding in cuda. If you want to keep your job, and your employer needs to run the app on nvidia cards as fast as possible, you're writing it in cuda.
- Re:CUDA (Score:5, Informative)
  
  by UnknownSoldier ( 67820 ) writes: on Friday July 19, 2013 @05:32PM (#44332635)
  
  Agreed 100% about CUDA and OpenMP! Already invented a new multi-core string searching algorithm and having a load of fun playing around with my GTX Titan combing CUDA + OpenMP. You can even do printf() from the GPU. :-)
  The most _painless_ way to learn CUDA is to install CUDA on a Linux (Ubuntu) box or Windows box.
  https://developer.nvidia.com/cuda-downloads [nvidia.com]
  On Linux, at the command line fire up 'nsight' open the CUDA SDK samples and start exploring! And by exploring I mean single-stepping through the code. The NSight IDE is pretty darn good considering it is free.
  Another really good doc is the CUDA C Programming Guide.
  http://docs.nvidia.com/cuda/cuda-c-programming-guide/ [nvidia.com]
  Oh and don't pay attention to the Intel Propaganda - there are numerous inaccuracies:
  Debunking the 100X GPU vs CPU Myth: An Evaluation of Throughput Computing on CPU and GPU
  http://pcl.intel-research.net/publications/isca319-lee.pdf [intel-research.net]
  
  Parent Share
  twitter facebook
OpenACC (Score:1)

by Anonymous Coward writes:

don't know what the status is on Windows, but for high-performance computing, OpenACC is an emerging standard, with support by Cray and PGI compilers.
It's easier than it sounds (Score:1)

by Anonymous Coward writes:

The heavy lifting has mostly already been done for you. [nvidia.com] There are CUDA wrappers out there that, with a few changes to your code, run it as close to optimally as possible using the card's cores. We had a Nvidia guy come by and give a talk just to show off how relatively painless it is (similar to OpenMPI, in my opinion). If you've got a couple extra people around consider reaching out to Nvidia to have someone show everyone a few of the options.
Obsidian (Score:5, Informative)

by jbolden ( 176878 ) writes: on Friday July 19, 2013 @04:51PM (#44332103) Homepage

I get the impression that CUDA/OpenCL is still the best option. This thesis on Obsidian [chalmers.se] presents, a Haskell set of binding which might be easier and also covers the basics quite well. Haskell lends itself really well because the language inherently is designed for parallelism because of purity and out of order computation. That being said, I think Obsidian is a bit rough around the edges but if you are looking for a real alternative, this is one.

Share
twitter facebook
- Re: (Score:2)
  
  by CoderBob ( 858156 ) writes:
  
  I've seen a few people mention Haskell, but no love for Erlang in here. Any particular reason?
  - Re:Obsidian (Score:5, Informative)
    
    by jbolden ( 176878 ) writes: on Friday July 19, 2013 @09:55PM (#44334431) Homepage
    
    The big issue is that Haskell is lazy. Which means in particular the programmer by default doesn't determine order of execution. This makes Haskell a better counter example since order of execution is so key to so many languages.
    Erlang's type system is rather typical dynamic while Haskell has a Hindley–Milner type system which again shows off the plusses of functional better.
    Haskell has more of the most sophisticated ideas in computer science than any other language. It has become the standard for computer science in particular language and compiler research. So when an idea is "news" there is very likely an implementation of Haskell of that idea. Erlang's community is more practical and less cutting edge.
    Haskell is easier to program in.
    
    Parent Share
    twitter facebook
Jitter (Score:1)

by handshake, doctor ( 1893802 ) writes:

Check out Max/MSP/Jitter [cycling74.com].

As you describe, the interface is VPL - connecting boxes / nodes to access the GPU is one of the (many) things the program is capable of. Depending on what you're trying to, you may also find Gen [cycling74.com] useful for generating GLSL shaders within the Max environment (although you can use other shaders as well).

I'm currently neck-deep in a few Jitter projects using custom shaders, etc., and while it's great for rapid prototyping, getting good frame-rates and production stable code out i
GPU programming *is* pain, princess. (Score:5, Informative)

by Chris Mattern ( 191822 ) writes: on Friday July 19, 2013 @04:52PM (#44332133)

Anyone who tells you differently is selling you something.

Share
twitter facebook
Udacity teaches CUDA (Score:3)

by Arakageeta ( 671142 ) writes: on Friday July 19, 2013 @04:53PM (#44332151)

Check out the Udacity class on parallel programming. It's mostly CUDA (I believe it's taught by NVIDIA engineers): https://www.udacity.com/course/cs344 [udacity.com]
CUDA is generally easier to program than OpenCL. Of course, CUDA only runs on NVIDIA GPUs though.

Share
twitter facebook
OpenACC (Score:5, Interesting)

by SoftwareArtist ( 1472499 ) writes: on Friday July 19, 2013 @04:53PM (#44332161)

OpenACC [openacc-standard.org] is what you're looking for. It uses a directive based programming model similar to OpenMP, so you write ordinary looking code, then annotate it in ways that tell the compiler how to transform it into GPU code.
You won't get as good performance as well written CUDA or OpenCL code, but it's much easier to learn. And once you get comfortable with it, you may find it easier to make the step from there into lower level programming.

Share
twitter facebook
- Re: (Score:3)
  
  by 140Mandak262Jamuna ( 970587 ) writes:
  
  It works in theory. In practice, unless you understand your code well, and the way compiler built the instructions well, and understood what these directives very well, you wont get any speed improvements. There are times when the over heads slow down the code and the simple minded implementation had brain dead locks, and you end up with slower code.
  We have come a long way since the days of assembly and assembly in another name Fortran. But the overheads of the higher level languages have been masked a lo
  - Re: (Score:3)
    
    by SoftwareArtist ( 1472499 ) writes:
    
    True, and this is even more true on GPUs than CPUs. They do a lot less to shield you from the low level details of how your code gets executed, so those details end up having a bigger impact on your performance. And to make it worse, those details change with every new hardware generation!
    But for a new user just getting into GPU programming, it's easier to learn those things in the context of a simple programming model like OpenACC than a complicated one like CUDA or OpenCL. That just forces them to deal
Very Similar Story (Score:2)

by Chaseshaw ( 1486811 ) writes:

VB.NET background. Wanted to get into GPGPU to accelerate some of my more complicated math calculations. Tried CLOO (open source .net GPU wrappers) and couldn't get it to work, tried AMD's OPENCL dev gui, couldn't get it to work. Eventually found the answer in python. GPGPU in pyopencl is well-documented thanks to the bitcoiners, and from .net you can either run the python in a shell, or write a little python kernel to listen for, and process commands. Only catch is the opencl abilities are limited, and you
Proper approach to GPU programming (Score:2, Insightful)

by godrik ( 1287354 ) writes:

Like in all attemps at getting stuff faster, you should first wonder what kind of performance you are already getting out of CPU implementation. Provided you seem to believe it is actually possible to get performance out of a VB like langage, I assume that your base implementation heavily sucks.
Putting stuff on a GPU has for only goal to make things faster but it is mostly difficult to write and non portable. Having a good CPU implementation might just be what you need. It also might be easier for you to wr
- - Re: (Score:2)
    
    by godrik ( 1287354 ) writes:
    
    Frankly, I don't know anything about C#. But I know quite a bit of High Performance Computing. What I can guarantee you, is that I never saw an high performance routine written efficiently in anything else than in C or in C++. Sure people claim to do great in java. But that's only a claim, I still have to see seriously complicated implementations in Java.
    Regarding C#, honestly, nobody writes C# for HPC in the academia. I never met a single one. But I expect it to be similar. The main problem you get is that
C++ AMP (Score:2)

by VertigoAce ( 257771 ) writes:

Take a look at C++ AMP [microsoft.com]. It is a small language extension that lets you target the GPU using C++. The platform takes care of most of the mechanics of running code on the GPU. Also check out this blog post [msdn.com] for links to tutorials and samples.
Coursera (Score:3)

by elashish14 ( 1302231 ) writes: <profcalc4 AT gmail DOT com> on Friday July 19, 2013 @05:05PM (#44332293)

Coursera has some courses on GPU programming, like this one [coursera.org], and what's nice about them pretty slow, and I'm assuming that they explain things well. Other online courses probably offer the same, and I think the video lectures would be helpful in understanding the concepts.

Share
twitter facebook
- Re: (Score:2)
  
  by jasax ( 1728312 ) writes:
  
  I took that course: https://www.coursera.org/course/hetero [coursera.org]
  
  I also took a course from Udacity: https://www.udacity.com/course/cs344 [udacity.com] but this one I didn't finish, I've done perhaps 30% of it (I already had finished Coursera's). One of these days I'll go there to close matters :-)
  
  The courses in Udacity are "always online", so anyone can register anytime and finish the course with his/hers own pace. Quizzes, exams and grading with certificate included have no fixed limits. On the other hand, the courses f
  - Re: (Score:2)
    
    by burisch_research ( 1095299 ) writes:
    
    Having researched both, OpenCL is definitively better by far. Granted, CUDA has the native advantage, but that's not always going to be there, and I think most would agree that vendor tie-in is a Very Bad Thing (tm)
OpenCV (Score:2)

by SpinyNorman ( 33776 ) writes:

Try Intel's free OpenCV (Computer Vision) library, which includes GPU acceleration.
Nothing easy but Udacity can help (Score:5, Informative)

by Jthon ( 595383 ) writes: on Friday July 19, 2013 @05:12PM (#44332381)

So there's nothing really easy about GPU programming. You can look at C++ AMP from Microsoft, OpenMP or one of the other abstractions but you really need to understand how these massively parallel machines work. It's possible to write some perfectly valid code in any of these environments which will run SLOWER than on the CPU because you didn't understand fundamentally how GPUs excel at processing.
Udacity currently has a fairly decent intro course on GPU programming at: https://www.udacity.com/course/cs344 [udacity.com]
It's based around NVIDIA and CUDA but most of the concepts in the course can be applied to OpenCL or another GPU programming API with a little syntax translation. Also you can do everything for the course in your web-browser and you don't need an NVIDIA GPU to finish the course exercises.
I'd suggest running through that and then deciding on what API you want to end up using.

Share
twitter facebook
Understand The Hardware (Score:3, Informative)

by Anonymous Coward writes: on Friday July 19, 2013 @05:28PM (#44332585)

If you are going to program a GPU, and you are looking for performance gains, you MUST understand the hardware. In particular, you must understand the complicated memory architecture, you must understand the mechanisms for moving data from one memory system to another, and you must understand how your application and algorithm can be transformed into that model.
There is no shortcut. There is no magic. There is only hardware.
If you do not believe me, you can hunt up the various Nvidia papers walking you through (in painful detail-- link below) the process of writing a simple matrix transpose operation for CUDA. The difference between a naive and a good implementation, as shown in that paper, is huge.
That said, once you understand the principles, CUDA is relatively easy to learn as an extension of C, and the Nvidia profiler, NVVP, is good at identifying some of the pitfalls for you so that you can fix them.
http://www.cs.colostate.edu/~cs675/MatrixTranspose.pdf

Share
twitter facebook
OpenACC or OpemMP 4.0 are exactly what you want (Score:5, Informative)

by John_The_Geek ( 1159117 ) writes: on Friday July 19, 2013 @05:33PM (#44332637)

I teach this stuff daily, and the huge advance over the past year has been the availability of OpenACC, and now OpenMP 4, compilers that allow you to use directives and offload much of the CUDA pain to the compiler.
There is now a substantial base of successful codes that demonstrate that this really works efficiently (both development time and FLOPS). S3D runs at 15 PFLOPS on Titan using this and may well win the Gordon Bell prize this year. Less than 1% of lines of code modified there. NVIDIA has a whole web site devoted to use cases.
I recommend you spend a day to learn it. There are regular online courses offered, and there is a morning session on it this Monday at XSEDE 13 if you are one of those HPC guys. A decent amount is available online as well.
BTW, with AMD moving to Fusion, the last real supporter of OpenCL is gone. NVIDIA prefers OpenACC or CUDA and Intel prefers OpenMP 4 for MIC/Phi. So everyone officially supports it, but no one really puts any resources into it and you need that with how fast this hardware evolves.

Share
twitter facebook
- Re: (Score:2)
  
  by account_deleted ( 4530225 ) writes:
  
  Comment removed based on user account deletion
- Re: (Score:2)
  
  by burisch_research ( 1095299 ) writes:
  
  John,
  While I have to defer to your position as being a teacher of these things, I have to question what you say.
  a) OpenCL was intended as an open access API to GPGPU techniques. Has something changed to channel people into vendor-specific approaches?
  b) What advantages do OpenACC and OpenMP 4 offer over previous techniques? Are these standards-based?
  c) Which GPGPU language (if any) can one target in the sure knowledge that it is future-proof? In which ways is this superior to OpenCL?
  These are genuine questio
  - - Re: (Score:2)
      
      by burisch_research ( 1095299 ) writes:
      
      Thanks for your response GTG, I appreciate your answers.
      Let me give you my perspective: I am a very experienced dev, quite happy in ASM, C, C++, and C#, and other languages (but I hate Java with a passion!)
      I currently have a problem set which is extremely amenable to being solved with GPU development. I have had a look this evening at OpenCL, and I must say I was very impressed with how simple it appeared to me, coming from a C background. I was just about to plunge into the dev on OpenCL, when you [very in
      - Re: (Score:2)
        
        by burisch_research ( 1095299 ) writes:
        
        JTG* sigh ... I'm so inattentive ...
Learn to Program an Intel Phi instead (Score:1)

by quarkie68 ( 1018634 ) writes:

The only painful thing you have to do is to decide how to increase threading in your code.
- Re: (Score:2)
  
  by TechyImmigrant ( 175943 ) writes:
  
  Yes. This.
  60 independent cores with general purpose instruction set on the same die with fast interconnect. If you need to pack some parallel speed on and do real work, using a GPU is pissing in the wind. An Intel Phi lets you get the job done.
  GPUs do certain things very well, but the odds of your problem mapping well to GPUs is slight.
  - - Re: (Score:2)
      
      by TechyImmigrant ( 175943 ) writes:
      
      Getting stuff to work is one rather important aspect of getting stuff done. Also those Phi threads are big honking general purpose threads with lots of cache and ALU resources, not a highly strung state machine hanging of a matrix multiplier.
      There is a small subset of problems that map to parallel threads of SIMD operations. Try optimizing IC layout on a GPU, or evaluating the biases in a crypto function by running the probabilities backwards through the gates. Those are not a problems for GPUs, but they ar
- Re: (Score:2)
  
  by godrik ( 1287354 ) writes:
  
  I was one of the first phi user outside of intel (not in thefirst batch, but in the second one). And programming Phi can be quite painful as well. People always try to make you believe that perfoance is easy. But frankly, it is not. You need to understand how the architecture works and many people are not trained like that nowadays. Throwing a GPU or a Phi will only bring more problem.
  From what the OP says, it is not even clear he used all the processing power available on his CPU. And since he ties to get
Do you need the GPU? (Score:3)

by jones_supa ( 887896 ) writes: on Friday July 19, 2013 @05:39PM (#44332693)

You would probably see a multi-fold increase in performance by simply converting your project from C# to C++.

Share
twitter facebook
- Re: (Score:2)
  
  by greg1104 ( 461138 ) writes:
  
  Possibly [debian.org], but there are a lot of tasks that only see about a doubling of speed. A C++ port is only likely to speed things up, while a GPU one is certain to. (Presuming the assumption about parallel execution is correct)
  - Re: (Score:2)
    
    by godrik ( 1287354 ) writes:
    
    That's a buggy claim. There is nothing in GPUs that ensures you will get performance. Many algorithms are very difficult to write in GPUs. You have (essentially) no cache which make none trivial memory access slow. You have thread divergence issues which can kill your performance even if it contains significant parallelism. There is no interwarp synchronisation which is quite painful for fine synchronisation.
    Clearly the picture is more complicated than "parallel execution" => performance on GPU. If you h
GPU Maven Plugin (Score:1)

by Anonymous Coward writes:

Closest to painless I know of is https://bitbucket.org/bradjcox/gpu-maven-plugin
The GPU Maven Plugin compiles Java code with hand-selected Java kernels to CUDA that can run on NVIDIA GPUs of compatibility level 2.0 or higher. It encapsulates the build process so that GPU code is as easy to build with maven as ordinary Java code. The plugin relies on the NVidia CUDA SDK being installed which must be done separately.
OpenSceneGraph or OGRE (Score:2, Interesting)

by bzipitidoo ( 647217 ) writes:

I went with OpenSceneGraph.
Long ago, I tried xlib only, because at that time Motif was the only higher layer available, and it was proprietary. It was horrible. xlib has been superceded by XCB, but I wouldn't use that, not with all the other options out there today. XCB is a very low level graphics library, for drawing lines and letters in 2D. 3D graphics can be done with that, but your code would have to have all the math to transform 3D representations in your data into 2D window coordinates for XCB
- Re: (Score:2)
  
  by bzipitidoo ( 647217 ) writes:
  
  Gah, should have read the summary more carefully. I was talking about 3D graphics, not general programming on the GPU.
  - Re: (Score:2)
    
    by burisch_research ( 1095299 ) writes:
    
    Whew, I was just about to launch into a tirade on how wrong you were! As it stands, I'm going to be a lot less tired than I'd thought I'd be!
Mary Hall at The University of Utah (Score:2)

by TwineLogic ( 1679802 ) writes:

I wouldn't call her advanced coursework easy, but a resource that belongs on this thread: http://www.cs.utah.edu/~mhall/cs6963s09/ [utah.edu]

Mary Hall is a professor of Computer Science. Her recent work is related to compilers and parallel programming on GPUs. Her professional web page is something like an on-line open course, or the framework of one.
Take it from someone who's done a lot of CUDA (Score:2)

by mathimus1863 ( 1120437 ) writes:

There isn't really a painless way. Like a lot of skills in life, the only way to learn is through pain, suffering and frustration. But it makes the prize all the much more enjoyable. You need to be experienced at regular, serial programming in C/C++, then mangle all of it to figure out how to program in parallel. I literally read the CUDA programming's guide 5 times. And I felt like I gained as much on the fifth time as I did the first time. And don't expect your debugger to save you -- if it's like i
Ask a neck-beard... (Score:2)

by OhSoLaMeow ( 2536022 ) writes:

... to code it in COBOL for you.
C or C++ with vectors (Score:2)

by gnasher719 ( 869701 ) writes:

OpenCL or CUDA is a real pain, and a lot to learn. But any modern Intel quad core processor can deliver 50 billion floating point operations per second if you treat it right.

Use C or C++ with the Clang compiler (gcc will do fine as well probably) and vector extensions. Newer Intel processors have 256 bit vector registers, so you can define vector types with 32 8-bit integers, 16 16-bit integers, 8 32-bit integers or 8 single precision floating point numbers, or 4 double precision floating point numbers.
Write some graphics shaders and multithreaded prog (Score:2)

by fatgraham ( 307614 ) writes:

I've just started with opencl and love it, it's fast, easy, debuggable (codel) and -with stable drivers- not too much of a pain when it goes wrong.
I've been writing hlsl, glsl and arb vertex shaders for years and to me, opencl kernels are basically the same thing (language and limitation wise). Convert some full screen graphics effects to opencl for a first example, then make it do other stuff (maybe with buffers instead of images).
Once you're used to making/debugging kernels, start splitting code/algorithm
NPP (Score:2)

by dsouth ( 241949 ) writes:

The easiest on-ramp to speeding up image/video processing is probably the npp library https://developer.nvidia.com/npp [nvidia.com] [nvidia.com] It has functionality and syntax similar to Intel's ipp library but uses an NVIDIA cuda-capable GPU to accelerate the operations.
If you want to dig in deeper you could explore OpenACC http://www.openacc-standard.org/ [openacc-standard.org] [openacc-standard.org] OpenACC is a directives based approach to accelerator programming. You comment or mark up your code with OpenACC directives that provide addi
GPU Programming Requires a Different Mindset (Score:2)

by ImprovOmega ( 744717 ) writes:

I took some parallel processing classes in the last couple of years as part of my Master's program. CUDA was one of those tricky little beasts that basically takes a few minutes to learn (assuming a rock solid C/C++ background) but a lifetime to master the nuances.
We were building little throw-away matrix multiply programs - for which we were given horribly inefficient and barely functional source to start with. The challenge was to make it run as fast as possible, with extra credit going to the fastest i
- Re: (Score:2)
  
  by Skapare ( 16644 ) writes:
  
  You should already know how to do a matrix multiply by now, and not need someone else's source code. The task is to figure out how to partition the work most effectively for the GPU. Classic matrix multiply source code would be misleading at best.
  Or switch to an embarrassingly parallel project like Mandelbrot/Julia set calculation. Now the challenge is to make it do multi precision arithmetic so you can go deep.
- Re: (Score:2)
  
  by mpfife ( 655916 ) writes:
  
  | This runs counter to the level of abstraction that most CS majors are used to dealing with
  That's very unfortunate to hear. I know when I studied CS in the 90's, the foundation was always based on understanding the underlying hardware. My OS class focused on hardware interrupts, protected mode operation, cache and memory hierarchies. The whole basis for strategies and methods of making fast algorithms depends on knowing how the underlying hardware works.
  How can you call yourself a computer scientist
Easy and GPU programming (Score:2)

by mpfife ( 655916 ) writes:

Simply don't go in the same sentence. You inherently need to know a lot about the underlying hardware and programming models to take advantage of that hardware - and none of that is easy. Best advice? Maybe use C# and start with a good sample tutorial. After that, you're going to learn a lot more about image algorithms/etc. That's why I can still make amazing amounts of money knowing how to program for GPU's.
Harnessing GPU vs Learning GPU (Score:2, Interesting)

by Anonymous Coward writes:

Writing GPU programs is hard. Not only do you have to learn a new sets of APIs, you also have to understand the underlying architecture to extract decent performance. It requires a different approach to problem solving that requires months if not years to develop.
Fortunately you don't need to read the entire cuda programming guide to program on the GPU. There are several excellent libraries out there which hide the complexities of the GPU architecture. Since you are doing image processing, I would recommend
just learn cuda? (Score:2)

by SkunkPussy ( 85271 ) writes:

If you are an intermediate level programmer as you say then you can easily learn to use a new programming paradigm. There is a coursera course https://www.coursera.org/course/hetero [coursera.org] which is ok and should do for your purposes.
- Re: (Score:3)
  
  by Tr3vin ( 1220548 ) writes:
  
  Those are game engines. They will do nothing to help him use the GPGPU capabilities of his graphics card.
  - Re: (Score:1)
    
    by i kan reed ( 749298 ) writes:
    
    XNA has easy, painless shader compilation. You can plug a C# image class into an XNA texture, pipe it through a vshs shader that you write by hand, and dump the output to a texture, back to an image. That process is highly interoperable with existing C# applications.
    But that ignores the fact that Microsoft abandoned XNA like an unwanted child.
    - Re: (Score:2)
      
      by gl4ss ( 559668 ) writes:
      
      ms has a habit of abandoning one product and then other guys in the same fucking company forcing you to use xna.* libs on their brand spanking new hardware.
      but actually that sounds like a possible solution for the guy, the pain being writing the shader.
      silverlight abandoned? what the fuck are you doing shipping sdk with silverlight libs on almost the same fucking day?! I see though where elop learnt his trade.
    - - Make a successful PC game then get a publisher (Score:2)
        
        by tepples ( 727027 ) writes:
        
        The replacement is native code.
        I'm certain that experienced developers of mouse-driven games for Windows on PCs can still obtain Xbox One devkits through an accredited disc game publisher. Of course this requires you to conceive, implement, ship, and market a game in a mouse-driven genre to demonstrate your competence. And you'll need certain professional social networking skills, which don't come easily to people with some disabilities that correlate with programming skill, to negotiate with a publisher. B
  - Re: (Score:2, Informative)
    
    by Anonymous Coward writes:
    
    Incorrect. That is certainly a valid approach and the GP should be modded up.
    Using textures and shaders you can very easily do massively parallel floating point operations in XNA on the GPU, and it's a language the asker is familiar with.
    Think outside the box a little bit.
    - Re: (Score:3)
      
      by Tr3vin ( 1220548 ) writes:
      
      How many boxes do you want to go through before you get to the solution? Sure, he could write it as a shader, but that hardly requires pulling in something like Unity or XNA to build the project.
    - Re: (Score:2)
      
      by polymeris ( 902231 ) writes:
      
      <quote>... and it's a language the asker is familiar with.</quote> The asker is familiar with HLSL?
- Re:XNA or Unity (Score:5, Informative)
  
  by stewsters ( 1406737 ) writes: on Friday July 19, 2013 @04:40PM (#44331983)
  
  I don't think he is looking at making a game, I think he is looking for some cheap parallel processing. I have done some cuda, it was a pain to set up a few years back. There probably are better tutorials now.
  
  Parent Share
  twitter facebook
  - Re: (Score:2)
    
    by vlueboy ( 1799360 ) writes:
    
    Yeah, I know the feeling.
    It would be one more tool under my belt. For instance, most non-financial people hear of unemployment numbers and a few know where to view the official data. For some bizarre reason the government offers no graphs at dol.gov alongside their statistics, even though they let you download years worth of raw data. Enter us geeks, who easily put together a spreadsheet to make sense of official unemployment trends and zoom into the data all we want and run our won analysis.
    One day knowing
- - Re: (Score:2)
    
    by TechyImmigrant ( 175943 ) writes:
    
    I program in gates. The dark side runs on my gates.
- Re: (Score:2)
  
  by burisch_research ( 1095299 ) writes:
  
  Deprecated, and quite a while back at that. Do not use.

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

Check out MC# (Score:1)

GPU programming is pain (Score:5, Funny)

Re:GPU programming is pain (Score:5, Funny)

Re:GPU programming is pain (Score:4, Funny)

Re:GPU programming is pain (Score:5, Funny)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Learn OpenCL (Score:5, Insightful)

Re:Learn OpenCL (Score:5, Interesting)

Re: (Score:3)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re:Learn OpenCL (Score:4, Funny)

Re: Learn OpenCL (Score:3, Informative)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3)

Re:Learn OpenCL (Score:5, Insightful)

Re:Learn OpenCL (Score:5, Informative)

Re: (Score:3)

Re: (Score:2)

Re: (Score:1)

Re: (Score:1)

Re: (Score:2)

Re: (Score:2)

Re:Learn OpenCL (Score:5, Informative)

Re: (Score:2, Informative)

Re: (Score:2)

Re: (Score:1)

Re: (Score:2, Informative)

Re: (Score:2)

CUDA (Score:1)

Re: (Score:2, Insightful)

Re: (Score:2)

Re:CUDA (Score:5, Informative)

OpenACC (Score:1)

It's easier than it sounds (Score:1)

Obsidian (Score:5, Informative)

Re: (Score:2)

Re:Obsidian (Score:5, Informative)

Jitter (Score:1)

GPU programming *is* pain, princess. (Score:5, Informative)

Udacity teaches CUDA (Score:3)

OpenACC (Score:5, Interesting)

Re: (Score:3)

Re: (Score:3)

Very Similar Story (Score:2)

Proper approach to GPU programming (Score:2, Insightful)

Re: (Score:2)

C++ AMP (Score:2)

Coursera (Score:3)

Re: (Score:2)

Re: (Score:2)

OpenCV (Score:2)

Nothing easy but Udacity can help (Score:5, Informative)

Understand The Hardware (Score:3, Informative)

OpenACC or OpemMP 4.0 are exactly what you want (Score:5, Informative)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Learn to Program an Intel Phi instead (Score:1)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Do you need the GPU? (Score:3)

Re: (Score:2)

Re: (Score:2)

GPU Maven Plugin (Score:1)

OpenSceneGraph or OGRE (Score:2, Interesting)

Re: (Score:2)

Re: (Score:2)

Mary Hall at The University of Utah (Score:2)

Take it from someone who's done a lot of CUDA (Score:2)

Ask a neck-beard... (Score:2)

GPU programming is pain, princess. (Score:5, Informative)