News for nerds, stuff that matters

Parallel Programming - What Systems Do You Prefer?

Posted by Cliff on Mon Oct 31, 2005 05:30 PM
from the better-computing-through-more-CPUs dept.
atti_2410 asks: "As multi-core CPUs are finding their way into more and more computer systems, from servers to corporate desktops to home systems, parallel programming is becoming an issue for application programmers outside the High Performance Computing community. Many parallel programming systems have been developed in the past, yet little is known about which are in practical use or even known to a wider audience, and which were just developed, released and forgotten. Nor is it known which problems bother the actual users of parallel programming systems the most. There is not even data on the platforms that parallel programs are developed for. To shed some light on the subject, I have set up a short survey on the topic, and I would also very much like to hear your opinion here on Slashdot!" What parallel computing systems and software have you used that really made an impression on you, both good and bad?
This discussion has been archived. No new comments can be posted.

23 Comments
  • I've found that parallel computing provides an excellent means of avoiding obsolescence by allowing the creation of massive computers that have the potential to crush comparatively tiny, "modern" systems. While the prototype AppleCrate [aol.com] is just a small,

  • short answer (Score:5, Informative)

    by blackcoot (124938) on Monday October 31 2005, @06:08PM (#13919096)
    in order of easiest to hardest to program:

    uniform access shared memory (think the bigass (tm) cray machines) -- here you'd typically use mpi (if your programs are supposed to be portable) or the local threading library + vectorized / parallelized math libraries. since it's all in a single memory space, it's "as simple" as just doing a good job multithreading the program.

    non-uniform access shared memory (think the large modern sgi machines) -- here things get a bit more subtle, because you're going to start caring about memory access and internode communications. you can still get reasonable performance by just using threads, however, if your problem is "embarrassingly parallel" enough.

    distributed memory (beowulf clusters and their ilk, although a bunch of regular linux or windows boxes will do) -- this is where things get excessively complicated very quickly. you have your choice of several toolkits (mpi being standard in the scientific world and superseding the previous pvm standard). here you are going to care a lot about communications patterns (in fact, probably more so than computation). i believe one of the java technologies (javaspaces perhaps? jini maybe?) abstracts this away and gives you the view of the network as a sea of computational power. regardless, you're going to have to pay very careful attention to how data moves, because that will typically be your bottleneck. synchronization becomes whole orders of magnitude more expensive on this kind of parallel machine, which is another thing you'll have to figure into your algorithm design.

    once your architecture is fixed, you can start to talk about which toolkit to use. a well-tuned mpi will work "equally well" in each of these environments and has the added bonus of being portable across architectures. mpich is a well-respected implementation, although i found lam to be much easier to use, personally. good luck; i think you're about to open a can of worms only to discover that you've really just opened a can of ill-tempered and rather hungry wyrms.
    • i suppose i should actually answer the question ;-)

      most of my parallel programming has been on commodity pc hardware (intel). as a result, i've used a combination of pthreads, compiler auto-vectorization (god bless intel's compiler), and mpi. for the more
    • Re:short answer (Score:2, Interesting)

      From some recent experiences with the mpiblast [lanl.gov] project, and some much older work at llnl [llnl.gov], I've found mpich to be more reliable than lam (one man's limited opinion, a data point not a rule). Also I think it should be made clearer that
    • uniform access shared memory (think the bigass (tm) cray machines)

      The only recent Cray machine that I am aware of that had uniform access to memory was the MTA2, which used an address scrambling scheme to spread references throughout the memory system

    • I've spoken with an SGI engineer about the Altix systems and he said all the memory and communications considerations are automatically handled in hardware. The system will automatically pick the closest memory and processors to work together. Writing p
  • u++ (Score:5, Informative)

    by p2sam (139950) on Monday October 31 2005, @06:44PM (#13919393)
    We had to use this for school assignments way back when. It ain't bad, and it's a lot more full-featured than basic pthreads.

    http://plg.uwaterloo.ca/~usystem/uC++.html [uwaterloo.ca]
  • MPI, Co-Array Fortran, & UPC (Score:5, Informative)

    by Salis (52373) <(moc.liamg) (ta) (silas.drawoh)> on Monday October 31 2005, @07:07PM (#13919588) Journal
    MPI is the de facto standard for processor-to-processor communication, with MPICH's implementation being the most stable and best-known one. For "lower-level" communication, you can also use UPC or Co-Array Fortran, which are often used on serious computing architectures like the Cray X1. The difference between MPI and these language-level parallel extensions is that, at the language level, a transfer of data between processors looks like an assignment between variables, where one of the dimensions of the variables is the processor identity itself.

    So, in MPI, to send data from processor 0 to processor 1, processor 0 would call a function:

    call MPI_Send(dataout, datacountout, datatype, dest, tag, MPI_COMM_WORLD, ierr)
    (Fortran style; dest is the rank of the destination processor, 1 here)

    which must be matched by an MPI_Recv in processor 1's program.

    In Co-Array Fortran, OTOH (declaring the co-dimension as [0:*] so that processors are numbered from 0, as in MPI), it would look like

    data[1] = data[0]

    The fun part about Co-Array Fortran is that 'data' can be defined as a regular multi-dimensional array, so that data(1:10,1:20)[1] = data(41:50,61:80)[0] is perfectly ok (both sections have the same 10x20 shape) _and_ the 'processor dimension', denoted by the []'s in Co-Array, can also be accessed using Fortran notation, so that data[1:100] = data[0] is perfectly ok too. Or even data[2:100:2] = data[0] for the even-numbered processors only.

    In truth, a Co-Array Fortran compiler will probably turn the language-level additions into MPI function calls (because that's the standard), but I find CAF to be more elegant than MPI.

    UPC is similar to Co-Array Fortran, but for C. I've never used it before, though.

    Google Co-Array Fortran or UPC for more information.
  • OpenMP (Score:2)

    I found this article on MSDN...

    http://msdn.microsoft.com/msdnmag/issues/05/10/OpenMP/default.aspx

  • Multi core - "Parallel Computing" (Score:5, Informative)

    by Heretik (93983) on Tuesday November 01 2005, @01:06AM (#13921443)
    Going from multi-core CPUs being available to massive things like clusters, MPI, etc. is a bit of a leap.

    Multi-core chips in a typical commodity machine (shared memory, same address space, etc) just means you have multiple threads of execution, but everything else is pretty much the same at the application coding level.

    If you're working on an app and want to take advantage of multi-core (or SMP), you just need a well-threaded app using the native threading libs (i.e. pthreads), nothing fancy. Clusters and big non-shared-memory supercomputers are a different story altogether from something like a dual-core Athlon.
  • Too bad that Cray retreated to more conventional designs.
    • actually, speaking as someone who was involved in later cray products, sgi killed the t3e.

      the merger agreement with tera specifically constrained cray from making a follow-on machine.

      not that cray doesn't have problems....
  • http://labs.google.com/papers/mapreduce.html

    "MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key
  • Prolog is quite easy to use and is inherently parallel; that's why they chose it for the 5th Generation computing project [wikipedia.org], which never really got anywhere (possibly because it was way ahead of its time)
  • If you have data that can be incrementally processed, then shell scripts with pipes can bring about a high degree of speedup
    process1 | process2 | process3
    with all three of them running on different processors means that your program can get up to a 3x spee