Why Does Current Clustering Require Recoding? 75
AugstWest asks: "I've been doing some research into what the available clustering options are for pooling CPU resources, and it looks like most of the solutions I've found require that programs be re-written to take advantage of the cluster. Since there are virtualization apps like Bochs and VMWare, where the applications just make use of a virtual CPU as if it was a real CPU, why aren't there clustering solutions that do this as well?"
latency? (Score:5, Insightful)
why aren't there clustering solutions that do this as well?
Because it's a lot faster to address a local CPU than it is to send that info down the wire to a remote CPU? And because of that latency, it's a lot easier to keep 2 or more local CPUs in sync than it is to keep 2 or more remote CPUs in sync?
You need to recode to work around the severe latency of communicating over a network cable--so you design your apps to minimize messaging between CPUs. Some apps can do this well--they don't need results from other CPUs to complete their own work.
Other applications require CPUs to work in tandem, and for each CPU to have to wait while the results are served out over GigE would suck some serious ass, even if it might be technically possible.
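A back-of-the-envelope sketch of that gap (the timing constants below are illustrative assumptions, not measurements):

```python
# Toy cost model: time to run N operations when each one must wait on a
# network round trip, vs. staying on a local CPU.
# Both constants are rough illustrative figures, not benchmarks.

LOCAL_OP_NS = 1           # ~1 ns per arithmetic op on a local core
NETWORK_RTT_NS = 100_000  # ~100 us round trip over gigabit Ethernet

def run_time_ns(n_ops, per_op_overhead_ns):
    """Total time when every op pays a fixed communication overhead."""
    return n_ops * (LOCAL_OP_NS + per_op_overhead_ns)

local = run_time_ns(1_000_000, 0)
remote = run_time_ns(1_000_000, NETWORK_RTT_NS)
print(remote / local)  # each op is ~100,000x slower when it waits on the wire
```

Which is why minimizing the number of messages, not the size of them, is usually the first recoding step.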
Re:latency? (Score:3, Informative)
Example from my work: we tend to write scratch files of several hundred megs to several gigs, and then perform read/write operations on them continually during a calculation. If the disk isn't local to the process, you end up flooding the network and bringing everything to a screeching halt.
In a Mosixish/Condor type environment, you then have to deal with which processes,
No, the hard part is ... (Score:3, Interesting)
Performance (Score:3, Insightful)
Because it's virtualization, and thus hurts performance?
Qemu Virtualization (Score:2)
In the end, you'll get better performance and compatibility out of coding for a cluster, rather than having the overhead and redirection of the virtualization process.
Because it's hard. (Score:3, Insightful)
Sure, you could make a clustering application that would run arbitrary x86 code on separate machines, but it would be many orders of magnitude slower than just running the code on one big Xeon.
Hell, it's hard enough to take a single thread and spread its work across multiple execution units within one CPU for out-of-order execution, and harder still to do it across multiple CPUs in a single box. Why would it be easy across a network cable? Or have I completely misunderstood the question?
cooking lessons (Score:4, Insightful)
As it stands today, an OS cannot easily share tasks. But there exist some tasks which are more easily shareable than others. I imagine within a century we'll be able to share tasks more easily, and I think the CELL chip is meant to ease this transition, but I could be wrong.
Re:cooking lessons (Score:2)
Re:cooking lessons (Score:4, Insightful)
If it's HA, you'd get 10 cooks to each make a roast. Sure, you'd end up cooking extra meat, but that doesn't matter - the goal here is to guarantee that a roast will be cooked no matter what. (I can imagine two copies of Bochs running on separate physical machines but linked to run in absolute lock-step. Performance might be impaired, but reliability will be there.)
If it's performance, then you're right, you can't magically glue two computers together and get twice the performance.
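The lock-step idea in miniature (a sketch, not a real replication protocol): feed the same inputs to two independent copies of a deterministic state machine and cross-check every step.

```python
# Sketch of lock-step redundancy: two independent "replicas" of a
# deterministic state machine receive identical inputs and must agree
# at every step. This buys fault detection, not speed -- both replicas
# do all of the work.

def make_replica():
    state = {"total": 0}
    def step(inp):
        state["total"] += inp
        return state["total"]
    return step

replica_a, replica_b = make_replica(), make_replica()

for inp in [3, 1, 4, 1, 5]:
    out_a, out_b = replica_a(inp), replica_b(inp)
    assert out_a == out_b, "replica divergence: fail over or halt"

print(replica_a(0))  # both replicas agree on the running total: 14
```

Real lock-step systems also have to make interrupts, clocks, and I/O deterministic, which is the genuinely hard part.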
Your example... (Score:2)
Yes, all the answers above were quite sufficient to explain why you have to re-code your app if you want it to run _faster_ when you add _more_ nodes. And it is so easy to make it run slower -- I bet the original poster would benefit from trying to re-code a sequential program into a parallel one at least once, then ask himself "How the heck can I teach
Re:Your example... (Score:1)
Re:Your example... (Score:1)
I've never seen that one attributed to anyone but Fred Brooks. (But it's still funny.)
part of the issue (Score:4, Informative)
This is in addition to the handling of resources such as database connections and other shared resources across the distributed cluster. I'm not exactly sure what your specific needs are, but when you separate threads across different physical memory spaces, it creates significant problems to overcome. If you just want to virtualize the application (so one machine, many virtual machines, one physical memory), then the recoding should be trivial. And I agree, in this isolated case, no recoding should be necessary. But most of the time, clustering entails spanning multiple physical memories, and thus the application needs to be designed to handle these difficulties.
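A minimal sketch of that divide: threads in one process get shared memory for free, while separate "nodes" have to serialize and ship everything explicitly (the distributed half below only simulates the nodes with pickle, it doesn't actually run on a cluster).

```python
# Threads in one process share one physical memory: a plain variable works.
# Separate nodes do not: state must be serialized and shipped explicitly.
import pickle
import threading

def shared_memory_sum(values):
    total = [0]                      # visible to every thread for free
    lock = threading.Lock()
    def worker(v):
        with lock:
            total[0] += v
    threads = [threading.Thread(target=worker, args=(v,)) for v in values]
    for t in threads: t.start()
    for t in threads: t.join()
    return total[0]

def distributed_sum(values, n_nodes=2):
    # Simulated nodes with no shared memory: each chunk is serialized,
    # "sent" to a node, summed there, and the partial result is sent back.
    chunks = [values[i::n_nodes] for i in range(n_nodes)]
    partials = []
    for chunk in chunks:
        wire = pickle.dumps(chunk)             # explicit message out
        partial = sum(pickle.loads(wire))      # work done on the "node"
        partials.append(pickle.loads(pickle.dumps(partial)))  # message back
    return sum(partials)
```

Same answer either way, but the second version forces you to decide what crosses the wire and when -- which is exactly the design work the recoding entails.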
Infiniband (Score:2)
You can only use solutions that exist for the technology you're using. Likewise, though, you're not limited by constraints on technologies you're not using.
Mosix (Score:5, Insightful)
http://www.mosix.org/ [mosix.org]
openMosix (Score:2, Informative)
MOSIX License (Score:3, Informative)
Agreed.
Not by OSI/DFSG/FSF standards. The license [mosix.org] is still very restrictive. I think the kernel patches might be under GPL, but certainly not the user tools.
This is certainly true. Most talent jumped ship & openMosix does have a higher number of active developers (and is somewhat backed by AMD (though I think AMD can and should give more developers to
Re:Mosix (Score:1)
Not a single moderator gave me a point for this. Not to whine, but what kind of a clue bat does it take to get the CORRECT answers modded up at Slashdot?
Re:Mosix - a great answer, but not for everything. (Score:2, Insightful)
Beowulf clusters typically are designed for specific purposes and software is written to take advantage of the design. You can't
Re:Mosix - a great answer, but not for everything. (Score:2)
Maybe not for 2+2, but you could for large numbers which are not atomic to add. If it takes linear time to do a task on one processor, on the Connection Machine it could basically end up being lg n time.
Cf. here [stanford.edu]
TANSTAAFL (Score:5, Insightful)
Re:TANSTAAFL (Score:1)
Ask Slashdots like this one, with succinct replies like this one, make me wish there were an option to immediately archive a story with just one comment if enough people vote for it.
Re:TANSTAAFL (Score:1)
I'm very happy that my clusters don't require Forth.
duh (Score:3, Informative)
Basically, what's wrong with this idea is the clustering software has no way of knowing what it can chunk up and spit out to other nodes unless the programmer of the software in question tells it. Some multithreaded programs can be run on clusters without a rewrite, but there is already clustering software for that application. What the OP is suggesting is similar to rerouting highway traffic by arbitrarily plucking cars off the highway and putting them on random side streets. They all may get there eventually and, at first, it may seem like they are moving faster, but in the end it just takes everyone a lot longer to reach their destination. Now, if the drivers themselves planned alternate routes to help alleviate congestion on the highways, then there's a good chance everyone would get to their destinations faster.
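The "drivers planning their own routes" point is just the programmer partitioning the work explicitly. A toy sketch of what that looks like:

```python
# The programmer, not the clustering software, knows where the safe
# split points are. Here the split is explicit: independent chunks,
# one partial result per chunk, combined at the end.

def partition(work, n_workers):
    """Split work into n_workers independent chunks (round-robin)."""
    return [work[i::n_workers] for i in range(n_workers)]

def word_lengths(chunk):
    return [len(w) for w in chunk]

words = ["beowulf", "cluster", "of", "these", "things"]
chunks = partition(words, 2)
results = [word_lengths(c) for c in chunks]   # each chunk could run on its own node
merged = [n for chunk_result in results for n in chunk_result]
print(sorted(merged))
```

No generic layer could have found that split automatically, because only the programmer knows that the chunks don't depend on each other.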
Re:duh (Score:2)
Then again, it's even more like chopping up the highway into short, disconnected, side-by-side pieces, giving the group some number of go-karts (depending on the problem), and telling each person to drive down a different piece than they started on.
openMosix (Score:2)
Re:openMosix (Score:2)
but unless your app is heavily CPU bound it will probably stick to its home node.
If the app does much I/O, like, say, processing batches of PostScript files, it will probably stay on its home node, unless you manage to get global block devices working to convince Mosix that the I/O is as good from any node.
Mosix sounds good because you don't have to "do" anything special but most apps won't benefit from it.
Re:openMosix (Score:3, Informative)
(And, for your particular example, mosix has a number of schedulers & you can schedule manually. You can trivially send one postscript file to each node. Of course you can do this "braindead" clustering with a script, but it
Re:openMosix (Score:2)
Re:openMosix (Score:2)
Wrong problem. (Score:2)
Compilers (Score:4, Interesting)
In my shameless search for a site to cite, I found this http://www-unix.mcs.anl.gov/dbpp/ [anl.gov] which covers lots of problems that have to be solved.
I'd love to see a language (or language extension) cleanly define a way to let me define a code block attributes which could affect how and where it gets executed. The runtime library could then distribute that block as the environment best allows.
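A rough sketch of what such an annotation might look like in Python (the `distributable` decorator and its `policy` argument are invented names for illustration, not a real library):

```python
# Hypothetical sketch: the programmer marks a function as distributable,
# and a runtime decides where it actually executes. A real runtime might
# ship the function to remote nodes; this sketch just fans out over a
# local thread pool to show the shape of the interface.
import concurrent.futures

def distributable(policy="any_node"):
    def decorate(fn):
        def run_over(items):
            with concurrent.futures.ThreadPoolExecutor() as pool:
                return list(pool.map(fn, items))  # preserves input order
        fn.run_over = run_over
        return fn
    return decorate

@distributable(policy="any_node")
def square(x):
    return x * x

print(square.run_over([1, 2, 3, 4]))  # [1, 4, 9, 16]
```

The point is that the *call site* never changes -- only the attribute on the block tells the runtime it is free to move the work.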
Re:Compilers (Score:3, Informative)
Re:Compilers (Score:1)
Omni/SCASH: Cluster-enabled Omni OpenMP on a software distributed shared memory system SCASH [phase.hpcc.jp]
Re:Compilers (Score:2)
Re:Compilers (Score:3, Interesting)
The synchronize keyword was closer to where I was going. Suppose Java had a thread modifier keyword for looping operators. You could then:
each iteration of the looping block launches as a different task running i
Re:Compilers (Score:2)
They talk about OpenMP [openmp.org], (as The Boojum mentioned) and they use it in a way analogous to what you're describing there... an example: (Damnit... slashcode fuxors up the indenting...)
Listing 4: Implementation of replication sort
Listing 4: Implementation of replication sort (Score:2)
par (element=0; element<SIZE; element++) {
  seq {
    par (element2=0; element2<SIZE-1; element2++) {
      ifselect(element>element2) {
        if(uList[element] > uList[element2])
          comp[element][element2] = 1;
      } else ifselect (element<=element2) {
Re:Compilers (Score:1)
Re:Compilers (Score:4, Informative)
The venerable occam [wotug.org] programming language requires that each block of code be specifically identified as being executable either in parallel or sequentially. Since PAR and SEQ constructs can be nested it is easy to build up quite complex concurrent structures that can easily be distributed. Since the semantics of occam processes are derived from Hoare's CSP process algebra the compositional nature of occam's parallelism is theoretically sound, and avoids many of the problems associated with thread-based concurrency model that most people are familiar with.
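occam's processes talk only over channels, never through shared variables. A rough analogue of a two-process PAR using Python threads and queues (an illustration of the style, not occam semantics):

```python
# CSP-style sketch: two processes composed in parallel, connected only
# by channels (queues). No shared variables; just messages, with None
# as an end-of-stream marker.
import queue
import threading

def producer(out_ch):
    for i in range(5):
        out_ch.put(i)
    out_ch.put(None)                 # end-of-stream marker

def doubler(in_ch, out_ch):
    while True:
        v = in_ch.get()
        if v is None:
            out_ch.put(None)
            break
        out_ch.put(v * 2)

a, b = queue.Queue(), queue.Queue()
# PAR: run both processes concurrently
threads = [threading.Thread(target=producer, args=(a,)),
           threading.Thread(target=doubler, args=(a, b))]
for t in threads: t.start()
for t in threads: t.join()

results = []
while True:
    v = b.get()
    if v is None:
        break
    results.append(v)
print(results)  # [0, 2, 4, 6, 8]
```

Because the processes only touch their channels, moving one of them to another machine means swapping the queue for a socket -- the program logic doesn't change, which is the distributability occam gets for free.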
Re:Compilers (Score:2)
Take a look at OpenMP [openmp.org].
It takes a commercial compiler, but it's straightforward, an open specification, can be used "automagically", and is portable across machines and languages.
It does not work on a clustered system, though, only on one that has local processors and memory.
What type of cluster do you want? (Score:3, Informative)
But all of them require that you add something to the original program which distributes the work (load balancing/render farms). If you want your original program to run in parallel then that is a much harder problem to solve. Basically you'll have to remake it into something like the above.
The last problem would basically require the computer to extract threads out of your code. This is pretty much impossible to do automatically though.
Re:What type of cluster do you want? (Score:2)
Re:What type of cluster do you want? (Score:2)
And I imagine that if you have a complex site with dynamic content from a database you'll still need to optimise your design to get as much use as possible from the load balancing.
What you want... (Score:2)
Oh, that sounds like a tough one to me. Ok, ok, it's actually not that tough - but it DOES require combining a number of kernel patches, there's no one-stop-shop (at the moment) for this. It also requires that network connections be IPv6, as there's bugger all mobility support out there for IPv4 for Linux
Re:What you want... (Score:2)
Because bandwidth is scarce. (Score:3, Interesting)
If your problem isn't that parallelizable and yet you need a whole cluster of computers to run it, odds are you need more efficiency than distributed shared memory can give you. You can access memory on your own node with orders of magnitude more bandwidth and less latency than on other nodes, and if your application doesn't take that into consideration it can run orders of magnitude slower.
Of course, that doesn't apply to every problem, and there are people trying to create exactly the cluster-as-computer architecture you'd like to see for ease of application programming. Check out OpenMosix and MigShm for one example - I haven't used the latter DSM patch myself but I know that for non-shared-memory programs, Mosix has had working process migration code for years.
Re:Because bandwidth is scarce. (Score:1)
This is a basic systems question. (Score:5, Informative)
[Why must] programs be re-written to take advantage of the cluster.
The simple answer is that programs, in general, are written as single threaded applications with shared state (memory). A cluster is the opposite of that - multiple parallel CPUs without shared state (or at least requiring one to be explicit about shared state, as opposed to simply declaring a variable).
Usually a program's algorithm has to be completely re-designed in order to take advantage of the cluster while mitigating these problems. At minimum the program must be parallelized. If you don't change the program to successfully deal with the latency of non-local memory, the cluster ends up barely more powerful than a single fast computer running the program.
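The re-design step in miniature: a serial left-to-right reduction has a chain of dependencies, so it gets rewritten as a balanced tree of independent pairwise merges -- the shape cluster code tends to take (a sketch; a real version would run each layer's merges on different nodes):

```python
# Serial reduction: each step depends on the previous one, so nothing
# can run in parallel no matter how many CPUs you have.
def serial_reduce(values):
    acc = values[0]
    for v in values[1:]:
        acc = max(acc, v)          # step i depends on step i-1
    return acc

# Tree reduction: every pairwise merge within a layer is independent,
# so a layer of n/2 merges could be spread across n/2 nodes.
def tree_reduce(values):
    layer = list(values)
    while len(layer) > 1:
        nxt = [max(a, b) for a, b in zip(layer[::2], layer[1::2])]
        if len(layer) % 2:         # odd element carries into the next layer
            nxt.append(layer[-1])
        layer = nxt
    return layer[0]

data = [4, 9, 2, 7, 5]
print(serial_reduce(data), tree_reduce(data))  # 9 9
```

Same answer, different algorithm -- and only the second one has any parallelism for a cluster to exploit.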
The reason you are asking this question is that you don't realize that a cluster is fundamentally different than a single (or dual or quad) CPU. The architecture is completely different. You can't expect to treat it like any old computer.
-Adam
mod parent up (Score:2)
Re:mod parent up (Score:2)
Re:mod parent up (Score:2)
anyw
What about the mythical TOE and RDMA (Score:1)
Re:What about the mythical TOE and RDMA (Score:2)
Generally it looks like he's asking for a VM to be an abstraction layer over a cluster. The problem is abstractions are simplifications, and you can't just simplify away the real problems of a cluster. There are some solutions tha
VirtualIron is probably what you're looking for (Score:2)
http://www.virtualiron.com/ [virtualiron.com]
My try (Score:3, Interesting)
The virtual machines you mention all run on a single existing system. You want a virtual machine that runs on multiple systems. That goes way beyond what the existing VMs do. They just implement the hardware instructions of a single system in software running on a single system. Taking that implementation and spreading it out among multiple systems means anticipating every clustering problem the code might raise, and solving it in advance.
Nobody knows how to do that. If they did, they'd implement it as the back end of a compiler rather than waste the overhead of using a VM.
(They say that there are no stupid questions. Not true. But there are lame stupid questions, and interesting stupid questions. My vocation [picknit.com] is answering interesting stupid questions, which is why I'm grateful for this one!)
why don't you do a little experiment? (Score:2)
i can guarantee that the serial algorithm is about a day's worth of effort to implement; however, the parallel one will require at least a week. as you start working through the parallel implementation, you'll quickly discover that all the things that are true in a shared
Holy Grail (Score:4, Insightful)
Your language doesn't support it (Score:4, Informative)
The only way you'll have source code that compiles and runs unmodified on architectures of widely varying parallelism efficiently is for the language itself to know about parallelism, and make it the compiler's (and even runtime-linker and kernel's) job to parallelize your code for you. An inherently parallel language would have ways for you to specify in your source code what can and cannot be executed in parallel, and what code absolutely depends on the serial execution of some previous code. Even then, we're really only talking about the SMP case. When you start involving network latencies and bandwidth restrictions, the decisions on when and how to parallelize become more challenging for the compiler/runtime, possibly requiring either more intelligence on its part and/or more meta-information in your source code.
Until you write code in a language like that, you can never expect to write code in a single-threaded mindset and then have it just magically take advantage of a parallel environment.
completely different issues (Score:1)
Virtualization is a many-program, one-CPU situation. Various software tricks make each of several programs think they have an entire system to themselves. In reality it all runs inside a virtualizer/emulator. Speed
VMware doesn't virtualize the CPU (Score:1)
Re:VMware doesn't virtualize the CPU (Score:2)
cilk (Score:1)
They also had a couple of graduate students that made Jilk
The halting problem (Score:2)
I bring this up because it seems like, in order to partition the tasks efficiently, you'd basically need to be able to predict what the program was trying to do in advance... and if you could predict what a program was going to do before actually running it, you'd have just solved the halting problem.