Ask Slashdot: How do Software MMU's Work?
Rob_D_Clark asks: "How does a program (like VMware) implement memory management on top of Linux (or Unix in general)? For example: in VMware, the guest OS is going to expect to have a 32-bit address space, into which the memory you allocate to the guest OS is mapped. Also, the guest OS is going to expect hardware registers for different devices, etc., to be mapped in at certain addresses. How does a program trap reads/writes to these addresses and deal with them appropriately?"
Yes, engines work that way, too! (Score:1)
Funny, my dad told me that's how an 'internal combustion engine' works.
My dad explained "marginal utility" and "network externalities" the same way. In fact, he used the same explanation for sex.
Hmmm . . .
I hope your entirely humorless post . . . (Score:1)
. . . is a parody of the humorless mental dwarves who moderate all the life out of Slashdot.
Lighten up.
Re:Ass-talking (Score:1)
I believe the main problem is that the opcode that asks the CPU what mode (user or supervisor) it is in is not virtualizable. So you won't need to recompute things like offsets into the stack or other complicated stuff like that; it's just a binary search and replace.
Re:Ass-talking (Score:1)
Speculation: Virtualize the x86 (Score:1)
My program (VMware work-alike) would run in ring 0. ALL other programs would run in ring 3. I'd set up a separate x86 VM for each instance of a hosted operating system. My work-alike program would emulate any instructions which can't be run in ring 3. The work-alike would have to emulate a lot of the functionality of the x86.
When a program tries to use an instruction that can't be used in ring 3 a GPF will occur which will switch control to my work-alike. My work-alike will look at the stack to get the address of the offending instruction and emulate its functionality, then return control back to the program. The program would have no idea it was just interrupted and its request emulated as long as it gets what it wants.
It'll be much more involved than the above and would require a lot of work to write, but that's basically how it could be done.
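The trap-and-emulate loop described in this comment can be sketched as a toy simulation. Everything here is an assumption made for illustration: the `struct guest` layout, the function names, and the one-byte "instruction set" are invented (the CLI, STI and HLT byte values match real x86 opcodes, but `OP_ADD1` is made up). A real monitor would take a hardware fault rather than check the opcode in a loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy instruction set, invented for this sketch. */
enum {
    OP_ADD1 = 0x01,  /* unprivileged: increment the accumulator */
    OP_CLI  = 0xFA,  /* privileged: disable interrupts */
    OP_STI  = 0xFB,  /* privileged: enable interrupts */
    OP_HALT = 0xF4   /* privileged: stop the guest */
};

struct guest {
    uint32_t acc;    /* ordinary register, used directly by the guest */
    int virt_if;     /* the guest's *virtual* interrupt flag */
    size_t traps;    /* how many times the monitor had to step in */
};

/* Monitor side: called when the guest "faults" on a privileged opcode.
 * Returns 1 to resume after the instruction, 0 to stop the guest. */
static int emulate_privileged(struct guest *g, uint8_t op)
{
    g->traps++;
    switch (op) {
    case OP_CLI: g->virt_if = 0; return 1;
    case OP_STI: g->virt_if = 1; return 1;
    default:     return 0;   /* OP_HALT or anything unknown: stop */
    }
}

/* Runs the guest code and returns the number of instructions completed. */
size_t run_guest(struct guest *g, const uint8_t *code, size_t len)
{
    size_t pc = 0, retired = 0;
    while (pc < len) {
        uint8_t op = code[pc];
        if (op == OP_ADD1) {
            g->acc++;        /* stands in for code running natively */
        } else if (!emulate_privileged(g, op)) {
            break;           /* the monitor decided to stop the guest */
        }
        pc++;                /* resume just past the handled instruction */
        retired++;
    }
    return retired;
}
```

The guest never touches the real interrupt flag; it only ever sees `virt_if`, which is the "gets what it wants" part of the comment.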
Anonymous Coder
Re:Virtual Memory (Score:1)
Re:Instructions I found in vmmon (Score:1)
Re:Funky Walking (Score:1)
As others have pointed out, the i386 is NOT virtualisable, so you have to play some tricks, unless you want to emulate the processor (hey, it worked for Insignia). But that is too slow to get VMware's level of performance. Even Digital's assembly language translation thingie (they had a vaporware Alpha/i386 hybrid processor a few years ago that did this -- look in back issues of Byte for pointers) is too slow.
VMware's trick is to scan the object code at load time and translate unvirtualisable instruction sequences to something else -- what, I don't know, but I suspect they jump to an emulator for just that sequence. So it's just like Pure Atria's stuff, and even related to the Melting Ice tech available for rapid development for Eiffel.
hope this helps.
Johan
johan@ccs.neu.edu -- I'll log in when I get the johan uid. gimmie!
Freemware project (Score:1)
[The Freemware project started directly after VMWare was announced. It's an effort to create an open source (and possibly portable) VMWare-clone.]
How about "virtualizing" on different processors? (Score:1)
This would require you to "hide" the other processor from the OS as far as SMP support goes (or to use an OS that doesn't do SMP), and to make it use "device drivers" that use inter-processor communication to funnel the actual device I/O through the hosting OS.
There would be other issues, like arranging for the hosted processor to see a BIOS that doesn't try to access devices directly and guaranteeing that the hosted processor doesn't go playing with hardware directly. Unfortunately, the solutions to those problems might leave you back at the original problem.
Re: Good question!; incomplete VM in X86? (Score:2)
My current favorite example comes from Linux: The kernel allows user processes to read the current value of the CPU clock counter (using the instruction "rdtsc", or "read time stamp counter"). That instruction can be made to cause a fault by an appropriate flag setting.
I would expect Intel to be fairly good at VM technology after hearing some of the complaints about the '386. (The obvious one is the lack of ring 0 write-protect page faults.)
Funky Walking (Score:2)
I'd dig around in deja-news if I were you.
Re:Ass-talking (Score:2)
(1) you mark all the pages that you want to trap instructions in as non-executable
(2) when code attempts to execute in one of those pages, you get a fault
(3) you trap the fault, and then (and only then) scan the page and modify instructions as necessary
(4) you then mark the page executable and not writable, and let it run
(5) if the page is modified, you then clear the executable bit, because you may have to re-scan it.
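The five steps above amount to a small per-page state machine, which can be sketched as follows. This is a simulation only: the `struct page` fields and function names are hypothetical, the "faults" are plain function calls rather than real page faults, and the scan itself is reduced to a counter.

```c
#include <stdbool.h>

/* Hypothetical per-page bookkeeping for the scan-on-execute scheme. */
struct page {
    bool executable;  /* may the guest run code from this page? */
    bool writable;    /* may the guest store to this page? */
    unsigned scans;   /* how many times the page has been scanned */
};

/* Step 1: every page starts out non-executable (and writable). */
void page_init(struct page *p)
{
    p->executable = false;
    p->writable = true;
    p->scans = 0;
}

/* Steps 2-4: executing from a non-executable page "faults"; the monitor
 * scans and patches the page, then makes it executable and read-only. */
void guest_execute(struct page *p)
{
    if (!p->executable) {
        p->scans++;            /* stand-in for the scan-and-patch pass */
        p->executable = true;
        p->writable = false;
    }
    /* ...the guest's code on this page would now run natively... */
}

/* Step 5: writing to a read-only page "faults"; the monitor lets the
 * write through but revokes execute permission, forcing a re-scan. */
void guest_write(struct page *p)
{
    if (!p->writable) {
        p->executable = false;
        p->writable = true;
    }
}
```

A page that is executed repeatedly without being modified is scanned only once, which is why the overhead of this scheme can stay small.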
Here's how it works. (Score:4)
Okay, imagine that the memory of your computer is like a vast attic, full of flies. Each of the flies is either asleep or awake, and they change state frequently. They live, work, and play in groups of eight, called "bytes". Now, when the computer gets hungry, it opens up its mouth much like a blue whale and sucks in a great big gulp of air from the attic. It filters the flies out of the air with its giant long strandy teeth and gobbles up the flies -- gobble gulp!
So.
The whale has no eyes, and in the whale's tummy there is a man without his greatcoat. That guy is called the "kernel", or "Colonel", and he looks and talks exactly like Colonel Klink on Hogan's Heroes. He has a goofy, bumbling sidekick named Sergeant Shultz, otherwise known as the "Memory Management Unit". What Sgt. Shultz does is, well . . . okay. Let's start over. Colonel Klink is in charge of sorting through these flies and putting them together in the right order before the whale (the computer, remember) digests them. This way, the whale won't get a tummyache and feel funny. Col. Klink has to decide which flies to send when, but he needs to have them organized in the right way so he knows which flies are which. If two batches of flies crash into each other, the computer will get very frowny and sad. Col. Klink doesn't like that, because when that happens the General comes and yells at him in German, and Col. Klink doesn't speak German, he just speaks English with a funny accent. So Sgt. Shultz has the very important job of ensuring that the flies don't get mixed up before Col. Klink gets to look at them.
In the arrangement that you're talking about above, things are more complex, because Col. Klink and Sgt. Shultz have to coexist with Col. Hogan and Richard Dawson, who are doing the same thing at the same time. (A little imagination will suffice to guess which OS is which). Hilarity ensues! But everything runs smoothly again at the end of the episode.
Hope this helps.
Ass-talking (Score:4)
The general consensus in comp.arch is that vmware is doing some dynamic recompilation, but is otherwise allowing the hosted operating system to execute natively, and thus use the hardware mmu for the majority of the work.
As has already been mentioned, the IA32 instruction set architecture (ISA) is not completely self-virtualizable, i.e. you can't trap accesses to all cpu state information. But, you can scan through the text of your process and search for those specific opcodes that are not virtualizable. Substitute a call to your own handler for those opcodes and voila! we are now effectively fully virtualizable and the performance hit is minimal, especially if you can save your changes so that you don't have to scan and recompile each page of text more than once. And once you are fully virtualized, as long as you properly trap the right operations and do the right thing, you can let the hardware do 99% of the work for you.
Clearly vmware does more than this with its various virtualized devices, but fundamentally this is probably what is going on.
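A naive version of the scan-and-substitute idea might look like the sketch below. The assumptions are loud ones: the byte pattern `0F 21 F8` is my understanding of the encoding of `MOV EAX, DR7` (the example instruction used elsewhere in this thread), the function name is hypothetical, and the replacement is a one-byte `INT 3` padded with `NOP`s instead of a call to a handler. The scan also treats every byte offset as a possible instruction start, so it will happily patch data or the middle of another instruction.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed encoding of MOV EAX, DR7; treat this as illustrative only. */
static const uint8_t SENSITIVE[] = { 0x0F, 0x21, 0xF8 };

#define INT3 0xCC  /* one-byte breakpoint opcode */
#define NOP  0x90  /* one-byte no-op, used as padding */

/* Scans `text` for the sensitive byte sequence and overwrites each match
 * with INT 3 followed by NOPs, so a trap handler can emulate it later.
 * Returns the number of sequences patched. */
size_t patch_sensitive(uint8_t *text, size_t len)
{
    const size_t n = sizeof SENSITIVE;
    size_t patched = 0;

    for (size_t i = 0; i + n <= len; ) {
        if (memcmp(text + i, SENSITIVE, n) == 0) {
            text[i] = INT3;
            memset(text + i + 1, NOP, n - 1);
            patched++;
            i += n;   /* skip past the bytes just rewritten */
        } else {
            i++;
        }
    }
    return patched;
}
```

Keeping the patched length equal to the original length means no jump targets or offsets elsewhere in the page have to be recomputed.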
Re:Ass-talking (Score:1)
s/breakpoint//
Re:Ass-talking (Score:1)
(1) what you replace may be data, not code
(2) even if (1) weren't a problem, how do you know the sequence you found isn't a coincidence? What if the end of "REPNE SCASB" and the beginning of "DIV ECX" just happens to look exactly like "MOV EAX, DR7"?
--synaptik
Re:Ass-talking (Score:1)
I was hoping you wouldn't notice that small problem.
The idea is sound, it's only that Intel got stingy with the breakpoint registers.
--synaptik
Re:Ass-talking (Score:1)
"MOV EAX, 0CCCCCCCCh" ?
The last four bytes look like 4 "INT 3" instructions.
--synaptik
Re:Instructions I found in vmmon (Score:1)
"fc ff ff ff" is really "0xFFFFFFFC"
or -4.
--synaptik
Re: Good question!; incomplete VM in X86? (Score:1)
In fact, the Intel documents I have specifically state that this register is readable at any privilege level, but nowhere have I seen a statement that you can MAKE it a privileged instruction.
--synaptik
Good question! (Score:2)
Although, I would like to add a rider to his question:
With Intel processors, some hardware registers can't be trapped. For example, any privilege level can read DR7 to find out if a debugger is resident. Writes to this can obviously be trapped, but AFAIK there is no way to get the processor to trap on reads.
I am sure there are other examples like this, as well. This seems to indicate that it is impossible to virtualize every aspect of the machine.
(Although, I suppose you could put the processor into single-step mode, and look at each instruction before it executes, looking for these types of instructions, but that would slow things WAAAYYYY down.)
--synaptik
Re:Ass-talking (Score:2)
I'm no OS developer, but if I were trying to do this, I'd try scanning for these strings, and then placing a hardware execution breakpoint at the beginning of them. If it's not actually code, the breakpoint won't get hit. If it is code, then when it does get hit the VMWare software could just look at the instruction pointer register, to ascertain whether they "hit" in the middle of, or the beginning of, an instruction. If the latter, they simulate that "offending" instruction.
But like he/you said, I'm talking out of my arse.
:)
--synaptik
walla?! (Score:1)
I think you mean "voila." I liked the Colonel Klink explanation better...
Re:Ass-talking: another possibility (Score:1)
I wrote a simple program to scan all the Windows dlls and exes for "dangerous" instructions. I found that for most exes and dlls, there were fewer than 4 instructions per page that would be dangerous. For the remaining ones, you could rewrite the instructions. But then, you have to make the page execute-only (not readable or writeable -- is this possible?) and trap any access to it by the processor, to fool it into seeing the original instruction instead of the rewritten one.
Or, you could simply single-step on that page (which might be a viable option since there would be so few of those pages in the average OS -- unless someone specifically wanted to make your VM perform badly).
Remember (Score:1)
A few comments show that a program may determine that it's in ring 3 rather than 0. It's important to remember that an OS has little legitimate reason to check for that. I wouldn't be surprised if M$ added such checks now that vmware is out, but apparently, they haven't done that in the past.
In general, it's not necessary to perfectly virtualize ring 0 instructions, it just has to be 'good enough'. In practice, determining what 'good enough' actually is can be a tough problem (which is why there aren't dozens of vmware like products out there), but perfection is not required. Most OSes are not hostile to being virtualized, they just assume that they're not being virtualized.
Re:Remember (Score:1)
EMM386.EXE is a kludge added to a kludge. It was done because DOS programs expected to run in real mode, and the '286 didn't have virtual 8086 mode.
Anything that needs to trigger a processor reset to operate is a problem. EMM386 under 'DOS7' probably doesn't work that way anymore.
Brown Simulator plug (Score:1)
No it doesn't (Score:1)
The point of vmware is to provide the fastest possible emulation of an ia32 machine. So it wants to execute all (or nearly all) the instructions directly on the host processor, rather than having to emulate them. The clever bit is to allow it to do this without clobbering the host OS -- this is what requires lots of memory management tricks.
Re:Here's how it works. (Score:4)
Re:I think... (Score:1)
Re:Here is my attempt at an explanation (Score:1)
While we're at it, here's my guess, which is based on badly blurred memories of 80486 documentation.
Memory reads and writes really aren't the difficult part. In protected mode, every process (or task) runs in its own virtual address space (4GB max), which the processor translates into physical addresses. The OS swaps out these task spaces to disk while they're not being used. One process should never be able to write to another process's space, which was the whole point of protected mode with the i386.
The real issues involve handling interrupts, and executing protected instructions. Take for instance writing directly to hardware through IO ports. The host OS absolutely can't let the hosted OS do what it wants in this area. But the interrupt mechanisms of x86 architecture come to the rescue here.
Run the hosted OS in some unprivileged level (not ring 0) and let the processor interrupt whenever there's a privileged instruction executed. The host then examines the situation and recovers by implementing the privileged instruction in an alternative way.
Registers also won't be a problem in most cases since they are saved and restored at a task state switch. Linux shouldn't care what NT does with the registers as long as they get restored when NT gets preempted.
- dw
Re:The dark ages of Software MMUs (Score:1)
Re:Funky Walking (Score:1)
They have patented the object code rewriting, but as far as I know, no one has challenged the patent. Evidently, there were published papers years before Purify reinvented (and patented) this scheme, so at least some of the Purify patents may be unenforceable because of that.
Anyway, this has very little to do with VMWare, or how VMWare is implemented.
Re:Virtual Memory (Score:2)
For example, you can mark pages of memory as not being accessible (the PROT_NONE flag for mmap). This will cause a SIGSEGV if the program tries to read/write that address.
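A minimal sketch of that PROT_NONE trick on Linux, under some assumptions: the function and variable names are hypothetical, and calling `mprotect` from a signal handler is not guaranteed async-signal-safe by POSIX, though it is a common pattern on Linux. A real monitor would emulate the access in the handler rather than simply unprotecting the page.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile sig_atomic_t fault_count;  /* accesses we trapped */
static char *guarded_page;                 /* start of the PROT_NONE page */
static size_t page_size;

/* SIGSEGV handler: if the fault hit our guarded page, count it and open
 * the page up so the faulting instruction can be restarted. */
static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    char *addr = (char *)info->si_addr;
    (void)sig;
    (void)ctx;
    if (addr >= guarded_page && addr < guarded_page + page_size) {
        fault_count++;
        mprotect(guarded_page, page_size, PROT_READ | PROT_WRITE);
    } else {
        _exit(1);  /* a genuine crash somewhere else: give up */
    }
}

/* Maps one inaccessible page, touches it, and returns how many accesses
 * were trapped (or -1 on setup failure). */
int count_trapped_accesses(void)
{
    struct sigaction sa, old;
    volatile char *p;
    char value;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);

    page_size = (size_t)sysconf(_SC_PAGESIZE);
    guarded_page = mmap(NULL, page_size, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (guarded_page == MAP_FAILED)
        return -1;
    fault_count = 0;
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -1;

    p = guarded_page;
    p[0] = 42;     /* faults; the handler unprotects; the store restarts */
    value = p[0];  /* the page is accessible now, so no second fault */

    sigaction(SIGSEGV, &old, NULL);
    munmap(guarded_page, page_size);
    return value == 42 ? (int)fault_count : -1;
}
```

The `si_addr` field of `siginfo_t` is what tells the handler which address was touched, which is the piece a device emulator would key on.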
Another idea I just thought of as I was writing this post... you could use a kernel module to create a
The dark ages of Software MMUs (Score:1)
Re:soft MMU? (Score:1)
OK, well since it's the day of wacky metaphors, I'll try and tackle this one...
A normal Linux application has to obey POSIX rules to interface with the outside world (memory, disk, printer, etc.). Let's take memory as the normal example. An *application* has to ask nicely for whatever resources it wants. It doesn't know much of anything about the *real* state of the machine. Like a cow in its pen, the process has no idea what the other cows are up to or even how many other cows there are or how big the ranch is. Operating systems attempt to "dial directly" to the hardware, and this is the part that VMWare must emulate. No easy job either, considering all the wacky things you can do on a PC. So if I'm an OS (or one of those old boot-me game disks like flight sim 2.0), I don't even worry about allocating memory - I just start eating it by the bucketful. BIOS loads the kernel into memory starting at 0x00, then "jmp 0x00" (goto 0x00). Kernel executes, checks how much memory it has, and starts parcelling it out to other applications - it rules the ranch. It doesn't *ask* for more memory, it just "walks the fence" to figure out how much there is.
Clearly to make the ranch-owner behave as a simple cow, while letting him run his own little rat-ranch and never letting him have a clue that he's just a cow in a pen is a pretty neat trick.
Most of the "neat trick" is done for you by the CPU, however as some of the more advanced hackers have pointed out, there are a few weak spots in this virtualized environment. So you still have some crazy stuff to do before you can fully fool the rancher. He's always asking if there's a larger world out there, and you have to keep him in the dark at all costs (or he'll die of surprise and fright). Like flatland...
Basically, this is done by brain-washing the rancher into never poking his head through that hole in the wall - which we know leads to the *real* outside world. Or what we know as the outside world - but is it really? Maybe we've already been brainwashed ourselves!!
Application = cow
OS = rancher
computer = ranch
vmware = sophisticated brainwashing for ranchers which makes them think rats are cows and keeps them from looking over the wall of the stall. Also makes them live on hay instead of beef.
-=Julian=-
Re:Speculation: Virtualize the x86 (Score:2)
Re:Virtual Memory (Score:1)
implemented entirely in hardware so this won't work.
Watch MS (Score:1)
how VMWare works by watching what MS changes in their next OS release in order to break it, if they can...
Virtual Memory (Score:5)
In a kernel, this is done (usually) using a mix of hardware and software. If a program tries to access a piece of memory, the hardware looks at the Translation Lookaside Buffer (TLB) to translate the address. If the address exists in the buffer, it does the translation and all is good. If it does not exist, the hardware traps to the kernel. It is the kernel's responsibility to look at the virtual memory tables, allocate the memory, copy it if it was copy on write, and most importantly update the TLB so next time it does not have to set up the translation.
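The hit/miss path described here can be sketched as a tiny software-managed TLB. All names and sizes are assumptions for illustration (`struct mmu`, four direct-mapped slots, sixteen virtual pages, 4KB pages), and the "kernel" is reduced to a flat array lookup with no allocation or copy-on-write handling.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* 4KB pages */
#define TLB_SLOTS  4    /* deliberately tiny, direct-mapped */
#define NUM_VPAGES 16   /* size of the toy virtual address space */

struct tlb_entry {
    bool valid;
    uint32_t vpn;       /* virtual page number */
    uint32_t pfn;       /* physical frame number */
};

struct mmu {
    struct tlb_entry tlb[TLB_SLOTS];
    uint32_t page_table[NUM_VPAGES];  /* stand-in for the kernel's tables */
    unsigned misses;                  /* times the "kernel" was invoked */
};

void mmu_init(struct mmu *m)
{
    for (int i = 0; i < TLB_SLOTS; i++)
        m->tlb[i].valid = false;
    for (uint32_t v = 0; v < NUM_VPAGES; v++)
        m->page_table[v] = v + 100;   /* arbitrary mapping for the demo */
    m->misses = 0;
}

/* Translates a virtual address. On a TLB hit the cached frame is used;
 * on a miss the page table is consulted and the TLB slot is refilled. */
uint32_t translate(struct mmu *m, uint32_t vaddr)
{
    /* The modulo only keeps this sketch inside its toy address space. */
    uint32_t vpn = (vaddr >> PAGE_SHIFT) % NUM_VPAGES;
    uint32_t offset = vaddr & ((1u << PAGE_SHIFT) - 1);
    struct tlb_entry *e = &m->tlb[vpn % TLB_SLOTS];

    if (!e->valid || e->vpn != vpn) {   /* miss: "trap to the kernel" */
        m->misses++;
        e->valid = true;
        e->vpn = vpn;
        e->pfn = m->page_table[vpn];
    }
    return (e->pfn << PAGE_SHIFT) | offset;
}
```

Two pages that map to the same slot evict each other, so alternating between them misses every time; real TLBs use associativity to soften that.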
So in the VM case (this is sorta conjecture), the VM can allocate a slew of memory on the host OS. (As far as the client OS is concerned, this is physical RAM.) Then it can make a TLB and all memory accesses will go through it first. This way it can stop Windows from pissing all over OS/2 running on the VM. But Linux will stop the VM from pissing on anything else on the host OS.
As for kernel traps, the user level program's data needs to be copied over to kernel space for the kernel to access it.
I hope this begins to answer your question.
Re:Here's how it works. (Score:1)
Re:Virtual Memory (Score:1)
If you want to map a memory region to a file, check out the manpage for mmap(2).
Re:Speculation: Virtualize the x86 (Score:1)
debug
would run the program using an execdbg (like execle but slightly modified) call and run the program *through* the debugger. At the heart of it was an extra couple of lines in the interrupt handler that caught an interrupt either after every instruction was executed *OR* (and this is the bit that's important here) caught an interrupt we'd placed.
What we'd done was implement a feature which, given an address, would copy that instruction out, store it somewhere and then replace it with an interrupt. When that interrupt was trapped the program counter would be wound back one and the interrupt replaced by the correct instruction. Execution would be resumed on some input from the user.
It wouldn't be very hard for the VM, on loading the program, to scan for 'dodgy' instructions. These could be replaced by an interrupt and pushed onto the end of a linked list. Should this interrupt be trapped then the program counter would be wound back one and the interrupt replaced by the head of the linked list.
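The save-and-replace mechanism can be sketched over a plain byte buffer. This is not the Minix debugger's code: the names are hypothetical, the trap byte is assumed to be the one-byte x86 `INT 3` (0xCC), and a small address-keyed table is swapped in for the linked list, so that traps can be handled in any order rather than only in the order they were planted.

```c
#include <stddef.h>
#include <stdint.h>

#define TRAP_OPCODE 0xCC   /* assumed one-byte trap instruction (INT 3) */
#define MAX_PATCHES 8

struct patch {
    size_t addr;           /* where the trap was planted */
    uint8_t original;      /* the byte it replaced */
};

struct patch_table {
    struct patch slots[MAX_PATCHES];
    size_t count;
};

/* Saves the byte at `addr` and overwrites it with the trap opcode.
 * Returns 0 on success, -1 if the table is full. */
int plant_trap(struct patch_table *t, uint8_t *code, size_t addr)
{
    if (t->count == MAX_PATCHES)
        return -1;
    t->slots[t->count].addr = addr;
    t->slots[t->count].original = code[addr];
    t->count++;
    code[addr] = TRAP_OPCODE;
    return 0;
}

/* Called with the program counter as it stands just after the trap byte
 * executed. Winds the counter back one, restores the original byte, and
 * returns the address to resume from, or (size_t)-1 if the trap was not
 * one that we planted. */
size_t handle_trap(const struct patch_table *t, uint8_t *code, size_t pc)
{
    size_t addr = pc - 1;
    for (size_t i = 0; i < t->count; i++) {
        if (t->slots[i].addr == addr) {
            code[addr] = t->slots[i].original;
            return addr;
        }
    }
    return (size_t)-1;
}
```

For a VM, restoring the byte is the wrong final step for a 'dodgy' instruction: the monitor would emulate the saved instruction and leave the trap in place, since running the original natively is exactly what it is trying to avoid.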
Re:Speculation: Virtualize the x86 (Score:1)
From the Minix src code (only because I've got it to hand, it'd be the same for Linux)
src/kernel/const.h
(line 9) for INTEL chips
#define TRACEBIT 0x100
/* OR this with psw in proc[] for tracing */
(line 109) for M68000
#define TRACEBIT 0x8000
and from my src code for a minix debugger
child_proc->p_reg.psw |= TRACEBIT;
... and later
child_proc->p_reg.psw &= ~TRACEBIT;
soft MMU? (Score:1)
wouldn't linux be able to give it a fixed amount of RAM as an application and then let VMWare tell linux what to do with it? (er.. i guess im asking why is it different from any normal application? don't they all have protected memory?)
arghh.. my head hurts now..
Re:soft MMU? (Score:1)
but don't all programs get told this by the OS? I thought protected memory was supposed to segment stuff from each other, to keep them all from knowing who all is out there. And from what i have read in the documentation of VMware, it doesn't emulate the CPU (which would be slow) but lets it talk to the CPU (er??? doesn't it???)
hehehe... im having flashbacks to the matrix... maybe life was designed by VMware. With luck my brain is running under linux, not nt.. =)
on a side note, what happens if you let VMware boot your linux drive under linux? or what happens if you have linux automatically boot NT under VMware and then have NT boot linux, and so on.. recursive OS! =)
Re:Speculation: Virtualize the x86 (Score:1)
-Mat
technical references (Score:5)
Robert