Making Use of Terabytes of Unused Storage 448
kernspaltung writes "I manage a network of roughly a hundred Windows boxes, all of them with hard drives of at least 40GB — many have 80GB drives and larger. Other than what's used by the OS, a few applications, and a smattering of small documents, this space is idle. What would be a productive use for these terabytes of wasted space? Does any software exist that would enable pooling this extra space into one or more large virtual networked drives? Something that could offer the fault-tolerance and ease-of-use of ZFS across a network of PCs would be great for small-to-medium organizations."
Porn (Score:5, Funny)
Re: (Score:2)
Re:Typical IT guy (Score:5, Informative)
Re: (Score:3, Funny)
vista? (Score:5, Funny)
Re:vista? - DFS (Score:4, Informative)
Re:vista? - DFS (Score:5, Interesting)
This is why SAN manufacturers have come up with "thin provisioning". NetApp is quite good at it; read more here [netapp.com].
Re: (Score:3, Informative)
Re: (Score:2)
Re:vista? - DFS (Score:5, Informative)
Running DFS (to serve files) on Windows XP clients? What are you smoking?
From Microsoft TechNet:
The servers that will participate in DFS Replication must run Windows Server 2003 R2.
It is possible to use DFS Namespaces when domain controllers and namespace servers run a mix of Windows Server 2003 R2, Windows Server 2003 with SP1, Windows Server 2003 without SP1, and Windows 2000 Server, but some functionality is disabled or available inconsistently, depending on the operating systems on the servers.
From: http://technet2.microsoft.com/WindowsServer/en/library/1aa249c0-40f3-4974-b67f-e650b602415e1033.mspx?mfr=true [microsoft.com]
Re: (Score:3, Informative)
So you can count DFS as a big no-go.
JFGI (Score:2, Funny)
Re:It's been done by Microsoft: DFS NameSpaces (Score:5, Funny)
Re: (Score:3, Insightful)
been done already? (Score:3, Informative)
And, if you're claiming some kind of market race, you might want to check for relevant dates concerning ZFS [wikipedia.org]
Of course, if you're just trolling, ignore me.
Re:vista? (Score:5, Funny)
http://www.uniquepeek.com/viewpage.php?page_id=1517 [uniquepeek.com]
easy! (Score:5, Funny)
Absolutely! Just hook them up directly to the internet before you update the machines, wait a few minutes, and voila! They'll be filled up with extra files in no time! Hey, you didn't say anything about wanting to be in control of what gets put on the machines...
Not without heavy utilization of other resources (Score:5, Insightful)
Re:Not without heavy *use* of other resources (Score:2, Insightful)
Please stop typing words like "utilization" when you mean "use". You sound like a PHB trying to sound smarter than he really is and you make it a pain for people to read what you write, especially non-Anglophones. Read George Orwell's essay on this topic [mtholyoke.edu].
Re: (Score:3, Insightful)
As I understand it, resources are utilised, while tools are used. His usage was correct.
Re: (Score:2)
Grog likes it simple (Score:2, Insightful)
Re: (Score:3, Funny)
Re:Not without heavy *use* of other resources (Score:5, Funny)
Re:Not without heavy *use* of other resources (Score:5, Funny)
The solution is obvious. We need to think outside the box and raise the bar when it comes to language... someone needs to step up to the plate and bring something new to the table. I'm thinking of someone I have synergy with, not just the type that goes for the low-hanging fruit.
Ooh.... he's spinning nicely. Another couple of Orwells and we'll have enough electricity to power the world
Re:Not without heavy *use* of other resources (Score:5, Funny)
Re: (Score:2)
The double irony of making the central joke of your post the tired old "Orwell spinning in his grave" is delicious.
Re: (Score:2)
Re: (Score:2)
Sesquipedalian verbalization (Score:4, Informative)
Re: (Score:2, Offtopic)
It's a Dilbert reference. http://en.wikipedia.org/wiki/Dilbert [wikipedia.org]
"PHB" is the short form of "Pointy Haired Boss".
Re: (Score:2)
Please don't (Score:5, Interesting)
The reason is the obscenely large amount of power required: using even a few gigabytes of the space means the whole machine has to be running, and its CPU alone can't draw less than 21 watts.
It's actually cheaper to get a 1TB drive and use it elsewhere than to burn that much power across so many desktops (or worse, servers). Even with the desktops in use by active users.
Re: (Score:2)
You could secure them with passwords and so on.
Oh go ahead and poke flaming holes in my suggestion *buries face in hands and sobs*
Re: (Score:2)
Re: (Score:2)
Of course the risk of backing up your data on the same physical drive remains. I suppose a VM booting, a secure copy to
Do you really have control of the boxes? (Score:5, Insightful)
--Marc
Re: (Score:2)
Re: (Score:2)
Well, filling up is kinda the point of the entire exercise, but you're right - being shut off, crashing, or being otherwise disconnected is enough of a problem to make this a non-starter. We're basically talking about a distributed filesystem in which subparts may fail without notice. I'm sure there are ways to minimise the problems this will create - you can for instance make sure that any one file is always completely locate
Re: (Score:2, Insightful)
a) Is there something productive I can be doing?
b) How do I do it?
Everything else is fluff that tends to lead slashdot readers off on tangents, flamewars, Emacs vs. Vi (emacs), KDE vs. GNOME (gnome)
Re: (Score:2)
Let's say 50GB, on average, is free on each computer. We want some fairly hefty redundancy, so let's knock it down to 10GB of storage per machine.
Spanned across 1000 work machines - that's 10 Terabytes of storage, with 4X or so
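One reading of the parent's back-of-envelope math can be sketched like this (the 50 GB free, 4x replication, and 1000-machine figures are the parent's assumptions, and reserving the leftover as slack is my own guess at how 50 GB becomes 10 GB):

```python
# Back-of-envelope capacity estimate for pooling desktop free space.
# All figures are assumptions taken from the comment above.
free_per_machine_gb = 50      # average free space per desktop
replication_factor = 4        # keep ~4 copies of everything for redundancy
machines = 1000

# Dividing by (replication + 1) leaves one share's worth of slack per
# machine so the pool never fills a disk completely.
usable_per_machine_gb = free_per_machine_gb / (replication_factor + 1)  # ~10 GB

total_usable_tb = usable_per_machine_gb * machines / 1000
print(f"usable pool: {total_usable_tb:.0f} TB at {replication_factor}x redundancy")
```

Which lines up with the 10 TB figure above: heavy replication eats most of the raw space, but 10 TB of fault-tolerant storage for free hardware is still nothing to sneeze at.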
Re: (Score:2)
Now, this would only work well for lossy compression like, say... images, but I'm sure that is the intention of the submitter. Incidentally, it may provide an incentive for the workers to keep their computers turned on, and even stay late after work.
The more, the merrier. Literally.
Re: (Score:2)
Download and mirror the Internet... (Score:5, Funny)
Re: (Score:2)
not enough info (Score:2, Interesting)
Re:not enough info (Score:5, Funny)
Re: (Score:2)
This'll fill a small room. But an entire basement? Not really.
The REAL problem becomes supplying power for all of them without constantly blowing a breaker, setting the house on fire (through electrical or thermal means), or cooking the systems by producing too much heat.
Not that I've ever tried such a thing...no...not me...
*Whistle*
Re: (Score:2)
Maybe move with the times? (Score:3, Insightful)
Is it cost effective to reclaim that (small) space? Probably not. My suggestion is to realize that no one tries to save clock cycles any more, and disk storage is probably heading the same way.
Space is not that important any longer (Score:5, Insightful)
Let's assume that the average computer has 80 GB of storage. Multiply that by 100 and you get 8 TB of space. That's what you can fit into one or two computers nowadays without shelling out too much cash.
What's more interesting is how much processing power you have as well as how fast the internet connection is.
Re:Space is not that important any longer (Score:5, Insightful)
I disagree with this and face this question all the time in work. Disks are cheap, storage systems aren't. If this is for a business that requires reasonable uptime, then the only solution would be to implement a SAN using Fibre Channel or iSCSI and then take out the drives. With the right array, all of a sudden those drives become superfluous (you decide if boot from SAN is right for you), management is easier and you'll be able to get a lot of reuse out of the drives.
Now a lot of people will start to question the cost of doing all of this and it isn't cheap, however you have to analyze the data correctly. We migrated 200 servers from DAS to a SAN and had our money back within 12 months. Add on top of that the implementation of VMs, all of a sudden those 200 went to 20. That's a big difference in cost of ownership.
Re: (Score:2)
Disks are cheap, storage systems aren't.
No, SOME storage systems are expensive.
You can pretty easily put together an inexpensive SATA array with multiple terabytes of storage. To anyone who thinks Fibre Channel or iSCSI is just a million times better or more reliable than SATA, I'd say you're being sold a bill of goods by your vendor. Unless you have very high performance needs, like say a database being hit by thousands of people, SATA will serve you just fine.
Re: (Score:3, Insightful)
SATA is a drive interface spec. NAS is a generic description of a type of storage device. iSCSI is a communication protocol, as is GigE.
It's being used as storage for an Oracle database server used by around a hundred simultaneous users.
By buying commodity parts from Fry's I managed to get 3T usable for under $2000.
Oh, and I had fun building it.
Re: (Score:3, Insightful)
Re:Space is not that important any longer (Score:5, Insightful)
If they are really only using it for the OS, a few applications, and a few docs, why not use diskless workstations?
Less power, less heat, and fewer things to break.
In other words, don't use all those drives; get rid of all of them.
GlusterFS (Score:3, Informative)
You definitely can't run Windows in order to utilize this, but it should be minimal effort to set up a quick netboot lab to test it with.
Cheers.
Re: (Score:2)
One could envision setting up small VMWare Player instances running under a different account, launched at Windows startup using "Scheduled Tasks" for that account (set to run on reboot). Or run VMWare Player as a service. A little beefier would be VMWare Server (free), but a bit more of a hassle (need to also install IIS on eac
Send them to our troops in Iraq (Score:3, Funny)
Re:Send them to our troops in Iraq (Score:5, Informative)
I've had a chance to read after-action reports from Iraq and Afghanistan, and the 9mm is pretty much a joke. Most of the forces that really rely on handgun stopping power have obtained emergency authorization to bypass normal procurement processes in order to get better handguns using better ammunition. To my knowledge, a modern
Re: (Score:3, Informative)
Remember, pistol rounds are pistol rounds, and rifle rounds are rifle rounds.
Next time he should test it with pretty much any centerfire rifle.
9mm vs .45 (Score:3, Interesting)
Despite all this, I think that when it comes down to the army, it's mostly because of ammunition selection. Troops are issued non-expanding FMJ ammunition, which leads to 9mm over penetrating and under performing. The 1911, chambered i
Re: (Score:3, Funny)
Now, that doesn't mean that a .45 doesn't have more stopping power than 9mm, just that it wouldn't penetrate the aluminum casing of a hard drive. Fortu
Re: (Score:3, Insightful)
When the time comes that I need
Sanmelody (Score:4, Informative)
AFS (Score:5, Informative)
Re: (Score:2)
Re: (Score:2, Insightful)
Solution for Linux (Score:2, Informative)
If there's nothing similar for windows, you might be able to run it through cygwin.
Actually, this claims to run on Windows: http://www.vanheusden.com/Loose/nbdsrvr/ [vanheusden.com]
Re: (Score:2)
nbd is nice for some stuff but lacks fault-tolerance. Of course, you can run RAID, possibly several levels (say, a raid-6 on top of raid-1 or something) on top of nbd devices to trade space for fault-tolerance as much as you want, but you still lack flexibility. The advantage to RAID-over-nbd, on the other hand, is of course that you can do that right now if you want :] (And yes, the nbd server shouldn't be overly hard to run on Windows, one would think; it's rather simple...)
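The space/fault-tolerance trade-off of that RAID-6-over-RAID-1 layering is easy to put numbers on (the node count and export size below are made-up illustrations, and this is just the arithmetic, not an nbd setup):

```python
# Usable capacity of RAID-6 striped across RAID-1 mirror pairs,
# where each leg would be an nbd export from one desktop.
# Numbers are illustrative assumptions, not taken from the post.
nodes = 20                 # desktops each exporting one nbd device
export_gb = 40             # size of each export

mirrors = nodes // 2                          # RAID-1 pairs: each survives 1 failure
raid6_members = mirrors                       # RAID-6 striped across the mirror pairs
usable_gb = (raid6_members - 2) * export_gb   # RAID-6 gives up 2 members to parity

efficiency = usable_gb / (nodes * export_gb)
print(f"{usable_gb} GB usable from {nodes * export_gb} GB raw "
      f"({efficiency:.0%} efficiency)")
```

At these numbers you keep only 40% of the raw space, but any mirror pair can lose a machine and the RAID-6 layer can then lose two whole pairs on top of that, which is the kind of margin you'd want with flaky desktops as members.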
A better solution would work o
You read my mind! (Score:2)
Our whole organization is about 1,000 Windoze desktops, but I'd like to try it in our local workgroup first (maybe 20 systems). I looked around but couldn't find anything
Re: (Score:2)
You had me up until there. Um, what happens when YOU lose one of the parts?
Storage (Score:2, Informative)
I was suggesting to run DrFTPD [drftpd.org] as a backend with NetDrive [american.edu] as an access medium. It looks good on paper, but I've never had the chance to apply it so widescale
With DrFTPD it's easy to set up whatever kind of redundancy you want, e.g.: "at least 3 nodes will mirror all files in
the IT guy with time on his hands (Score:2)
The first question to ask is whether what you want to do makes any sense for your employer. Who has to maintain this beast once you build it?
dCache (Score:3, Interesting)
I have a similar problem (Score:2)
They don't really need lots of space (maybe 30 GB for OS and temp files); on the other hand, without redundancy the other 450 GB are worthless.
As the task is embarrassingly parallel, network traffic wouldn't be a problem.
If there were a solution to combine all this storage (it doesn't even have to be transparent) into a distributed, redundant storage network, I could surely make use of those terabytes
Re: (Score:2)
What I imagine doesn't need to be low-level.
Just a userland application with container files would be fine:
They can listen to each other, and each file gets replicated on every node. If the fill level gets too high, copies are purged down to a minimal redundancy level.
Even the factor-2 loss of non-parity redundancy would still be a lot better than not using the space at all.
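That replicate-everywhere-then-purge policy can be sketched in a few lines of userland code (the class and method names are invented for illustration; a real implementation would also need node discovery, transfer, and integrity checks):

```python
# Toy model: every file starts replicated on all nodes; when a node
# runs low on space, its surplus copies are purged, but never below a
# floor of min_replicas copies cluster-wide. Names are illustrative.
class Pool:
    def __init__(self, node_ids, min_replicas=2):
        self.replicas = {}            # filename -> set of holding node ids
        self.nodes = set(node_ids)
        self.min_replicas = min_replicas

    def add_file(self, name):
        # Initially replicate on every node in the pool.
        self.replicas[name] = set(self.nodes)

    def purge(self, node_id):
        # Node is filling up: drop its copy of any file that would still
        # keep at least min_replicas copies elsewhere.
        freed = []
        for name, holders in self.replicas.items():
            if node_id in holders and len(holders) > self.min_replicas:
                holders.discard(node_id)
                freed.append(name)
        return freed

pool = Pool(["a", "b", "c"], min_replicas=2)
pool.add_file("report.doc")
print(pool.purge("c"))   # node "c" frees its surplus copy
print(pool.purge("b"))   # refused: would drop below 2 replicas
```

The nice property is that purging is always safe and local: a node never needs global coordination to shed space, only a current view of who else holds each file.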
Backup (Score:2, Informative)
Looking at the problem another way... (Score:5, Insightful)
You might want to ask yourself why, after more than a decade of research and countless papers and prototypes that address this problem, your PCs' storage is still underutilized...
It's harder than it looks to get something reliable. Your PCs have extra capacity because it's cheap, but mining that capacity is not cheap. As other posters have pointed out, putting together (or just purchasing) a server with a few TB of storage is simpler and cheaper, less prone to getting wiped out by a virus, and easier to manage and back up.
I'm not sure that's a good idea... (Score:2, Interesting)
Storage at Desk (Score:2, Informative)
My first (serious) thought... (Score:2)
Was to use a software driver to export the spare part of the disk as an iSCSI (or iATA, if you prefer) target. For performance and integrity, you'd probably be better off having a dedicated partition the OS couldn't easily fiddle with, but it shouldn't be too hard to create an array of ~50GB iSCSI targets that you could then collate into larger volumes. Performance wouldn't be stellar, unless you could use a dedicated NIC/VLAN on the hosts, but should be reasonable enough for use as nearline storage of non-cri
A project for Google? (Score:2)
I'd like to see some sort of distributed filesystem as a standard installation option in a linux distribution... The question would be something to the effect of "would you like your computer to fi
Re:A project for Google? - whoops here it is (Score:2)
http://labs.google.com/papers/gfs.html [google.com]
The big questions of course are is it usable by regular people, and is anyone actually working on implementing and including this in any of the major operating systems?
Re: (Score:2)
Microsoft Farsite (and related topics) (Score:2)
The short version of the problem is that the level of service you can expect from each system is incredibly variable, so it's hard to offer a meaningful QoS for the system as a whole. It's not quite as bad as the distributed-hash-table problem (a.k.a. P2P file storage), but it's sti
Replace the drives? (Score:2)
I know the cheapness of drives may make this silly.
Birth of the Matrix? (Score:5, Interesting)
What would be a productive use for these terabytes of wasted space?
Well, I had this idea when I read about some Open Source software that allowed distributed storage (sorry, forgot what that was, but by now I am sure it has already been mentioned in this discussion). The idea was this - suppose we have such software for unlimited distributed storage, so that people can download it and volunteer some unused space on their HD for a storage pool. Then suppose we have some software for distributed computing like we have for the SETI program. Now we have ziggabytes of storage and googleplexflops of processing power, what can we do with that? How about, for one thing, storing the entire internet (using compression, of course) on that endless distributed storage, and then running a decentralized, independent internet via P2P software? The distributed database could be constantly updated from the original sources, and the distributed storage then becomes in effect a giant cache that contains the entire internet. Now we could employ the distributed computing software to datamine that cache and we could have searching independent of Google or Yahoo or M$FT. Beyond that we could develop some AI that uses all that computing power and all that data to do... what? - I'm not sure yet. Just thought I would throw this out there to perhaps get stepped on, or who knows, inspire further thought.
Re: (Score:2)
Two products you should probably take a look at (Score:2)
http://www.seanodes.com/ [seanodes.com]
http://www.revstor.com/ [revstor.com]
Both claim to be able to pool unused storage on desktops and application servers and make it available to hosts on the network.
The better question: (Score:2)
The answer, of course, is that there are a lot of business applications that
Do nothing. (Score:2)
Unless you like re-building machines with dead disks it's just not worth it.
There's "full" and then there's "full" (Score:2)
And then there's "full" as in "Hey, I can cram more crap into that box!"
I need some of that space to defrag the HDDs on my windows box.
Now, if only there was some filesystem whose performance didn't degrade over time due to fragmentation...
The fundamental flaw of this is..... (Score:2)
Not so long ago MS did an upgrade that brought my system to a halt, usability-wise, because it was using all the additional drive space to cache my system, on my system.
When it stopped, even the cache didn't show this usage, but I was able to determine it was all in the cache, and had to learn about clearing the cache, not by delete but by ctrl-del to r
May I recommend against this? (Score:3, Insightful)
1. You will noticeably reduce the lifespan of the disks. (Which can anger cost-conscious supervisors.)
2. Doing ongoing hardware maintenance, because of this reduced lifespan, on closed boxes used by others is a *serious* pain.
Storage setups make hot-swapping disks easy; trying to do this with full-blown systems just gets tiresome. The solution I eventually came up with was the following.
Implement a two tiered hardware replacement cycle where you reduce the time a user is allowed to keep any hard drive in their box before replacement. Then using the still reasonably good drives, create a centralized storage solution in which the drives can live out the rest of their useful spans. Data security, user happiness, and redundancy are all good selling points of this system. You still have to deal with monkeying around in user boxes but if it's on a schedule and it nets you more drives, it's not so bad.
-Ian
Allmydata "Tahoe" (Score:3, Informative)
Re: (Score:2)
Re: (Score:3, Funny)
More like your post being a slashvertisement.
Re: (Score:2, Insightful)
Obviously computers will crash or be turned off. We have this wonderful concept in architecture design called "redundancy" which we can use to address problems like that:
Assume the probability of any computer being offline is d(c_n). For some computers you will have d(c) very low, such as user out of town often, other will have d(c) quite high, either the user leaves it on a
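Under that independence assumption, a file with k replicas is unreachable only when all k host machines are offline at once, so the replica count needed for a target availability falls out directly (the probabilities below are invented examples, not measurements):

```python
import math

# Assume each replica sits on a machine that is offline with
# probability p, independently of the others. A file with k replicas
# is then unreachable with probability p**k. Figures are illustrative.
def replicas_needed(p_offline, target_unavailable):
    """Smallest k such that p_offline**k <= target_unavailable."""
    return math.ceil(math.log(target_unavailable) / math.log(p_offline))

# Desktops off 30% of the time, want <0.1% chance a file is unreachable:
k = replicas_needed(0.30, 0.001)
print(k, 0.30 ** k)
```

Even with machines that are off nearly a third of the time, a single-digit replica count gets you three nines of file availability, which is why the "redundancy" answer to unreliable desktops is not as hopeless as it first sounds.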
Re: (Score:2)