Graphics Software

Hardware To Archive/Manage Large Collection Of Images?

HarpoX asks: "Technology is quickly allowing digital cameras to produce images as good as conventional film while cutting time and costs. Archiving negatives is a long-solved problem, but the buildup of digital images poses new ones. With the potential of accumulating a couple hundred gigabytes of images, how can one most efficiently deal with the archiving, storage, retrieval, and management of these assets? Tape drives offer good storage capacity (20/40 GB DAT, 40/80 GB DLT) but seem to make managing the files burdensome or impossible. CD-ROMs are versatile, easy to manage, and very cheap, but hold so little (640 MB) that they hardly seem worth the time. Networked hard drive space would seem to offer the most management possibilities, but would it be less permanent and reliable than static media? Is there some combination of several media that could work together and maximize productivity?"
This discussion has been archived. No new comments can be posted.

  • Adam, You said:
    The main reason to use near-line and offline was that they were less expensive than online. This is no longer the case unless you plan an online SCSI RAID.

    This is only partly true - if you have enough data, nearline/offline still pays.

    As for managing what is in the online, nearline, and offline sections of your store, there are a bunch of companies with data storage solutions out there. Here at work we're studying this right now, as we need to store 6-7 thousand 4k images a DAY, along with 40-some-odd hours of MPEG2s a week, every week. Data storage gets up there fast. (Mind you, we have well over 1.5 million hours of video stored on tape.)
  • BLOBs don't affect the 8k row size. At least they don't on SQL Server. That's why we have a text field type in addition to a varchar field type. Varchars must fit in the 8k limit, but text fields don't have to. The trade-off is that you can't refer to text fields in the join, where, or order by clauses of a SQL statement.
  • Why would a database be trying to keep the blobs in memory? Since blobs are already handled differently from every other type of data, why not just leave them on the disk while we're at it?

    Also, if you wrote an NFS server as part of the client to the database, then the path on the server could be the record ID, so imageserver.mynetwork.com/tblpictures/111113.jpg would be taken to mean: look in the tblpictures table for the record with 111113 in the key field, and return the blob field. I've never tried this with images, but I've seen it done with other file formats. Of course, this assumes that we are talking about either a unix client, or the rare case where the NT machine has an NFS client on it.

    If the file system is corrupted, you can sometimes kiss the data goodbye too. I'd think that it wouldn't be that difficult to make a database that can stand some corruption without losing everything. I know that Access files can take some corruption and still be saved, so I always assumed that the same was true of the big boys like Oracle and Postgres.

    Of course, I would think that an object-oriented database would be more appropriate here than a relational one. After all, for pictures, we are probably only talking about one table in a relational database (unless the keywords are stored in a separate table), and an object-oriented database might allow for more flexibility.
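
A rough sketch of the path-as-record-ID idea from the comment above, in Python. It only shows how a request path could be split into a table name and a key; the function name `path_to_lookup` and the `KNOWN_TABLES` whitelist are illustrative assumptions, not part of any real NFS server or database API.

```python
from pathlib import PurePosixPath
from typing import Tuple

# Hypothetical whitelist: table names cannot be passed as SQL parameters,
# so a real server would need to check them against a known set.
KNOWN_TABLES = {"tblpictures"}


def path_to_lookup(path: str) -> Tuple[str, str]:
    """Split a path like '/tblpictures/111113.jpg' into (table, key)."""
    names = [part for part in PurePosixPath(path).parts if part != "/"]
    if len(names) != 2:
        raise ValueError(f"expected <table>/<key>.<ext>, got {path!r}")
    table, filename = names
    if table not in KNOWN_TABLES:
        raise ValueError(f"unknown table {table!r}")
    # The key is the file name without its extension.
    key = PurePosixPath(filename).stem
    return table, key


print(path_to_lookup("/tblpictures/111113.jpg"))  # ('tblpictures', '111113')
```

The server would then run something like "select the blob field from that table where the key field equals that key" and hand the bytes back as the file contents.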
  • A DVD-RW jukebox is an option I would look at. Tracer Technologies has the software, Panasonic the hardware. The software enables stuff like disk spanning etc.

    Check these:

    There might be other companies offering a similar product, but this is what I remember from an earlier discussion here on Slashdot.

    Cya,
    bBob

    --

  • I've heard database gurus *strongly* advise against this, but I've never heard a good explanation why it's such a bad idea. From what I've heard, the reason that fixing the PgSQL 8k row size limit hasn't been a priority is that very few people should be storing more than 8k of data in every row of the database.
  • The film scanner may set you back "only" that much, but how much do you spend on processing and films?
  • Why not do something like a database (pick your own flavour) with fields indicating the time, source, artist, subject, resolution, quality, and path on your partition (a special partition, just for the images, perhaps)? Have the database be web-accessible (perhaps, depends on the application), and have the partition backed up like any other data.

    That way you can search through the database for what you're looking for and configure the search to return the various paths. Then mount the drive (if it's shared via a network), or click (if the database is web-accessible) to see the various images (or just have the database return a thumbnail version). Since the backups are proceeding like system backups, you know your data is reasonably safe.

    I just set up filesharing at home, and it's pretty quick from what I've seen. It's a 100baseT back end, and a Celeron 533 fileserver running Debian Potato, with Samba and Netatalk. Works like a dream so far.

    HTH.


    This is my .sig. It isn't very big.
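
A minimal sketch of the metadata catalog described in the comment above, using Python's built-in sqlite3 module as a stand-in for "pick your own flavour". The table name `images`, the column names, and the helper functions are all assumptions made for illustration; the images themselves stay on the filesystem and only their paths go into the database.

```python
import sqlite3


def create_catalog(conn: sqlite3.Connection) -> None:
    """Create one table holding the metadata fields listed in the comment."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS images (
            id         INTEGER PRIMARY KEY,
            taken_at   TEXT,
            source     TEXT,
            artist     TEXT,
            subject    TEXT,
            resolution TEXT,
            quality    TEXT,
            path       TEXT NOT NULL
        )
        """
    )


def add_image(conn, taken_at, source, artist, subject, resolution, quality, path):
    """Record one image; only its path is stored, not the image data."""
    conn.execute(
        "INSERT INTO images (taken_at, source, artist, subject, resolution, quality, path) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (taken_at, source, artist, subject, resolution, quality, path),
    )


def find_paths(conn, subject):
    """Return the paths of all images whose subject contains the search text."""
    rows = conn.execute(
        "SELECT path FROM images WHERE subject LIKE ? ORDER BY taken_at",
        (f"%{subject}%",),
    )
    return [row[0] for row in rows]
```

A search front end (web or otherwise) would call something like `find_paths(conn, "bridge")` and then open the returned paths from the mounted image partition.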
  • You mention tape as an alternative.

    Actually, I would think that a removable hard drive scheme would be better.

    You may need some custom software to allow you to easily mount / unmount / keep track of the photos on a disk, but a hot swap SCSI (or IDE - yuck :) bay and some relatively high capacity drives would probably fit the bill. The hassles with tape (sequential access, slow transfer speed, relatively flimsy media, expensive drives, etc) are probably not worth the reduction in cost. With the cost of hard drives plummeting and capacities skyrocketing, you'll be able to get similar capacities to tapes, with much faster and easier access.

    Of course, the alternative, for non-portable storage, would be to get a big rack enclosure, and just add hard drives as necessary to accommodate new pictures.

  • Why not take the next step forward and shove all the images right into the database as BLOBs? That way you don't have to keep track of your filenames and paths and whatnot, and your database won't be rendered immediately useless should some idiot accidentally move the folder holding your images. I guess this is a stretch on the concept of "referential integrity". You just need to make sure the database doesn't get corrupted, else you're duly screwed.
  • As much as I hate to recommend anything by Microsoft here at Slashdot, check out their Terraserver site at http://terraserver.microsoft.com [microsoft.com]. They explain the technical story behind how they store their images on their SQL Server database. I was a bit skeptical that you can do anything like that with a Microsoft product, but it seems to run well -- and they boast about their low downtime, too.
  • Ugh, I hope you can read the real posts in between all this stupid spam here. But anyways...

    What's so bad about CD-Rs? Only because they're 650 meg in storage? I could see that being one of the only factors. You can pick up a spindle of 50 Verbatims here for only about $30 (US)... So that'd probably be one of the economical solutions. I could also see space being a major problem, but I'd recommend getting some kind of CD library holder... That's how I'd solve the problem. But if you're into big networking ideas, you're out of luck looking towards me. I like the CD-R idea :-)
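
For a rough sense of scale, here is the media cost worked out in Python from the figures in the comment above (650 MB per disc, 50 discs for about $30). The 200 GB collection size is an assumption taken from the "couple hundred gigabytes" in the original question, 1 GB is treated as 1000 MB, and burn time and labor are ignored.

```python
import math

COLLECTION_MB = 200 * 1000   # assumed: "a couple hundred gigabytes"
DISC_MB = 650                # capacity quoted in the comment
DISCS_PER_SPINDLE = 50
SPINDLE_PRICE_USD = 30

discs = math.ceil(COLLECTION_MB / DISC_MB)
spindles = math.ceil(discs / DISCS_PER_SPINDLE)
media_cost = spindles * SPINDLE_PRICE_USD

print(discs, spindles, media_cost)  # 308 7 210
```

So the blank media is cheap; the open question is whether handling roughly 300 discs is worth the time.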

  • I think this, conceptually, is how I would do it, given the time and money. Given even more money, I would do two things differently. First, and I know you considered this (and I like the reason you didn't do it), I would put the images right into the database, if only to prevent the possibility that the database would occasionally point to a file or location that was no longer there. Also, importantly, it would simplify security settings (you would only need database security, not additional, analogous file security). Second, I don't like the idea of archiving to CD-ROM, ever, if it can be avoided. The problem is the "anyone can look them up and get the images when needed" part. What about security? I.e., where do you store the CDs, and how does just anyone get there? Aren't they locked up? Aren't there some images that are available only to some people, not others? Also, what about forgetfulness? How do you get people to return the CDs when they're done? If you instead demand that people make copies without checking out the CDs, that's another kind of headache (you need to set up some facility to make that possible). Have you dealt with these issues? I'd be curious to hear whether they've really come up much. Thanks, Matt Morgan
  • There is software that does this on the DB side. The company that sells it could probably suggest a good way to store it as well.
    We have looked at doing a similar thing because we produce 200 GB of images and movies every few months. Our web group and marketing group want to get at this data later but don't know exactly what they want till they see it.
    We are still looking at different solutions for this. Some suggestions have been SANs (storage area networks) or other such things. These support terabytes of data, and as long as the filesystem is in an order that makes sense, jumping to projects can be automated with shell scripts.

    For databases, I know that Informix [informix.com] has Media360 [informix.com]. I think that Oracle might have the same. I also know that there are 3rd party apps out there, but I can't think of the name.
  • Look at every other question of data storage. There are simple and easy answers; storing pictures is no different from storing documents.

    In your case, it appears you want:
    Instant access
    Lots of space
    Cheap storage

    In a typical situation, you use a large HD for instant access, in a RAID mirror for data integrity and safety. You also get a large capacity WORM drive for backups. Tape drives are really your best choice.

    In any storage situation, you face four questions:
    How much storage do you need online
    How much storage do you need nearline
    How much storage do you need offline
    How important is the data

    Online storage would be a hard drive. You access this data weekly. (Fast, Big)
    Nearline storage would be a tape changer, reel to reel, etc. This isn't used widely now as online storage is so inexpensive. You access this data several times a year. (Slow, Big)
    Offline storage would be data stored on media that is handled by a person. Change the tapes, put a CD/DVD-RAM, etc in. This is used primarily for backup and archive. You rarely access this data. (Slow, Labor required, Big)

    The main reason to use near-line and offline was that they were less expensive than online. This is no longer the case unless you plan an online SCSI RAID.

    The final question is backup. As you indicate, CD is too small. You can get tape drives that manage 40GB per tape; this would probably be your best bet. Remember to keep a complete backup off-site (fire, disaster, etc).

    For your situation, I would again recommend a huge online RAID (40GB IDE drives in a mirror and striped configuration), and an automatic tape changer backup.

    -Adam

    Plaid ribbon campaign against code commenting:
    If it was hard to write, it should be hard to understand.
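
The online/nearline/offline split in the comment above can be sketched as a simple rule on access frequency. The cutoffs below are assumptions read off the comment's wording ("weekly", "several times a year", "rarely"), and `pick_tier` is a hypothetical helper, not an existing tool.

```python
def pick_tier(accesses_per_year: int) -> str:
    """Map how often data is read to a storage tier.

    Assumed cutoffs: weekly or more often -> online (hard drive),
    a few times a year -> nearline (tape changer),
    anything rarer -> offline (media handled by a person).
    """
    if accesses_per_year >= 52:
        return "online"
    if accesses_per_year >= 2:
        return "nearline"
    return "offline"


for rate in (104, 4, 0):
    print(rate, pick_tier(rate))
```

The fourth question, how important the data is, does not change the tier; it decides how many copies you keep and whether one of them lives off-site.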
  • If you build your database correctly you shouldn't have a problem with blobs. They won't affect searching or caching since your queries will be against the non-blob fields. The really nice thing is that you can use the database backup utilities to save all of your data.

    You would have to build a front end to fetch and insert pictures, but that shouldn't be too hard and you'll probably want that anyway to support stuff like version archiving, etc.

    One plus of everything in the database is that you can fetch the data across the network without having to share a drive via Samba or NFS.

    With the better databases you can even share the data via replication.

    The downsides are that you would basically have to become a database administrator to take care of the system and stuff like that.
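
A small sketch of the blob approach from the comment above, again with Python's sqlite3 module standing in for whichever database you would really use. The table and function names are made up for illustration. The point it shows is that searches touch only the non-blob columns, and the image bytes are fetched by key afterwards.

```python
import sqlite3


def create_store(conn: sqlite3.Connection) -> None:
    """One table: searchable metadata plus the image itself as a BLOB."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pictures ("
        "id INTEGER PRIMARY KEY, subject TEXT, image BLOB)"
    )


def insert_picture(conn, subject: str, data: bytes) -> int:
    """Store the image bytes and return the new row's key."""
    cur = conn.execute(
        "INSERT INTO pictures (subject, image) VALUES (?, ?)", (subject, data)
    )
    return cur.lastrowid


def search_ids(conn, subject: str):
    """Search on the non-blob column only; no image data is read here."""
    rows = conn.execute(
        "SELECT id FROM pictures WHERE subject = ? ORDER BY id", (subject,)
    )
    return [row[0] for row in rows]


def fetch_image(conn, picture_id: int):
    """Fetch the blob for one key, or None if the key does not exist."""
    row = conn.execute(
        "SELECT image FROM pictures WHERE id = ?", (picture_id,)
    ).fetchone()
    return None if row is None else bytes(row[0])
```

A front end would call `search_ids` first and `fetch_image` only for the pictures someone actually opens.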

  • use fibrechannel.

    It's _really_ _really_ expandable (1 PCI adapter --> 8*16 devices), and configuration is relatively easy as well (unlike scuzzy)...

    and performance is quite nice. Stuff is more expensive (notably, the bus adapter and cables) but the disks are OK -- I've seen seagate 9.1G hd/~100 (7200 RPM).


    willis/

    disclaimer: I work for a fibre channel company

  • Once again, I'm forced to bring up Firewire - no more device IDs, reasonable transfer rate, and hot-plug (to dig through that stack of 75GB IDE drives!). This is one reason why I've been looking into 1394 on non-Windows OSes - I'm amassing quite the useless collection of digital media, and I'm trying to figure out how I'm going to grow. Firewire seems like a good bet.

    Otherwise, I'd say DVD-ROM, since it would have true archive-type storage (read-only), and it's bigger than CD-ROM.

  • BLOBS are generally considered a bad thing by DB gurus. There are several reasons for this:

    • Adding several gigabytes of binary data to a db file makes it a hell of a lot slower to search, and prevents the DB from keeping the whole database in memory
    • An image stored in a database can't be easily opened by Photoshop and the like, whereas a file on a conventional disk (hd, cdrom, tape, whatever) is easy to open
    • If your database is corrupted you can kiss all that data goodbye (as you say).
    • Why would anyone leave several gigabytes of archived photos open for an idiot to move around?

    Of course, for a home photo album linked to a website you can probably get away with Blobs. For the scale the question seems to be asking about (some kind of professional publication/agency) I don't think they'd be suitable.

    .02k

  • by msuzio ( 3104 ) on Friday August 04, 2000 @08:48AM (#880420) Homepage
    Oh boy.

    I'm about to embark on a multi-six-figure project to *fix* a bad image archiving system for a major vendor. What they did was this:

    - Content is manufactured via scanning of microfilm to TIFF images
    - The TIFFs are put through a QA process. Bad scans are re-done. All of this is done by *people* (the scanning too).
    - Once out of QA, the images are burned to CD

    - The CD storage is via several jukeboxes of CD reader/writers (2 read heads, 1 RW head). Each jukebox holds 200 CDs, I think. These are combined into CD towers with multiple jukeboxes in them.

    - Access to the images is then managed by a piece of C++ code that knows how to drive the CD jukebox (it's huge, and not like a normal CD drive) and get data off of it.

    This turns out to completely suck. Getting images off the drives is slow and buggy. The media chosen was far too cheap, for one thing. Springing for the good stuff, however, suddenly makes CDs much less attractive in terms of cost. We often see read errors, stuck drives, etc.

    My company contracted to put this data out onto the Web. This turned out to be a nightmare. With unreliable and very slow hardware (remember, the CDs have to be moved into a reader from their position within the jukebox!), real-time delivery is just impossible.

    By contrast, spinning disk is constantly decreasing in price. Sure, you pay a lot of money for an EMC unit, but it offers much more in terms of flexibility and expansion.

    My client took the wrong road. Now they are faced with several months of I/O to get the CD data (1600 CDs worth) off the buggy drives and onto an EMC unit. Then they have to pay us to update our Web server software.

    CDs are great for archives. They are very very bad for any sort of access to the data. I wouldn't recommend a CD solution. Another point to mention is no one seems to know what the shelf life of CD-Rs is... be a pity to have those turn into coasters in 5-20 years.
  • by Mark Schoonover ( 97516 ) on Friday August 04, 2000 @06:56AM (#880421) Homepage
    HarpoX -

    You've actually touched on a few separate issues with managing digital images electronically...

    I've spent the last few months developing such a system for my company. My company shoots on average between 450 and 500 rolls of 36 exposure film every month. Most of the projects can take anywhere from 6 months to several years, and those images need to be available until the project closes.

    I looked at using digital cameras instead of standard film cameras. I ran into two issues - the first is file size. In some cases we need to blow the photo up to make presentations. In order to get enough detail to make an E sized blowup of the photo, you're looking at, as a minimum, an 18MB image. Figure 36 exposures per roll, 475 rolls of film, etc and you'll be in the terabyte range for storage requirements. Also, the cameras are expensive and our employees tend to drop them, get them run over, etc.

    Yes, this means we scan in negatives! This really isn't as big a deal as it may seem. Some of the better film scanners can handle bulk scanning of negatives. Even a top of the line Kodak roll scanner will only set you back about as much as two high end digital cameras.

    The film system is developed using MySQL, Perl and Apache running on Slack.... Realizing that I could run into several hundred thousand images that need to be available online, I designed the system so that it could be split up. I didn't store the images within the database; instead they are simply stored in the filesystem. Right now the entire system runs on one server, but it could easily be split out to a separate web, database and image server if performance suffers.

    As for archiving images, we only do it when a case closes. All indexing information within the database gets moved to an archived database and the images are moved off to CDROM and cataloged. That way, anyone can look them up and go get the images when needed. This is mainly a company procedural issue; your requirements probably differ. I wouldn't be comfortable with long term storage of information on tape, especially if it's very long term - like forever. CDROM for us was the easiest solution since it's also possible to get CD writers that are jukebox-like in nature. They can make a multi CD set rather easily, simplifying long term storage.

    Cheers! Mark
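
The storage estimate in the comment above can be worked through in Python. The inputs (36 exposures per roll, 475 rolls a month, 18 MB per image) come from the comment; treating 1 TB as 1,000,000 MB is an assumption.

```python
import math

EXPOSURES_PER_ROLL = 36
ROLLS_PER_MONTH = 475
MB_PER_IMAGE = 18
MB_PER_TB = 1_000_000  # assumed decimal units

images_per_month = EXPOSURES_PER_ROLL * ROLLS_PER_MONTH
mb_per_month = images_per_month * MB_PER_IMAGE
months_to_one_tb = math.ceil(MB_PER_TB / mb_per_month)

print(images_per_month)   # 17100
print(mb_per_month)       # 307800
print(months_to_one_tb)   # 4
```

At roughly 300 GB a month, a project that runs six months to several years does land in the terabyte range, as the comment says.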
