Music Media

Automatically Managing Large MP3 Collections?

chhamilton asks: "Where I work we have a dedicated music server with an ever-growing music collection (>60 GB). Currently, we manage it completely by hand (myself and a dedicated few others), enforcing a consistent naming scheme, ID3v1 and v2 tags, and sorting. We're wondering if there are any decent tools out there for automating this process as much as possible. The ideal tool would be something that tries to infer artist/album/song name by analyzing the file name and ID3 tags (if present), cross-checks against the CDDB database, and then 'normalizes' the track by renaming it, updating the ID3 tags, moving it to an appropriate location, and entering it into a MySQL DB. We found a similar question asked previously, but without too many helpful responses. Does anybody know of any useful tools out there doing anything remotely like this?"
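To make the 'normalize' step concrete, here is a minimal Python sketch of the renaming part only. It assumes the artist, album, track number and title have already been worked out somehow; the `Artist/Album/NN - Title.mp3` layout and the names `clean` and `canonical_path` are illustrative assumptions, not something the question specifies.

```python
import re
from pathlib import PurePosixPath


def clean(part: str) -> str:
    """Replace characters that are unsafe in file names and collapse whitespace."""
    part = re.sub(r'[\\/:*?"<>|]', "_", part)
    return re.sub(r"\s+", " ", part).strip()


def canonical_path(artist: str, album: str, track: int, title: str) -> PurePosixPath:
    """Build 'Artist/Album/NN - Title.mp3' (a hypothetical naming scheme)."""
    return PurePosixPath(
        clean(artist),
        clean(album),
        f"{track:02d} - {clean(title)}.mp3",
    )
```

The result is only a relative path; actually moving the file, updating its tags and recording it in a database are separate steps.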
This discussion has been archived. No new comments can be posted.

  • I'm trying to figure out if anyone uses ID3 tags for searching MP3s. So far, it looks as though Napster, Gnutella, Freenet, Mojo Nation, etc. do not look inside the files to check the contents of these tags. Am I missing something?

    I'm curious if there's any incentive for people to actually fill out the tags, especially the more complex ID3v2 frames.
  • ... their source code is free for all (or was... I stopped using that garbage a while ago); just use their code as a reference and make your own ;)
  • I've recently been going thru a cleanup of my >20GB collection, it's been tough. But MP3 Internet Renamer [mp3ir.com] has helped. Try the Tucows page [tucows.com] for it if the above link doesn't work. I looked thru several of these apps, this one is the most flexible, bar none.
  • http://freshmeat.net/projects/mp3mover/ [freshmeat.net]

    Adding CDDB xref'ing is an exercise left to the reader ;-).

    M. R.

  • Check out my project Agatha [sourceforge.net] over at SourceForge. It's mainly for playing music, but in generating the web pages of songs it mangles the names to make them all similar. At the moment it's all based on file names, but I tested out some ID3 code that would pull the song name and fall back to file name mangling if the tag didn't have one. Anyway, this is a shameless plug for my first OS project.
  • Well, click here [be.com] to check out the BeOS hardware compatibility list for x86 platforms. The Symbios 53c8xx SCSI chipset is listed as supported. As for RAID, here is what I found at Be's web site:

    "Does the BeOS support RAID? At this time, the BeOS does not provide any support for RAID in software. If you have a RAID device set up by a software RAID toolkit, it will not be accessible under the BeOS. However, if you happen to have a hardware RAID device which pretends to be a single hard drive, that should work fine. These are rare, though. We do intend to support RAID in a future BeOS release, but we do not have a schedule for that at the moment. "

    So, the answer depends upon your hardware config. For setups that don't use RAID, like the one in my office, you'd be good to go :)

  • by Wakko Warner ( 324 ) on Sunday April 22, 2001 @10:59PM (#273606) Homepage Journal
    Two "identical" songs may produce different hashes depending on the encoder used, bitrate, etc.

    What I did was write a perl script which compares (using String::Similarity and File::MP3Info) the filenames themselves and their running time in seconds to those in my MP3 database. The script is command-line configurable to accept different "similarity" values as well as different "difference in seconds" values. The code itself is insanely simple, and so far it's about 80-90% accurate at finding duplicate MP3s. False-positives are surprisingly low, too. Greatly simplifies the task of weeding out already-downloaded crap.

    - A.P.

    --
    Forget Napster. Why not really break the law?
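The filename-plus-running-time comparison described above can be sketched in a few lines. This is not the poster's Perl script: it is a Python sketch in which `difflib.SequenceMatcher` stands in for String::Similarity, the running times are assumed to be known already (the original reads them with File::MP3Info), and `Track`, `normalise`, `is_probable_duplicate` and both default thresholds are made-up names and values.

```python
from difflib import SequenceMatcher
from typing import NamedTuple


class Track(NamedTuple):
    filename: str
    seconds: int  # running time, assumed to be read elsewhere


def normalise(name: str) -> str:
    """Lower-case, drop the extension, and turn punctuation into single spaces."""
    stem = name.lower().rsplit(".", 1)[0]
    spaced = "".join(ch if ch.isalnum() else " " for ch in stem)
    return " ".join(spaced.split())


def is_probable_duplicate(a: Track, b: Track,
                          min_similarity: float = 0.85,
                          max_seconds_apart: int = 3) -> bool:
    """True when the names are similar enough and the running times are close."""
    if abs(a.seconds - b.seconds) > max_seconds_apart:
        return False
    ratio = SequenceMatcher(None, normalise(a.filename),
                            normalise(b.filename)).ratio()
    return ratio >= min_similarity
```

Both thresholds are keyword arguments so they can be tuned the way the original script's command-line options are; the defaults here are guesses, not the poster's values.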

  • I have developed a similar system, and I'm also in the process of OS'ing it. It has tons of features: a Perl script runs through the MP3 files and generates MySQL tables. There's also an advanced HTML interface and tons of other features. I'm currently cleaning up the code; after that, I'll publish it on Freshmeat or something.
  • I have written a web-based management tool, in PHP, to handle big MP3 collections. It currently handles about 100GB & I have run it on Linux & Windows systems.

    The main cataloging script behind it recursively reads directories that u specify in the config file. From the directory structures & filenames it cleverly works out the artist name, song title, track number, album name, etc. It even works on compilation CDs. All this information is inserted into a MySQL database with artists & albums being foreign keys - so u can perform searches.

    There is support for ID3v1 tags but I only use this to get the length of the track.

    The web front end has support for users. U log in with a username & password. This enables u to save favourite artists, songs, etc, all of which are accessible by menu bars.

    Album playlists are automatically created, there is a Dynamic HTML drag & drop system to create & order your own custom playlists. U can click on songs to play them, or albums, or your custom playlists.

    There are so many more features - search tool, album cover support, forums, administration area, etc. I guess I could open source it when I get the time.
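Working out artist, album, track number and title from directory structure and filename might look roughly like the sketch below. It is Python rather than the PHP the tool is written in, it assumes an `.../Artist/Album/NN - Title.mp3` layout that the post does not spell out, and `parse_path` and `TRACK_RE` are hypothetical names. It makes no attempt at the compilation-CD handling mentioned above.

```python
import re
from pathlib import PurePosixPath

# Optional one- or two-digit track number, a separator, then the title.
TRACK_RE = re.compile(r"^(?P<track>\d{1,2})[\s._-]+(?P<title>.+)$")


def parse_path(path: str) -> dict:
    """Infer metadata from '<anything>/Artist/Album/NN - Title.ext'."""
    p = PurePosixPath(path)
    if len(p.parts) < 3:
        raise ValueError(f"expected Artist/Album/file, got {path!r}")
    artist, album = p.parts[-3], p.parts[-2]
    match = TRACK_RE.match(p.stem)
    if match:
        track, title = int(match.group("track")), match.group("title")
    else:
        # No leading track number: treat the whole stem as the title.
        track, title = None, p.stem
    return {"artist": artist, "album": album,
            "track": track, "title": title.strip()}
```

A title that itself starts with a short number followed by a space (say "99 Luftballons") would be misread as track 99, so a real cataloguer needs more heuristics than this.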
  • Freeamp makes an explorer-like tree based on the ID3 tags of your MP3 files. It's not the most elegant solution, but it's pretty handy if you have a few gigs and you're looking for something (or you want to change, say, the album name or artist name for EVERY work from a single artist.).
  • Though BeOS has large filesystem capability, will its reduced driver base allow it to run on systems that have this capacity? Say, for example, a Compaq, HP, or IBM Intel server with multi-GB RAID - will BeOS support the SCSI adapters (i.e. Symbios etc.) that are needed to run the arrays?
  • a lot of people, myself included, listen to music with little or no lyrics though...

  • Try freetantrum [freetantrum.org]; they have a nice library which lets you make acoustic fingerprints of various files (including MP3 and Vorbis) and look them up in their database.

  • by Zurk ( 37028 ) <zurktech AT gmail DOT com> on Sunday April 22, 2001 @05:11PM (#273613) Journal
    its fairly easy. use an MD5 hash to generate keys (or RIPEMD-160) which reference each file and then dump em to mysql. cross reference em with the ID3 tags and/or wget the stuff from cddb and you should be all set...shell script should do it..nothing fancy required.
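A rough Python sketch of the hash-as-key idea, in place of the shell script suggested above. It assumes the hash is taken over the whole file contents (the post does not say what gets hashed) and uses the standard-library `sqlite3` module as a stand-in for MySQL so the example is self-contained; the table layout and the names `file_md5`, `open_index` and `add_track` are made up for illustration. The ID3 and CDDB cross-referencing is left out.

```python
import hashlib
import sqlite3


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 of the whole file, read in chunks so large files stay out of RAM."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def open_index(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the index; sqlite3 stands in for MySQL here."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tracks ("
        "md5 TEXT PRIMARY KEY, path TEXT NOT NULL)"
    )
    return conn


def add_track(conn: sqlite3.Connection, path: str) -> bool:
    """Index a file by content hash; False means identical bytes are already in."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO tracks (md5, path) VALUES (?, ?)",
        (file_md5(path), path),
    )
    conn.commit()
    return cur.rowcount == 1
```

Because the digest is the primary key, a byte-for-byte duplicate is rejected on insert. Two different encodes of the same song hash differently, so this only catches exact copies.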
  • by Anonymous Coward
    This is going to be easy in a few months. You will simply have to submit your song to the new Napster service with Acoustic Fingerprinting [slashdot.org], and it will return something like:

    Sorry, but this song infringes upon the following copyright:
    artist = ...
    album = ...
    song name = ...

    Just grab the info, and there you go.

  • its fairly easy. use an MD5 hash to generate keys (or RIPEMD-160) which reference each file and then dump em to mysql.

    While I don't know what RIPEMD-160 is (Google is my friend in a minute), I'd further the suggestion by keying to the entire file rather than just the filename -- it'll be easier to detect (exact) duplicates, but I'm not sure if the overhead of MD5'ing a 5M file is worth it in hindsight.

    (I'd also suggest using whatever DB you have handy and Postgres if you don't have any db currently running, but that is another flame war in and of itself.)

  • gronk [jwz.org] is more of a frontend for playing MP3s but it also has them all organized by artist, using CDDB. you may wish to take a look at how it works.

    personally i have perl scripts that extract ID3 information and store stuff like the CD# i burned it on or directory it is currently in, bitrate, size, filename, all ID3 info in a text file and that can be searched easily. that way it is easy to find songs by searching and i don't have to rename them all the time.
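Extracting the ID3 information is straightforward for ID3v1, which is a fixed 128-byte block at the end of the file starting with "TAG". The sketch below is Python rather than the Perl scripts mentioned above, covers ID3v1 only (no ID3v2 frames, bitrate or CD number), and `read_id3v1` is a hypothetical name.

```python
import os


def read_id3v1(path: str):
    """Return the ID3v1 fields of a file as a dict, or None if there is no tag."""
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        if fh.tell() < 128:
            return None
        fh.seek(-128, os.SEEK_END)
        block = fh.read(128)
    if block[:3] != b"TAG":
        return None

    def text(raw: bytes) -> str:
        # Fields are fixed-width and padded with NULs (or sometimes spaces).
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(block[3:33]),
        "artist": text(block[33:63]),
        "album": text(block[63:93]),
        "year": text(block[93:97]),
        "genre": block[127],  # numeric index into the ID3v1 genre list
    }
```

Writing one tab-separated line per file from that dict gives a text catalogue that can be searched with grep.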

  • I'm not sure if the overhead of MD5'ing a 5M file is worth it in hindsight.
    There shouldn't be much noticeable overhead for generating an MD5 on a 5 MB file. I trade (legal) hi-fi concerts as .SHNs (no loss) pretty frequently, and the filesizes frequently run > 100 MB per song. Generating an MD5 for a file this big takes 10-12 seconds. ... so unless you're adding music to your collection like a madman, there shouldn't be detrimental effects of MD5ing the entire track. HTH.
  • One word: BeOS


    I use BeOS to manage my growing collection of MP3s. With BeOS, you do not need to run a separate database to keep track of your files. Be's file system (BFS) actually works like a live database. You can add all sorts of indexable attributes to any type of file. Searching for files in BeOS looks just like a database query. Wrt MP3's, all the info you find in ID3 tags can be stored as file attributes. When you search for songs based on any combination of attributes, you get the results almost instantaneously. The best part is that in most cases, no manual entry is required for anything, including manipulation of filenames and ID3 data!


    There are a number of ways to manage MP3 metadata. If you want to take advantage of CDDB or FreeDB in the ripping process, then Scot Hacker's RipEnc [bebits.com] is the way to go. Pop in a CD and it will get recognized from a CDDB/FreeDB lookup. Then, just tell RipEnc the genre and year of the album. RipEnc will rip the CD for you and add the CDDB, year and genre info to each song's ID3 tags and filesystem attributes. Of course, you'll have complete control over bitrate, frequency, name format, the whole shebang. For convenience, songs get stored under a directory hierarchy like this: ~/mp3/artist/album


    If you have a bunch of MP3s already on your HD and you want to change around the ID3 info and/or standardize on filename structure, then ArmyKnife [bebits.com] will do it all for you. Here is a description of ArmyKnife, shamelessly plagiarized from www.bebits.com:


    "The Army Knife for MP3's and OggVorbis The Army Knife is a BeOS application that allows users to perform ID3 Tag to Attribute assignment, Attribute to ID3 Tag assignment, parsing of file names to fill attributes, and renaming files based on their attributes. It also includes an attribute and tag editor that allows the user to work with mulitple files at once."

    When you're ready to serve out your MP3's under BeOS, you'll have at least two methods available. The RobinHood web server has a plugin to let you stream songs over your network. However, the easiest (and possibly most flexible) way of serving out MP3's is to use Stephen van Egmond's Be In Your Stereo. Again, I shall shamelessly plagiarize the description from BeBits:


    "Be in your Stereo is a plugin to SoundPlay that scans your BFS volumes for digital music files. It builds a cross-referenced index of your collection based on Artist, Genre, Year, and Album BFS attributes, then serves up views of your track list and collection via HTTP. It is ideal for building a home audio server. In addition to your current play list and cross-referenced views of your entire music collection. It will accept commands to add chunks of your collection to the playlist, and manipulate playback in useful detail - volume, track, track position, etc. With the plugin, you can park a BeOS machine with a modest CPU, quiet fan, networking, audio and storage hardware next to your stereo, and manipulate it from anywhere on your home network. The plugin also has facilities for streaming and downloading files directly to the client, so it can even serve as a crude file server for wider networking setups."

    Finally, Scot Hacker has lots of great information on using BeOS for a headless networked MP3 jukebox. You can read more about it here. [betips.net] Btw, please check out the screenshots of the above programs at the links I provided; they'll do more justice than I can with my descriptions!


  • by Anonymous Coward

    Someone should code this into RIMPS [sourceforge.net], which already does an awesome job at managing massive amounts of MP3s and Oggs!!

  • Why bother telling him this, if he can't get hold of it himself?
  • Have you considered releasing this tool (which would appear to be considerably useful) in some open source manner so that people like myself and the person who asked this Ask Slashdot could get ahold of it? I know I would *love* to get my hands on this.
  • I have seen a demo of GroovePort's [grooveport.com] PUMP system (Personal Ubiquitous Media Platform [grooveport.com]). It rocks and is slicker than any other music management system I have seen. They're in the process of cleaning up the code for an Open Source release [grooveport.com], so good things may start creeping out soon.
  • This is already possible: Napster is using the fingerprinting tech from a company named Relatable. There is an Open Source audio metadata client using the same tech available from www.musicbrainz.org (formerly cdindex.org). Their database is under the OpenContent license at opencontent.org. The mp3 player Freeamp also uses this data to organize your music collection. Freeamp is the best jukebox program I have found for Linux. Too bad RealJukebox doesn't run on Linux.
    maken
  • I'd love to see a standard MPA or OGA format. That would be an MP3 Album or Ogg Album format. This could simply be a tar containing the songs processed at a standard bitrate by a standard compressor with a standard descriptor file containing CRCs, titles, the copyright holder and the relevant CDDB entry if any.

    What would you add to/remove from the above?

    ---
    My opinions are mine.
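As one possible reading of the album-in-a-tar idea above, here is a short Python sketch using the standard `tarfile` and `zlib` modules. The descriptor name `album.json`, its JSON layout, the `NN.ext` member names and the function `build_album` are all assumptions made for illustration; the post does not define a format, and the "standard bitrate and compressor" part is not addressed here.

```python
import io
import json
import tarfile
import zlib
from pathlib import Path


def build_album(archive_path, tracks, meta):
    """Write the tracks plus a descriptor into one tar file.

    tracks: list of (title, source_path) in album order.
    meta:   album-level fields such as artist, album, copyright holder.
    Returns the descriptor that was written.
    """
    manifest = dict(meta, tracks=[])
    with tarfile.open(archive_path, "w") as tar:
        for number, (title, source) in enumerate(tracks, start=1):
            data = Path(source).read_bytes()
            name = f"{number:02d}{Path(source).suffix}"
            manifest["tracks"].append({
                "file": name,
                "title": title,
                "crc32": f"{zlib.crc32(data):08x}",
            })
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
        # The descriptor goes in last, once every CRC is known.
        blob = json.dumps(manifest, indent=2).encode("utf-8")
        info = tarfile.TarInfo("album.json")
        info.size = len(blob)
        tar.addfile(info, io.BytesIO(blob))
    return manifest
```

A reader can verify a track by recomputing its CRC-32 and comparing it against the descriptor entry.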

  • by Wakko Warner ( 324 ) on Monday April 23, 2001 @01:25PM (#273626) Homepage Journal
    Sure. I've put the code (tiny as it is) up at http://bitey.net/code/dupefind.txt [bitey.net]. The file should be self-explanatory, and there are a couple of comments in there that should help explain the database (just a flat file, actually) format.

    The code that actually creates the flat-file database (from a mounted CD-Rom) is here [bitey.net].

    Another script that plays stuff and searches through the database is here [bitey.net].

    - A.P.

    --
    Forget Napster. Why not really break the law?

  • Why not add some support for adding lyrics? MusicMatch can store lyrics within the MP3s; why it is so rarely used I don't know! But there should be a tool to download lyrics for all your MP3s and add them to the files, and then a lyrics plugin for Winamp, XMMS, etc.
  • BeOS has a utility that takes ID3 tags and makes them into filesystem attributes, which means that you can sort and search by artist/genre/whatever. There are plenty of utilities specifically to do that elsewhere, but having it built into the OS like that is pretty sweet.
  • What's the point of using MD5?!? What does MD5 give you that simple indexing doesn't?

"If it ain't broke, don't fix it." - Bert Lantz
