
Linux Applications And "glibc Hell"? 277

Posted by Cliff
from the .dll's-.so's-and-.a's-oh-my dept.
cybrthng asks: "How do you stop glibc hell on Linux? I thought I'd long left behind the ever-familiar DLL hell on Windows, but on Linux it breaks applications so badly it's not funny. Will Linux only be able to truly survive with applications available in source form? For instance, take Oracle Applications: it is nearly impossible to install it on RedHat 7.0 or any glibc 2.2 based distro, since the applications were built against 2.1.x. When you install this software it tries to relink itself with the correct libraries and fails miserably. You can, however, force it to use glibc-compat, but that isn't a solution for a production system. Do vendors have to recompile their applications for every kernel, every library and every distro? How can I use Linux when the core libraries don't seem to be forwards or backwards compatible across different distributions?"
  • by Anonymous Coward
    Hi,

    I read that the upcoming gcc 3.0 will have an "application binary interface", that is a more stable and compatible binary format. If they don't change the signature of the functions in the libraries all the time, this should solve this problem very well.

    Another thing: BeOS uses the egcs compiler, and they somehow managed to have very high binary compatibility even with _object oriented_ libraries. For example, they add dummy virtual functions to classes so that the layout of the virtual method table does not change when they add new functions. Linux developers should take a look at this.

    greetings,

    AC
  • If you were using Solaris, you'd probably run into the same problems at some point. Software vendors write to specific releases of platforms; if the box says "requires Solaris 7 with patch xxxxxx, xxxxxy, and xxxxxz", you'd do well to make sure you had an identical system. If it requires RedHat 6.2, you'd better put it on RedHat 6.2, not 6.1 and not 7.0. This is just the way it has and will continue to work. Don't blame RedHat, and, if you must blame Oracle, realize that it's difficult to support moving targets.

    - A.P.

    --
    * CmdrTaco is an idiot.

Static linking leads to its own problems. For example, a few revisions back Red Hat decided to move /usr/terminfo from the place where it'd been since time eternal to /usr/share/terminfo. No big deal -- except that every program that used curses or readline that was statically linked now failed miserably, because they were linked against the old curses that was looking in /usr/terminfo. Similarly, Red Hat decided to move the timezone files one year. Again, statically linked programs suddenly thought they were in Zulu time or something because the libc they were linked against could not find the timezone. Red Hat in particular has a bad habit of moving critical system files from place to place as they decide that the FHS reads differently than it did when they last visited the issue. And they don't even wait for .0 releases to move files around -- between Red Hat 5.0 and Red Hat 5.1, for example, terminfo decided to migrate.

Don't let anybody tell you that DLL hell is not a problem under Linux. It is. Our development team specifically runs Red Hat 6.2 Linux rather than the "latest and greatest" so that our software will by default run on the largest number of systems out there, and we still have nightmares at times. And software installation is even more terrible than DLL hell -- everybody has a different place they stick rc.d, everybody has different runlevels that they start up in, etc., it's enough to give a guy a nervous breakdown. I finally gave up and simply put a note in the installer, "Plunk a reference to script [xyz] into wherever your Linux wants its startup scripts" and tossed up my hands in despair. Coming up with an installer that would put something into the startup sequence on Slackware, Debian, SuSE, Storm, Caldera, Red Hat, and TurboLinux was an AI problem, not a job for a shell script slinger.

    Of course, this is how the distribution vendors want it. When it came time to package up RPM's for Red Hat, it was ridiculously simple to fix my little servelet to start up at system boot, just toss the start script into the right place and run the nifty command that Red Hat provides for that purpose. That's how Red Hat wants it, because that way it's tied to their version of Linux.

Not that the proprietary Unixes are any better. I curse Solaris every time I have to touch it (did you know that the only way to get a list of all SCSI tape devices on Solaris 2.8 is to do a 'find' on the /devices directory, but *only* after running "/usr/sbin/devfsadm -C" because otherwise you could have some device nodes in there left over from drives that no longer exist?). I can't get Xemacs to install from the package files on IRIX (looks like I'm going to have to compile it from source, oh joy). HP/UX has a broken 'uname' command (how the BLEEP can a vendor break uname?!), though at least I manage to get "HP/UX" out of it. Creating a shared library on AIX is deep voodoo requiring incantations and lamentations and really only works right on the very very latest version of AIX (don't even *try* it for two versions back, it won't work). As pissed as I get about Linux vendors and their habit of breaking the world every six months, at least Linux stuff generally works and is easily fixed when it doesn't.

    Note: I speak for myself, not for my employer. My employer specifically disclaims any statements I make about idiocies that OS vendors perpetrate.

    -E

Why do I say that? The glibc team did their best to maintain compatibility from glibc 2.0, to 2.1, to 2.2. The only time this isn't true, and isn't a bug, is if the application used some glibc internal function. This is why things like StarOffice (5.0 or 5.1, I forget) broke from 2.0 to 2.1. It's also probably why Oracle broke. But, if it's not the case, file a bug report and maybe it can be fixed for glibc 2.2.2.
When you use emacs that way, it only requires 125% of system resources, rather than 250%.

Bah, damned EMACS hating fans of the One True Editor . . .

    :)

    hawk
  • by hawk (1151)
    > Everyone misunderstands Emacs.

See, there's the problem. Its mother didn't love it, its father abandoned it, and a cruel society led it to its life of crime.

    :)

    hawk, replying twice to the same message, bad form or not :)
  • >Resources under GNU/Linux (and other POSIX) systems are called things
    >such as "memory," "free space in /var/tmp," "processes," "LWPs,"
    >"texture memory," etc.

Yes, that's also true under Linux, and X/Perl/BSD/NFS/Apache/GNU/Linux,
both of which are far more widely used than GNU/Linux, which is useless
without the rest. :)

The term "system resources" predates both Unix and the *existence*
of Microsoft by decades (as does Emacs's piggish behavior, iirc :).

    >But when a resource is actually called "System
    >Resources(TM)", and it's measured in percent, ten bucks says it refers
    >to [23]the 64 KB USER and GDI heaps in Win32, which can't be enlarged
    >without breaking Win16 apps. (See also [24]Resource Meter.) I see no
    >analog to the 64 KB USER and GDI heaps on POSIX+X11 systems.

    nope. I'm not referring to them at all. Just the plain old ancient
    reference to taking everything the system has, and then some :)

    hawk, who hadn't thought himself old enough to be a curmudgeon.
  • As Linux's primary commercial role is in preconfigured appliances with no user interaction, the job of matching point releases is only done once in the life of the machine. In the future I expect even the bedroom use of applications on Linux to give way to one-time configured appliances, each with its own set of library versions.
  • Similar, but not quite. I was thinking of having the API support overloading of functions, so that if you added/removed parameters, you -DON'T- need a new function name, which would simplify the maintenance end.

    Also, it wouldn't so much figure out where the function was on that system as much as re-use components from a core set of dynamically loadable files, where one function call = one file.

    (What you'd end up with is a virtual GLibC that is built out of only those components you're using, be upgradable without interfering with dependencies (as the dependencies would be on the scripts used to generate the call, not on the implementation), =AND= avoid unnecessary duplication.

    (One of the really horrible things about the existing way of doing things is that you can easily end up with a hundred nearly identical libraries, doing essentially the same basic stuff, but NOT exploiting any re-usability in anything else that's out there. Hey, folks! That's stupid! Recycle what you can't re-use, just stop re-inventing!)

    My "ideal" glibc would be a script which searched through an index for all functions meeting some stated criteria. There may well be duplication of functionality, but that wouldn't matter, as the different implementations would meet different criteria, all accessible through the same API.

    The "central library" would then merge these DLO's in with the script, to "create" the actual functions the apps are going to call.

    This is superior to the Windows model, as you're not reliant on specific versions of specific DLL's being present, and upgrades can be done on the fly. (DLLs cannot be easily upgraded when in use. In this model, the application never sees the actual implementation, only the interface, so allowing for on-the-fly upgrades.)

  • The OO approach does seem to hold some key answers to this problem.

    I was thinking along the lines of each "function" carrying the actual call, a summary of the pre-conditions and a summary of the post-conditions.

    When you make the call, you define your parameter list, and your own pre-conditions and post-conditions. It is then up to the GLibC "wrapper" to mix-and-match what you're sending & expecting with the functions that are available.

    This would essentially do the same as the versioning map, but would have the advantage that the computer would take care of which version is good for what.

    eg: If you call a function "blort", with a given parameter set, no pre-conditions, but the post-condition that the output will be a value between 0 and 1 (0 = OK, 1 = Error), then GLibC should cheerfully ignore all "blort" functions that return values outside that range.

    This is the same as your parameter declaration, with one small difference. With pre- and post-conditions, you can combine the inputs and outputs of functions, to see what the net output will be. This would simplify making composite functions.

  • Well by that argument someone could replace everything in /usr/lib with bash scripts and kill Unix. Yep, you replace system files with nonsense and things won't work.

    How do you prevent this from happening on Unix? Well the files are only writeable by someone with root access.

    This is also true of Win2k, the files are only writeable by someone with Administrator.

    Without having different types of user security access, or by burning everything to ROM, I'm not quite sure how you suggest resolving this issue.
  • I don't appreciate it when obviously uninformed people try to claim I'm wrong.

    I verified this fact before posting my original message. The system files are not world writeable on Windows 2000.

    Not even by default.

    You had better go reevaluate your preconceived notions because it seems the world has turned upside down on you.
  • Use the -i (install) switch instead of -U (upgrade). It will keep the old libraries on the machine instead of removing them, which is what upgrade will do.

  • Why do you jump to the conclusion that the compatibility trouble is the fault of Linux or glibc? Did it ever occur to you that perhaps the application authors were causing the problems?

    There is no glibc compatibility problem. Properly-written programs have no problem whatsoever. If, however, programmers use calls that are internal to libc -- that they are told NOT to use -- then what do you expect? They have violated the rules, and it's coming around to haunt them.

    I routinely run binaries from pre-2.2 versions of glibc on glibc 2.2 systems, and not only on Intel platforms. I have experienced no difficulties thus far, with either binaries shipped with Debian or otherwise. I do not use Oracle, though.

    Perhaps an analogy would be useful. In the days of DOS, applications could totally bypass the operating system for basically any purpose. (Granted, because the OS stank so badly, there was often justification for it.) But when people tried to start using systems like Windows, many of these apps broke -- assuming that they had total control of the machine when they didn't, because they violated the rules before.

    In this case, there is no need to violate the rules and frankly if programmers are stupid enough to go out of their way to find undocumented calls liberally sprinkled with warnings, and use them instead of their documented public interfaces, I wouldn't buy their software anyway. PostgreSQL doesn't have this problem.

  • Speaking as someone who's used Intel's compiler on Windows, I definitely wouldn't expect it to be any better than gcc.
There's this little ABI that Microsoft published, called Win32- it was to this that he was referring, not the DirectX ABI, which largely works only on the Win95/98/Me platform line.

Applications written to the Win32 layer are supposed to work- no matter what. It's this ABI that Office 95/98/2k were written to- and they install the same set of binaries for all Windows platforms out right now. This means that applications written to the Win32 ABI should work fine on Win95/98/Me as well as WinNT/2k/XP- no matter what (It's what MS has been saying all this time...).

You know what? They DON'T. Especially if you're dealing with the GDI layer- that's toast (even though it's all part of the ABI...). There are bizarre behaviors that you've got to find out for yourself (because they're not even documented in the MSDN CD sets or anywhere else...) that cause a nightmare for supporting things like document imaging, etc. on Windows platforms. I know I did that for approximately 3 years at a previous job where I wrote OCXes, then ActiveX components to allow VB and other development platforms do document imaging on Windows more easily.

  • Ok, I should have formulated that better...

Shared libraries lose their big benefits when every binary ships its own version of that shared library. On Win2k that has become the rule now, to avoid library version conflicts.

    On GNU/Linux (and most other systems), you *can* ship a separate version of that library with each binary that requires it. But the libraries will be installed at the standard locations, so if more apps require the same version, they will actually share the library. This is what RedHat used in the glibc-compat package, which provides a compatibility library so that RedHat 6.2 binaries will run flawlessly on RedHat 7.0, using the proper versions of their shared libraries, while the "native" RedHat 7.0 binaries run on the newer libraries. Simple, elegant.

Our Linux apps [sysorb.com] (currently on RedHat 6.2 only) ship with a special libstdc++, which we will probably be the only ones using. However, because of RedHat's approach, we will not do this in the next release; it is perfectly reasonable to simply use the compatibility libraries on 7.0 and the native libraries on 6.2. Once we start supporting a stable 7.X platform, we will of course run on the native library versions there.

    On NT and Win2K we must ship specific versions of some DLLs in order to get anything running. There is no backwards compatibility, and there is no DLL versioning. It is of course very simple to just ship your own DLLs, and it works perfectly well, I am just arguing that I fail to see the problem with the GNU/Linux (and most UNIX like systems) approach. Especially given great vendor support such as what we see from RedHat (and probably others too).

  • Huh ?
    I'll check up on the DHTML thing tomorrow when I've gotten some sleep...

    The network monitoring system is a client/server system. The server is a large program (it is a distributed database and a remote monitoring system), and the client is a very small easily portable program.

Thus, the server is available for RedHat 6.2 (and therefore also 7.0; the 6.2 version will work there), Debian 2.2, FreeBSD 4.0, NT 4.0 (and therefore also Win2K).

    The client is available for the same platforms, plus, RedHat 5.2 (with a Linux 2.0 kernel), and FreeBSD 3.4.

    As you will have noticed, the software is in beta, but we are *very* close to a release. There are bugs left, but we will have a release out fixing the last known ones, probably around the weekend.

Should anyone out there have opinions, suggestions, demands or "other", for a commercial program soon-to-ship for Linux among other platforms, I would welcome such feedback.

    Please use this e-mail address [mailto] and check out the website [sysorb.com].

    And please accept my apologies for going slightly off-topic on the subject here.
You may be overestimating the size cost of static linking. Much of the library size is symbols, which are of course not needed after static linking!

There is lots of other overhead of shared libraries, which people seem to be blind to. The overhead on each call is not zero (on the best systems it is only significant on the first call, but even then I suspect there is overhead because of non-contiguous locations of procedures versus static linking).

    I don't see how putting the shared library into the bundle is going to help. If the system could identify it as matching another shared library and thus sharing, we would not have the dll problem in the first place, as that same library could reside where libraries normally do. If not, you have just added all the overhead of static linking AND the overhead of shared libraries.

  • Can somebody PLEASE confirm or deny this?

    This "requirement" of the LGPL is, imho, extremely harmful to the ability to distribute small projects (like my own toolkit fltk) under the LGPL. It makes what could be a simple program for the end user into an "install" nightmare. I just explicitly say "you can statically link" but I have not dared change the wording of the LGPL because I don't want to do "yet another license".

It is not clear if the LGPL really says this, anyway. It says you must provide a way to relink the program. I think making the user send their new copy of the library to you and relinking it for them would satisfy this requirement. More practically, providing .o files on request would work, or providing the source code under NDA would be ok.

    I also see no purpose in this requirement. I cannot believe anybody will use it to replace small libraries like mine. Any fix that would allow it to still link would be a bug fix and I would be much more interested in having the user be forced to reveal the bug fix! New functionality in the library is unlikely to be used by an already-compiled program.

    Anyway, I have asked dozens of times for anybody in authority to confirm or deny this "LGPL requires dynamic linking" statement. Also, if it does, I would like official word from GNU as to how to modify my license to delete this requirement without breaking the rest of it!

  • I believe the "RTTI overhead" is also in Java. It does not magically do this in some way that the C++ compiler can't!

Conversely, though, RTTI would be simple and almost free (the vtbl pointer could be used and a single pointer added to it to point to the parent vtbl) if it weren't for damn multiple inheritance. C++ really fell down when they gave in to the demands of those idiots who wanted multiple inheritance for non-pure-virtual objects. I do believe everything could have been done with a single parent, and some syntax for invisible ONE WAY casting to the other "parent" types, ie the class is only of one type, but can be easily used in calls that accept any of the other "parent" types.

> You can however force it to use glibc-compat, but that isn't a solution for a production system.
    That is *EXACTLY* the solution for a production system. Run the software with the libraries it was intended to run with. I can't imagine why this isn't obvious to everyone.

If you think using compatibility libraries isn't adequate, please explain WHY, in technical terms (i.e., not "It seems yucky so I don't like it").

• I see this as a naming problem. I could package the FreeBSD kernel with incompatible libraries. What do I have? A new OS. Same thing with Linux. The distribution is the operating system, and everything needs to act accordingly. No one should say, "I run Linux", they should only say "I run RedHat" or "I run Debian", because, in reality, they are different operating systems, that just happen to share components.
  • I agree. I also failed to see what he meant by saying he wouldn't do glibc-compat for a production server. Why not? It doesn't have as many features as glibc-2.2, but that's what the program was made for. Did anyone else see what the big deal is for glibc-compat?
  • Also, under the LGPL, you can't distribute statically-linked versions of an executable without also distributing the unlinked .o files as well.
  • How do you use symbol versioning?
  • A while ago, as an experiment, I installed the glibc from RedHat 7.0 on a 6.2 system. Almost everything worked exactly as before, the only thing that broke was Emacs due to some problem with Berkeley DB.

    So I'd say it is Oracle that's being downright stupid. I don't know what is meant by 'tries to relink itself with the new library', but it sounds pretty unpleasant and unnecessary. Hundreds of other binary packages just kept on working when glibc was upgraded.

    Oracle is well known for being a pig to install and get running, and this is just another thing to add to the list.
• > For instance take Oracle Applications, it is nearly impossible to install it on RedHat 7.0 or any glibc 2.2 based distro since the applications were built against 2.1.x. When you install this software it tries to relink itself with the correct libraries and fails miserably. You can however force it to use glibc-compat, but that isn't a solution for a production system.

Officially, glibc 2.1 was not a production release but a development one. If glibc-compat is not acceptable on a production system, then software linked to a non-production library should not be acceptable either.

• This is a genuinely annoying problem, but fortunately it's also a solved one. The initial work was done at MIT's LCS for hardware, in the paper Dynamic Reconfiguration in a Modular Computer System [mit.edu], and it was implemented in software on Multics [multicians.org], where I learned it from Paul Stachour [winternet.com].

For people primarily interested in Linux and glibc2, there's a paper for the community, written by David J. Brown and Karl Runge on Library Interface Versioning in Solaris and Linux [usenix.org].

    (David J. Brown is the originator of the Solaris Application Binary Interface [sun.com] programme: I worked for him for two years on the project, back in my pre-samba days --dave)
  • Man, this is an awesome idea. Has someone done this already, or did you come up with it?
• Glibc 2.2 is supposed to be backward compatible with 2.1 (and 2.0). I am running 2.2 and have not had any problems with programs built against 2.1 (and am even running some built against 2.0.7). The library which seems to cause problems is not glibc but libstdc++.
  • You have no clue what you're talking about.

    Here, I'll make it simple:

    With Windows, there can only be one version of a library usually, so you're just screwed unless you have source.

    With Linux, you can put as many versions as you need for compatibility, and use symlinks to make as many applications see them as necessary.

    If your application requires libfoo.so.1.1 or greater, then libfoo.so.1.1.x (where x is any number at all) should all be compatible. If they aren't, it'll be libfoo.so.1.2.x instead.

    So put libfoo.so.1.1.7.8.3 on the system, and create a symlink so that libfoo.so.1.1 points to it, and all is well.

    Applications that REQUIRE libfoo.so.1.1.7.8.3 will be happy, and so will applications that require any 1.1 or later, and if necessary you can also put a symlink for libfoo.so.1.1.7 pointing to it.

Can't do that with Windows without putting a separate file for every one, because symlinks aren't supported.

    All I have to say is the farthest lefthand digits my application requires; probably 1.1+.

As for RPM dependencies, that's a packaging system, not an operating system. You can even use it on Windows, but you don't attribute its failings to Windows, do you?

    If somebody chooses to package their application in a certain way and not any other, you should direct your blame to the application distributor, not the OS author.

    -
  • > Linux/x86/glibc-2.2 - 310696 (76 pages)

    Damn! And I thought Delphi Hello World! programs were pigs at 270K. Of course, it remains to be seen how bloated Kylix programs will be.
> Yet here you are advocating EXACTLY the same approach to get round the glibc fiasco.

    I believe he's not thinking of static linking as actually "shipping your own shared libraries", which in essence it really is.
  • let's say you have a bunch of "polygon" objects in a linked list, and you mistakenly put a "circle" object in that list as well.

    What's wrong with this? After all, what is a circle but a polygon with an infinite number of sides? :-)


    ---
    "They have strategic air commands, nuclear submarines, and John Wayne. We have this"
On my system, I have 558 different packages. To compile all of them from source would take days, if not weeks (I know compiling X alone would take 3+ hours).

    It's nice that you have the time to recompile all your programs from source. I don't.
• > When using real software like Oracle under linux, you find out what the requirements are for the application you're going to run, and install a compatible setup.

    Yes, but I think what the questioner was getting at is that this is NOT the way it should be.

    -josh
  • You describe a process that shouldn't be necessary.

So far I have seen many people pose solutions for running glibc 2.1 and glibc 2.2 together on one system, or to get your RH 7.0 to have glibc 2.1 libs on it. Someone even said that this is really a non-issue since you can run 2.1 and 2.2 on the same machine.

I think you ALL missed the point. The question is really WHY are there so many incompatibilities between 2.1 and 2.2? These are MINOR revisions. Going from libc 5 to libc 6 was considered a major change: one had thread support, the other did not have very good thread support. Plus there were other changes.

I noticed that developers in the open source community are more for "let's make this revision better and fix these bugs, but screw it if it doesn't work with previous software revisions; people can just upgrade or recompile the old software." This is the general attitude. That, and "oh, they can keep multiple revisions of the libraries" (gtk+1.0 and gtk+1.2 rings a bell). Hey, just look at KDE. KDE 2.0 was so different from KDE 1.x that you basically had to uninstall (KDE) and reinstall (I did). The whole architecture was that different.

I have used Windows, and between Win 3.1 and Win 95 there were problems, but I do not have that many between 95, 98 and NT 4.0. (Oh, and I do use Linux as my primary system.)

One of the things that open source groups try to do is to get the program 'right'. While this is good, they often overlook those of us who don't use a particular distribution (or rather would not), and they say "well, let the distributions figure it out for the community and anyone else can get the source." RedHat tries to do this (and of course they add their own stuff), and then you end up with them releasing something like RH 7.0, which so many people have had problems with.

What ALL these people either fail to realize, or maybe just don't care about, is that no matter how easy a GUI Linux has with KDE or GNOME, and how easy an installer you make, when someone wants to upgrade one package on their system and there is no easy way to do it, they run into the problem that this person has: trying to run a program that comes in binary form only with a library that it won't work with.

Is there a solution? I say yes, but it lies in the hands of the developers to say "let's get it right the first time." How can they do this? They can try using a design and a design document first, rather than just coding and making changes later. The other thing they can do is eliminate scope creep. Scope creep is what happens with the Linux kernel: they say "we will fix this and that in development x.x.x," and then they start finding other things to fix and start work on those systems as well, and the whole development cycle takes twice as long. Then by the time they are done, the whole architecture has changed and they have done more than they planned. (Scope creep.)

    I don't want a lot, I just want it all!
    Flame away, I have a hose!

• Windows always suffers from feature creep. Look at going from Win 98 to Win 2k or Win ME. Not really that much different, just a few new bells and whistles that most can do without. Of course they can't make it faster; then no one would buy a new PC.

    I don't want a lot, I just want it all!
    Flame away, I have a hose!

• > Why can't things just be forwards compatible?

    Because that generally requires predicting the future, which is effectively impossible. An application that is designed to link against glibc 2.1 would have to have anticipated the changes in interface and functionality in 2.2 to be able to work with 2.2.

    True forward compatibility would require some kind of interface and functionality discovery mechanism, in which an application says "I'm looking for a version of function x which has such and such behavior", and is then given a reference to the appropriate function by the OS or some other middleware like CORBA or COM. These systems do support a weak form of interface versioning based on a version number. However, all such systems still require you to keep earlier versions around, so you won't easily get away from the requirement to maintain old library versions.

> What is there to linux if you have to hack a system together that is totally incomprehensible to professionally maintain on a commercial software level (ie, not having the sourcecode like in the original question).

    The basic issue is not specific to any OS or language: it's a simple logical constraint. Why can't an old vinyl LP record player play CDs? Same reason.

  • You have no business running quake on an Oracle server.

I am talking specifically about Oracle here. This is not a home user application. The problem you are describing is valid though, just not addressed in my post (or the parent post).
• Hmmm... I really don't get your point. If Oracle (for example) says that it needs RH6.2, it will run on RH6.2 regardless of who loaded it. Yes, different people load different software (full install versus custom install), but that doesn't affect the libraries. That's what we are talking about here, right? Do you have an example of this not being the case?

    The parallel between the service packs (which are different versions of dlls/libraries) is perfectly valid in the sense that companies like Oracle have very specific requirements, and people that use these big-time applications follow these requirements to the letter. I think that's really all the poster was trying to illustrate: the "other way around" fact.
  • Not to trash open source projects like PostgreSQL or MySQL (I am all for them and use them), but Oracle is currently (still) in a completely different league. The number of features it has is astounding - I have just discovered lots of them recently in training. Of course, I'll probably never need to use these features, but that's just me - the right tool for the job.

    And the issue here is not "flaky behavior", it's the fact that Oracle doesn't distribute the source and it won't link because of the different libraries. Now we could start an argument that Oracle should distribute the source code, but that has basically no chance of ever happening.
  • Oracle was just an example. Oracle won't provide the source for their software, so it is essentially illegal for them to link against anything that is GPL, because the GPL license doesn't allow that. (I believe the LGPL possibly does, but I could care less; out of scope for this discussion.)

    For another example: buy a RedHat 6.2 based server from VA Linux, and it has different software from the RedHat 6.2 you buy from RedHat, as well as from the RedHat 6.2 system you can buy from Penguin Computing.

    This "Service Pack" thread of NT is BS as well. If it was as simple as running linux with Service pack 1 then by all hell that would be easier then saying "This software must run under RedHat 6.2 with rpm updates blah blah and kernel version 2.1.17-28"

    At least I can install Oracle (again, an example) on Solaris 8 because I know Solaris 8 is certified. Doesn't matter if I have Sun come in and install it, it's still Solaris 8; doesn't matter if I hire an outside consulting firm to install it, still Solaris 8; doesn't matter if I get it pre-installed, it's still Solaris 8.

    Linux doesn't have anything close to a "foundation" for an application framework (such as Oracle and Oracle Applications) to build from, and as such you have to target something so specific that it's not Linux but RedHat.

    Back to my original question: why call it Linux, or even compatible, if each distro and each version is so different that you have to be so specific?

    At least with NT I know that if it says Service Pack 3 I can buy a server from Dell and specify Service Pack 3, but with RedHat there is no standard, not even within their own versioning. RedHat can't say "Penguin Computing must use the base install + drivers and optional support".

    This really won't go anywhere; I just wanted to see people's opinions on it.

  • Corel created Corel Linux because they needed features that other distributions don't/didn't/won't have. So are you saying Oracle should come out with its own version of Linux, or pay RedHat to keep 6.2 afloat? (After all, for enterprise Linux, why would Oracle float the costs to develop instead of vice versa? The OS is only as good as the apps, and no matter what anyone else thinks, Oracle sells a lot more than just databases!)

    Is the beauty of Linux that Linux isn't Linux, but everyone can make their own distro and nothing will ever work fully because everyone has their own idea of how things should work?

  • RedHat 7.0 is supported; RedHat 6.2 will be falling off the face of the earth. My question isn't really about Oracle-specific issues, but about any application shipped in library and .o form that requires relinking.

    So are glibc 2.1 and libstdc++ changing so much to support new hardware, new kernels and new features that old and working systems cannot move forward?

    Of course you need to certify your business applications against a specific environment. But my question is: how can we have so many vendors each calling their system Linux when we all have to run RedHat 6.2 because that is the only supported Linux?

    The SP fix is just BS. If it were that simple, then I could apply the service pack on any "Linux" distro, because they're all supposedly functioning in the same way.

  • So you're saying not only should I have multiple glibcs to make everything work backwards compatible, but I should force my applications to link against glibc 2.1.3.1.382.1.3, so when people upgrade to glibc 2.1.3.1.39 they have to keep the older library around for backwards compatibility?

    Why can't things just be forwards compatible? I'm not talking about linking against glibc 2.2 and running on glibc 2.1; I'm talking about why apps linked against glibc 2.1 fail under 2.2.

    And that's not all of it either. You can't just say "requires glibc 2.1.3"; you have to say "Requires glibc 2.1.3, RedHat 6.2, RedHat upgrades of so-and-so rpms, IBM's Java SDK 1.1.8 and kernel 2.2.17-24".

    What is there to Linux if you have to hack a system together that is totally incomprehensible to professionally maintain on a commercial software level (ie, not having the source code like in the original question).

  • Surely C suffers from this problem, though? It's possible, IIRC (I'm not a C expert), to write a value of one type to a variable of another directly, and C doesn't check the bounds on an array. Both produce strange errors which aren't that easy to spot, as the crash happens when the wrong data is read (which could be at pretty much any time) as opposed to when it's written, plain and obvious for all...
  • "Write and link your application and pay attention to the very few caveats revealed by the GNU glibc team, and your app will run well on many different versions of glibc. Ignore the prophecies of the GNU glibc team, and you may be assured that your app will go down in flames. "

    Yep, agreed there. Or alternatively, "just compile it statically". I'd love to see what gzexe would make of soffice.bin if that were done ;)
    ~Tim
    --
    .|` Clouds cross the black moonlight,
  • "The problem that Linux Distributions are so different is another. But that's not glibc's fault."

    Quite so.

    So what if I have a better version of glibc as provided by debian unstable than yours provided by RH7.0? That's not an issue in the slightest: you can always upgrade, and I can do so more easily... ;)
    The "problem" arises when you expect to compile something "for linux" without releasing any source. In that case you have to support every distro, every version of kernel, every version of every dependent library... but that's your choice for being binary-only.

    Me, if it can't come from `apt-get -b source foo' then it doesn't get installed.
    ~Tim
    --
    .|` Clouds cross the black moonlight,
  • Well, comparing this situation to Solaris isn't quite fair. Companies who write applications that only work on a specific version of Solaris are usually doing it wrong. Most version incompatibilities with Solaris software come from the developers using the wrong functions/interface into the kernel.

    Sun has done a very good job of publishing the supported mechanisms for doing things under Solaris. Usually these are abstract, and are guaranteed to continue to work across OS releases. It's when the developers get clever, and use the system calls and functions these wrappers use directly, rather than going through the supported interfaces, that problems come up over patch or OS revisions.

    In short, most Solaris software compatibility problems come from sloppy application engineers. Most glibc compatibility problems come from the confused state of the library itself, or from distributors so bent on being cutting edge that they don't really think about whether the stuff they're releasing actually works.
  • If symbol versioning takes care of this issue, then let me get to my favourite rant about stinky rpm (did I mention that it stinks?)

    What is an ordinary Linux person (i.e. me) supposed to think when trying to 'rpm -U' the next version of glibc and a billion dependency complaints come back? This is just wrong. And do you get any help on whether it's safe to force the upgrade? Not by rpm, you don't.

    Half-useful / half-dopey programs like rpm give Linux a bad name.

    And, by the way, if symbol versioning is good, why is it even an option to build without it?
  • RedHat 7 uses non-standard C libraries and a non-standard C compiler. Both are development releases, not 'stable' releases. I've seen quotes from Linus himself that RH7 as a 'development environment' is completely unusable. RH7 is even binary-incompatible with other Linux distributions.

    Wait for 8, when hopefully RedHat will correct their mistake and ship with stable versions of the C libraries and C compilers.

    -- Greg
  • Not true.
    There are literally thousands of crucial DLLs that are writable by anybody.
    The winnt directory is world-writable and so is the system32 directory. Anybody can alter any one of them and bingo, a broken system.

  • There is nothing preventing anybody from creating a DLL called wininet.dll and stamping it with version 99. You can do this in VB. Then anybody who installs this program has a broken Windows from then on.

  • Yes, rpm is a pain in the ass. I much prefer the packaging software that's been around a lot longer, like that under Solaris or Digital Unix or IRIX.

    And, by the way, if symbol versioning is good, why is it even an option to build without it?

    Because it isn't easily automated. You must choose which symbols get exported, which get restricted, and what version number goes with each symbol. Some symbols (which ones?) might get their versions incremented (by how much?) for a particular build, and some won't, depending on you and your app.
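For what it's worth, this is roughly the machinery glibc itself uses: a GNU ld version script names the version tags, and a `.symver` directive binds each implementation to one. A sketch with made-up names (`libfoo`, `FOO_1.0`); the real mechanics are documented with GNU ld and glibc:

```c
/* libfoo.map -- the version script fed to the linker:
 *
 *   FOO_1.0 { global: foo; local: *; };
 *   FOO_2.0 { global: foo; } FOO_1.0;
 */

/* foo.c -- two implementations kept side by side forever */
int foo_v1(int x) { return x; }
int foo_v2(int x) { return x + 1; }

/* old binaries keep resolving foo@FOO_1.0; new links get foo@@FOO_2.0 */
__asm__(".symver foo_v1, foo@FOO_1.0");
__asm__(".symver foo_v2, foo@@FOO_2.0");

/* built with:
 *   gcc -shared -fPIC -Wl,--version-script=libfoo.map foo.c -o libfoo.so.1
 */
```

This also shows why it can't be fully automated: somebody still has to write that map file and decide which tag every symbol belongs under.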

  • The easiest solution, one which I have recommended to the company for which I work, is to statically link commercial products. That way, you don't need to worry at all about what libraries are installed. It's much easier for the users and the tech support people. IMHO, more Linux apps should be distributed as both static and dynamic, so that more intelligent/experienced users have a choice.

    Also, it's a fairly good idea for people to have older libraries installed on their system. I don't see why you wouldn't, except if you're out of disk space. In that case, just go out and buy a 20GB IDE hard drive for $99.
  • I've experienced this problem. I once had an application I needed to get running that involved two different database applications: Oracle and VSystem (a real-time database). The problem was, at the time, VSystem would only run on RedHat Linux 5.x and Oracle would only run on RedHat Linux 6.x. Because we had to link the two programs with callbacks, the compat libraries were not an option.

    What I ended up doing was putting them on two different machines and writing a glue program with Perl DBI. There is now a VSystem for Linux 6, but we are still using the version I wrote with Perl, because there is no reason to switch.

    Yes, this sucks.
  • Ok, I think I see the restriction: (using your nice example to build upon)


    class A
    {
    public:
        void method1(int);
        void method2(char*);
        void method3(double);
    };

    class B
    {
    public:
        void method1(int);
        void method2(char*);
        void method3(double);
    };

    void foo(A *obj)
    {
        obj->method1(3);
        obj->method2("s");
    }


    > Now, the only kind of object you can pass is of class "A" or a subclass of "A".
    Right, since the language wouldn't let you call foo(b), because while B has the "same interface" as A, it doesn't have the same type.
    (You could cast B over to type A, but that is unsafe, and leads to run-time errors, not compile time error checking, so we won't go there.)

    > For static type safety, however, all that really matters is that the argument to "foo" is an object that has a "method1(int)" and a "method2(char*)".
    I would imagine this is done for performance reasons, and because it keeps the compiler "simple" to write. The compiler would have to keep track of whether a class has a matching function accepting all the correct parameters/return type.

    What are the downsides of allowing the compiler to accept foo(b) as valid code?

    > GNU C++ had support for it in the form of the "signature" extension
    Cool, will have to check that out.

    > (templates may seem to give you similar functionality, but they don't really).
    Right; generic programming is an orthogonal issue.

    > The additional requirement that the argument is of class "A" or a subclass is an additional restriction that may or may not model some aspect of the problem you are trying to solve and you may want to express it sometimes.

    Could you expand a little more on why it is so bad to have classes A & B inherit from a common "interface" class (with potential virtual functions)? Does it lead to class-inheritance bloat?

    I think this would be one time where MI (multiple inheritance) would actually be useful.

    P.S.
    Thx for the discussion. This topic makes /. worthwhile.
  • Hmmm... read up on the LoadLibrary call in the Win32 SDK. You can solve the problem of DLL versioning by putting a copy of the DLL you need in the same directory as your application. It will be loaded first, before any of the libraries in Windows\System(32).

    It's not as nice as the *NIX way, but it works.
  • Not that I'm advocating this for Linux, but this is the main reason for all that COM cruft.
    One alternative is to number the dynamic shared objects and have the linker automatically resolve to the right DSO version. For compatibility you just install multiple versions of the same DSO in the lib directory, and each app should resolve to the version it was compiled against on the development system.
  • Here's some quick thoughts I've had with regard to dynamic libraries:

    Shared object code should be kept in version specific folders, i.e.:

    /usr/lib/share/gtk/1.1.2/
    /usr/lib/share/gtk/1.1.3/
    /usr/lib/share/gtk/1.2.0/

    When an app starts up and links to a .so, it should enter itself in that version's used_by.txt file (unless it's already there). If a newer version is available, it should notify and ask the user if they would like to try the newer .so or to stick with the old one. Either way they can check off a "...and don't ask me again for this application".

    If the program crashes while using a new .so (while making a library call?) it should flag itself not to ask again, as it seems incompatible.

    If the program successfully completes it could flag itself as working with the new .so and remove itself from the old .so's used_by.txt file and add itself to the new .so's used_by.txt file.

    When a .so is no longer used by _any_ applications it would be (re)moved.

    That sounds pretty safe.

    Of course this should all be implemented in a manner invisible to developers. And why don't we solve hunger and war while we're at it?

  • You mean the app would just need to know the name of the function it's trying to call, and the API layer would figure out where that function actually is on that particular version/system?

    And significant API changes would create new names of the function (GetFileLock might become, say, GetFileLockEx when new parameters are added or its return value types or meanings change)?

    So, like Windows? ;-)

    -b
  • Hmm. I was being kind of facetious, but now I'm interested.

    Overloading APIs is an interesting idea, and it solves some of the problems. You'd also have to handle an arbitrary return value; if an older version of GetFileLock returns 0=Failed,1=Worked, what are your old programs going to do when the new version returns 2=Deferred (or whatever)?

    It seems to me that in order for this scheme to work, apps would need to tell your virtual GLibC each input parameter and type, and what output parameters they can handle. Alternatively, each function could be strongly versioned and there could be a compatibility map, so when an app wants GetFileLock v2.5.1, the central script knows that 2.6.3 will work, but 2.9.1 won't.

    I don't like the one function = one file idea, simply because 1) You'd end up with thousands and thousands of API files, so 2) it would be next to impossible for software vendors to specify "known working" configurations.

    You're only partly right about how Windows works. While DLLs cannot be updated while in use, increasingly MS is going with a "different DLL for each version of the library" approach. It's all about objects and object ID's these days, so the actual DLL's don't matter as much (where having two versions of the same DLL with the same name in different directories used to wreak havoc, if they're different versions today, they'll register different object ID's and everything will be fine). If you install an app that wants a later version of some library, it can install that later version and register it, since those object ID's aren't in use, even if an older version is. Hopefully that made sense.

    Of course, this is only true of modern, COM/ActiveX based apps that create objects based on object ID, not DLL name/function name.

    Apps specify the object ID that is unique to the library version they were written for (eg my app uses the ADO 2.6 library, but the ADO 2.5 library is also around and can be invoked by older apps). That works, but it doesn't get rid of the "multiple versions of the same code" problem. In fact, it makes that problem worse.

    And while there's nothing to stop the OS from going out and getting ADO 2.6 if it's not installed, functionally that's not happening right now.

    It's an interesting problem. Basically any solution is going to add some overhead and complexity, but it's probably well worth it.

    Maybe the answer is tighter API/code integration? For the sake of argument, look at it in an OO light; maybe the program asks some API what parameters it will need to supply to accomplish something, and then makes the actual API call with those parameters? Likewise on the return codes; some sort of metadata about what the return means?

    Cheers
    -b
  • Hmm. I like the pre and post conditions, but doesn't it get thornier for more complex return values? Like, say, a pointer to a struct rather than a boolean value?

    The API matchmaker would have to be able to parse and understand structs, which I guess is ok.

    And if you look at your blort function, suppose a new version returns 255 for "OK, with a warning". The GlibC can't consider that a function that returns between 0-255, because a later version yet might return 2="Will batch process tonight". The problem fixes itself if you force tight enumeration, but is that really practical?

    Interesting stuff. Are there any good books on the topic?

    -b
  • So you want me to go do my chip design work on some form of Windows, then?

    I'm trying to get the chip design job moved to commodity PC hardware, using Linux. Guess what, we may be able to program our way out of a paper bag, but that's not the job at hand. I need to spend my time on my primary job, not settling glibc incompatibilities.

    Windows is starting to become practical for chip design, but the opportunity for Linux is still wide open - for a while. Last time I tried building the gEDA suite, I had to give up after a while - there just wasn't time to get all the library issues resolved.
  • Everyone misunderstands Emacs. Emacs is NOT bloated. It is "extensible". There is a difference.

    Yes, and I tend to agree that an editor should be extensible in this way. Extension languages are cool! But I don't think that having a small, extensible core makes the system fundamentally "small".

    The problem is that the extension components are inextricably tied to Emacs. They don't do "one thing well" in the traditional sense, because they can't be used by arbitrary other programs, only Emacs. Even within Emacs, their interfaces make them much less reusable than Unix utilities that communicate mostly by command name, stdin, stdout, and exit status. Thus, I think it is only fair to call the Emacs core plus the set of extensions you are using a single whole. And that's "big".

    If you startup a barebones version of Emacs using "emacs -q", you will get an editor that starts up instantaneously, consumes little memory and is lightning fast.

    Isn't that emacs "binary" an undumped image with all the required lisp already byte-compiled?

  • That is the way it is on *any* platform for applications like this. Oracle does not claim to support or have their products ready to be used on "Linux"; they have support for it on RH 6.2. You will find that in the winders world they specify service packs, and that for other flavors of Unix they specify versions, patches, and kernel parameters. With big-time software this is the way it works. And truth be told, that is the way it should be. Simply put, if Oracle tried to support and run on every possible version of every platform out there, it would not be long until they could not really support or run on anything.
  • That would be wrong. Linux is the kernel, nothing less and nothing more. RedHat 6.2 is an OS, and that is the OS that Oracle supports. Debian is an OS; it uses the same kernel and many of the same tools as RedHat, but it is *not* the same OS. There is more to an OS than the kernel. Also, if you are running an Oracle server on a RH x.0 release you are insane. RedHat will support 6.2 for a *very* long time because they understand and are honest about this point.
  • I don't know about everyone else, but I blast Microsoft for bloat in the sense that Word includes everything but the kitchen sink, despite the fact that I doubt very much if anyone uses more than 10% of the features in Word. Big hulking programs due to feature bloat are what I think of with Microsoft. But then, I'm not enough of a programmer to get pissed about code bloat due to static linking.
  • You got it all wrong for MS-Office.

    So you think Office is good because it consumes a little more space than StarOffice???

    I won't even argue with the logic of that. I don't like or use either product.

    Let me put my original point another way: MS Word 5.1 for Mac was small enough to fit on an 800k floppy with room left over (OK, if you wanted spell-check, two floppies). As far as I can tell, Word 9x/200x [may] provide marginally more functionality while consuming 2 orders of magnitude more space. That is what I call bloatware. Lots of feature creep that does not help me type documents any better, but sure as hell makes the whole program bigger and slower. Oh yeah, Word 5.1 for Mac was statically linked. No extensions/libraries/anything.

  • by sheldon (2322) on Wednesday February 14, 2001 @01:39PM (#434119)
    Ok, I don't think you understand at all what DLL Hell is on Windows. Your assumptions are flawed, at any rate.

    The problem with DLL Hell under Windows has nothing to do with a lack of versioning. DLLs do have versioning, and Microsoft has tried to always be careful in leaving old interfaces compatible so as not to break upward compatibility. In cases where they clearly have to break compatibility they do the same thing as Unix... they create new files with new names.

    MSVBVM60.DLL was a replacement for MSVBVM50.DLL which left the 5.0 functionality intact. However, there have been something like half a dozen versions of MSVBVM60.DLL released which fix bugs internally without breaking external interfaces.

    With COM you don't even care about the filename since all interactions with the component are through the ClassID. You can create a new file, with a new name that registers the same ClassIDs as the old component and redirects them internally as appropriate.

    There are differences between COM and classic DLLs, and I'm going to talk about classic DLLs primarily, because that's where the DLL Hell problem has occurred.

    Now one problem with classic DLLs is that only one copy of a DLL is ever loaded into memory, and this has created many problems. This isn't completely true under Windows 2000, which has additional new features I'll mention later.

    Where DLL Hell essentially comes into play is from REALLY STUPID BRAINDEAD INSTALL PROGRAMS! Not to mention Microsoft's lack of attention to this problem and unwillingness to do anything about it, even if it was just education.

    As an example, let's say you have version 4.1 of generic.DLL installed on your machine by application X.

    Application X works fine.

    Now you install application Y. This install program installs a copy of generic.DLL version 4.0.

    Now application Y's install program is broken: it does one of two stupid things. Either it installs this old copy of generic.DLL over the top of the newer copy already in windows\system, or it places it in the application directory.

    In the latter case, if you run app X before app Y everything works fine. But if you run app Y, when the system searches for the DLL it finds one in the app path first and loads that. All subsequent requests for this shared DLL are passed the pointer to the one already in memory.

    In the former case, app X is never going to work if it relies on functionality which existed only in the newer copy of the DLL that it had installed. app Y broke it permanently by stupidly overwriting a newer DLL.

    Similarly with COM DLLs one could overwrite a newer version of a control with an older version which breaks assumed functionality.

    The solution is actually really quite obvious in 95% of all cases. Most of these DLL problems result from shared DLLs as part of the OS or distributed with the MS Development tools. The solution therefore is to only allow Microsoft to distribute and update these via Service Packs. This is the path they have finally gone down in Win2k.

    That is to say these MS shared system DLLs should never be deployed with an application. If your app needs a certain version then you should specify that you need Service Pack 4 or whatever.
    We'll see if this happens with say Office XP.

    You then only have an issue with shared third party controls. Now this can be solved by using Win2k's ability to localize DLLs to the application and thus load multiple different versions.

    Anyway, that's basically a definition of the problem.

    BTW, as to your signature line... Linux/BSD is not my main server/workstation and I'm pretty certain I'm far less ignorant than yourself on topics regarding Windows as a result. :-)
  • by Ektanoor (9949) on Wednesday February 14, 2001 @04:53AM (#434120) Journal
    I really don't see the point. What Hell is happening on glibc 2.2? As far as I can see, it is the first glibc that smoothly installs over older versions without cramming the whole system. And not only that.

    One development/testing system has been working here since July 2000. It has survived more than 30 glibc upgrades, ranging from a late 2.1 version, through a whole series of pre-2.2 releases, to the 2.2.1 it is running now.

    During these upgrades, apps suffered some serious crashes during two-three pre-2.2 versions. Not more. Some applications, based on older 2.1 and even 2.0, have kept working until now. For example, Netscape and Quake2. Besides, I didn't note serious problems with 2.1-based apps.

    Due to the purpose of this machine, I managed to see how most of these apps were rebuilt up to 2.2 glibc. Here, some inconsistencies did appear, but I cannot say they are a "Hell". Most cases are the result of a few differences in variables. This can be a serious hassle for an average user, but it does not hamper his use of a Linux box just by upgrading to 2.2.

    Most of the packages I used came from Mandrake Cooker project.
  • by redhog (15207) on Wednesday February 14, 2001 @02:20AM (#434121) Homepage
    First of all, a free system is not aimed primarily at making binary applications work, but at making free applications, which come with source, work.

    Of course binary compatibility is nice - it means you, or your software vendor, don't have to recompile everything now and then. But it comes at a high price - inextensibility. You cannot add a field to a data structure, since that makes the struct bigger, and breaks compatibility. In source, adding a field is never a problem, and compatibility amounts to preserving old fields that someone might expect, and putting values that they won't dislike into these fields.

    Of course, you can do ugly tricks like a hash table of the extra fields for all objects of one type, indexed by the pointer to the original object. This is, for example, supported in glib. But it's terribly ugly, and begs for problems (like memory management problems).

    I agree, however, that glibc has had some problems - it hasn't always been 100% source-compatible...

    And - try to find 100% binary compatibility between, say, Windows 95 and Windows NT 4.0. Have fun!
  • by avdp (22065) on Wednesday February 14, 2001 @05:56AM (#434122)
    You are 100% correct. People that have the money to run Oracle (and we are talking about LOTS of money here) go to Oracle and find out what it will run on and go with that. Oracle says RH6.2, then RH6.2 it is. You feel you must be using RH7? Great. Put RH7 on another machine and go play there.
  • I disagree strongly with the statement that using glibc-compat isn't a solution for a production system. In fact, I'd rephrase the statement to be "You can, however, use the system that was put in place for just such a purpose, but that isn't a solution for a production system."

    The people who came up with glibc-compat did so because they anticipated the difficulties associated with upgrading the systems as a whole to newer libraries. There's nothing wrong with installing the older libraries and they're not something that should be avoided for "production" systems.

    To be sure, it would be nice if Oracle got with the program and updated their tools to run on a more recent glibc, but until that happens, you have an alternative to sitting around, scratching your head, and saying "Gee, it doesn't work." That makes more sense than the "you should statically link all major applications" crud that others have posted.

  • by dsplat (73054) on Wednesday February 14, 2001 @04:30AM (#434124)
    The customer is not just a dumb lump that needs to get out of your way. They are what makes your software viable. Without them, you're just a lone hacker hiding in your room, writing stuff no one will ever use.


    Even if every free software developer wanted to limit the scope of our market to other free software developers, there is the issue that each of us has a finite amount of time. I use more software than I will ever have the time to actually work on. Even rebuilding everything against each new release of glibc and gcc that I install takes time. Being able to install binary distributions of large amounts of free software saves me time to work on the projects I'm involved with.
  • So if I need to run 2 different apps on a system and they have 2 different incompatible needs, what do I do? Run them on 2 different machines? Expensive, inconvenient, and sometimes having the 2 apps on 2 different systems does not meet the requirements of what needs to be done. So basically I am hosed.

    I hate to say this, and I hope this isn't considered flamebait, but Linux definitely needs more quality control with PRODUCTION releases of libraries, making sure that if it worked in glibc 2.1, it HAD BETTER work in glibc 2.2 too.

    If that is impossible, because the library behaved in a buggy (or nonstandard or unsupported or inefficient) way, and fixing it would break something that depended on that feature, that is different than just having something no longer work.

    However, old apps need to work with new libraries. Here is a solution: add a libc call named expect_version(). Your program calls it with the version it expects, and the library behaves in a way compatible with that version. If that call is never made, the library behaves at a default level of compatibility. Binary-only software would be built to issue this call with the versions of all the libraries it is designed to run with.

  • by danheskett (178529) <[danheskett] [at] [gmail.com]> on Wednesday February 14, 2001 @08:23AM (#434126)
    You got it all wrong for MS-Office.

    Take a clean install of Win2k Pro SP1 on a box, default options, and record the space used. Just about 800 Mb. That's a lot.

    Now go and install Office 2k on that same box. Default options, 280 Mb. Now, given that I always trim out that absurd Clippy thing and a few other silly things (I don't speak Spanish or French, so why do I need dictionaries and grammar checkers for them), my default install of Office is 240 Mb. That comes with a real nice word processor, a real nice spreadsheet, a decent presentation tool, a toy database which scales nicely to real databases, a decent HTML publisher/editor, and a set of universal clipart and graphics.

    That's not so bad, in my opinion. Compare to StarOffice, which chews up around 120-140 Mb of my hard drive space and doesn't work nearly as well as Office 2k; in addition, it provides but a fraction of the features that Office 2k does.

    Take a look at some of the features in Word sometime. You'd be surprised. The latest features of Office 2k, the collaboration tools, are widely used in a lot of situations. You mark a document for collaboration, post it automatically to your workgroup server, people work on it *concurrently*, and the changes from all people are recorded and highlighted for later review by the original author. That's a nifty tool for small/medium-size workgroups. The same tools work for FrontPage, Excel, and PowerPoint.

    You may not like it all that much, but Microsoft is writing better and better code all the time. What's more, they are eliminating points of failure: Windows 2000 and Windows XP manage DLLs in a different way, caching them and ensuring that system DLLs do not get crunched up by a bad program or a broken install. Normal everyday users like that, a lot.

    A final note: static linking is essentially what MS does and encourages. Applications include their own DLLs and keep them separate, thereby sacrificing space for compatibility. It's not a bad tradeoff, for most users.
  • by Matthias Wiesmann (221411) on Wednesday February 14, 2001 @04:29AM (#434127) Homepage Journal

    Static linking is one solution, but it seems a little heavy-handed. Disk space is one problem, though indeed not a major one. Another problem is that the library cannot be shared, which means that two programs using the same library will each have to load it into their own memory space. This means more memory consumption and more loading time.

    Another nice solution could be something like bundles under Mac OS X/Darwin. First the library system knows the version number of each library, and can load the one the application needs - this alone would solve the problem described here. Secondly the library can be installed inside the application's framework, so you have the benefit of static linking without having to build a monolithic program.

    This means that you can solve such problems easily. Need a specific library? Move it into the bundle. Can use the normal library? Move it out of the bundle. Simple. The DLL-hell problem comes, IMHO, from the rather simplistic dynamic library handling system.

    To get an idea about bundles, have a look at the article on Ars Technica [arstechnica.com].

  • by Gendou (234091) on Wednesday February 14, 2001 @04:22AM (#434128) Homepage
    Linux is an open source architecture that's geared towards users building their programs from source. Duh. This works great. However, there are a few specific cases where you have to bite the bullet and use whatever distro big programs like Oracle were built for. Here's why:

    Oracle was originally built for specific operating systems, and in the non-Windows arena, specific versions of UNIX. It's not at all surprising that you'd need to run a specific version of Linux from a particular vendor in order to use it. Sad but true. It really can't be helped at this point, so focus on running your organization, not resisting some obvious limitations of the current architecture. (Oracle doesn't work on Debian or Slackware either; my shop tried, and as much as we hated doing it, we were forced to run it on RedHat.)

    On another issue... Some people say, "companies should statically link libraries to their programs!" Well, this is only taking a bad situation and making it worse. If this is done, binary-only releases of software will be stuck with the flaws in whatever versions of the system libs they're linked against. Then you have to wait for said company to release a new version whenever the bugs in a system library are fixed. Eventually, we'll manage to do what Windows does, and that is have readily backwards-compatible libs that actually work properly.

    For now, conform and produce working results.

  • by Per Abrahamsen (1397) on Wednesday February 14, 2001 @03:16AM (#434129) Homepage
    ...or rather, it is only relevant for C++ libraries, the C ABI has been stable for a long time.

    So has the glibc ABI, actually, except that it is not 100% bug-compatible. I.e., applications that rely on bugs in the library in order to work may break when the library is updated.
  • by jd (1658) <imipak&yahoo,com> on Wednesday February 14, 2001 @03:50AM (#434130) Homepage Journal
    Everything the FSF produces (with one notable exception) follows the philosophy that "small is beautiful" and that N reusable components will always beat 1 system with N features.

    GLibC doesn't do this. Everything's crammed in. And that is bound to make for problems.

    IMHO, what GLibC needs to be is a skeleton library with a well-defined API but no innards. The innards would be in separate, self-contained libraries elsewhere on the system.

    This would mean that upgrading the innards should not impact any application, because the application just sees the exoskeleton, and the API would be well-defined. The addition of new innard components would then -extend- the API, but all existing code would be guaranteed to still work, without relinking against some compat library.

  • by LinuxGeek (6139) <djand,nc&gmail,com> on Wednesday February 14, 2001 @06:03AM (#434131)
    Read the whole message that you responded to; he explains the problem *and* the fix that is possible on Unix-type systems. I can have five apps of various vintages that each require a different set of libraries that they were linked against.

    Like so (real-world example):
    exec /usr/$sysname-glibc20-linux/lib/ld-linux.so.2 \
        --library-path /usr/$sysname-glibc20-linux/lib \
        $netscape $defs $cl_opt "$@"

    With names like:
    libORBit.a
    libORBit.la*
    libORBit.so@
    libORBit.so.0@
    libORBit.so.0.5.1*
    libORBit.so.0.5.6*

    We can keep different versions of files for use, not like the different versions of mfc42.dll that all have the same name. If another version of a library is completely backwards compatible, then a simple symbolic link gives the complete name that the run-time linker is looking for.
  • by barries (15577) on Wednesday February 14, 2001 @03:24AM (#434132) Homepage
  • by Dr. Tom (23206) <tomh@nih.gov> on Wednesday February 14, 2001 @02:12AM (#434133) Homepage
    Tell your vendor to link it statically (using .a libraries instead of .so).

    Also remind them that a "Linux" version is meaningless; they should say "Linux/x86" or "Linux/Alpha" or whatever.

    I hate it when a vendor supplies a "Linux" version that won't work on my hardware, and I can't tell until *after* I've downloaded it.
  • by ajs (35943) <ajs&ajs,com> on Wednesday February 14, 2001 @06:36AM (#434134) Homepage Journal
    Getting tired of getting a copy of Oracle for Solaris 2.3, iPlanet for SunOS 4.1.3 and Veritas for Solaris 7 and finding that none of them support my Solaris 8 system. Dammit, what is Sun doing wrong!?

    You'd think that you would actually have to pick an OS revision based on the least-common denominator of the supported platforms for your application needs!

    Someone needs to go write a Python-based OS and then never change anything. That'll solve it.

    ;-) for those who did not guess....
  • by devphil (51341) on Wednesday February 14, 2001 @07:59AM (#434135) Homepage


    ...but nobody reads the documentation anymore, so they bring problems on themselves.

    1.17. What is symbol versioning good for? Do I need it?


    {AJ} Symbol versioning solves problems that are related to interface changes. One version of an interface might have been introduced in a previous version of the GNU C library but the interface or the semantics of the function has been changed in the meantime. For binary compatibility with the old library, a newer library needs to still have the old interface for old programs. On the other hand, new programs should use the new interface. Symbol versioning is the solution for this problem. The GNU libc version 2.1 uses symbol versioning by default if the installed binutils supports it.

    We don't advise building without symbol versioning, since you lose binary compatibility - forever! The binary compatibility you lose is not only against the previous version of the GNU libc (version 2.0) but also against all future versions.

    Using private interfaces, using static libraries, not using versioning even when shared libraries are in use... I'm not surprised Oracle had problems.

  • by blakestah (91866) <blakestah@gmail.com> on Wednesday February 14, 2001 @05:44AM (#434136) Homepage
    For instance take Oracle Applications, it is nearly impossible to install it on RedHat 7.0 or any glibc 2.2 based distro since the applications were built against 2.1.x. When you install this software it tries to relink itself with the correct libraries and fails miserably.

    If there are substantial glibc 2.1 -> 2.2 problems, it is really poor coding on the part of the vendors. The use of private (but available) glibc functions was made impossible in the changeover.

    There are a few models that will work in this case. First, the older version of glibc can be shipped with Oracle, with LD_LIBRARY_PATH or LD_PRELOAD set to load those libraries first. Then there is no problem.

    Talk to your vendor. Ultimately, if you want to pay to use their software, they have a responsibility to ensure you can use it with some ease.

  • by Temporal (96070) on Wednesday February 14, 2001 @09:44AM (#434137) Journal

    But it comes at a high price - unexpandability. You can not add a field to a datastructure, since that makes the struct bigger, and breaks compatibility.

    Nonsense. The size of a data structure is an implementation issue, and should never be exposed to the user anyway. Access to such data structures should only be granted through opaque pointers and accessor functions. The user should not be allowed to allocate such a structure manually; they should be forced to call a library routine which does the work for them. Any library which exposes the size of any significant structure as part of its interface is poorly designed.

    Unfortunately, C does not do much to encourage data insulation. Object oriented programming, and especially abstract classes, would be a great help in alleviating these sorts of problems. Read "Large Scale C++ Software Design" by John Lakos for some extensive discussion of insulation (the process of making an interface binary compatible without hindering implementation extensions).

    And - try to search for 100% binary compatibility between say Windows 95 and Windows NT 4.0. Have fuN!

    NT and 95 have much better binary compatibility than different Linux distributions. Windows 2000 has almost perfect binary compatibility with NT and 9x. I find that I can write a program once in Windows and have it work on all my friends' computers without trouble. With Linux, on the other hand, I would never even try to distribute a compiled binary. Source code only. Of course, that is not a problem for me, since all of my code is GPL or LGPL.

    The thing is, the Windows API passes everything around as handles. Handles are opaque pointers, meaning that the caller has no idea what sort of structure they point to. I suspect that a thread handle in 98 points to a data structure that bears no relation to a thread handle in NT, but does that cause binary compatibility problems? No, because Microsoft correctly used opaque pointers.

    Now, most of POSIX and ANSI-C do similar things (FILE*, DIR*, etc.). I am not saying that I particularly like the Windows API (I don't). All I wanted to point out is that it is possible to make a library that can be extended without breaking binary compatibility.

    One last point: Binary compatibility is useful for 100% open source systems! What if a critical bug is found in an older version of glibc, forcing you to upgrade? From the sounds of it, you would have to re-compile every program on your system to make it work! I don't want to do that! With a properly written library, a new version could be dropped in without disturbing anything. Hell, you might not even have to reboot.

    ------

  • by The Pim (140414) on Wednesday February 14, 2001 @04:21AM (#434138)
    Everything the FSF produces (with one notable exception) follows the philosophy that "small is beautiful"

    Oh my God would any UNIX old-timer laugh at that! First, you seem to be claiming that the only exception to this "rule" is GNU libc. Ever heard of EMACS? It's the absolute antithesis of "small is beautiful"! Second, even though GNU has reproduced most of the tools that gave UNIX its minimalist slant, in almost all cases, they extended them to be much larger and more featureful than the originals. Go install FreeBSD sometime, take a sampling of programs, and compare binary sizes and manpages. tar(1) will provide an instructive example.

    I'm not saying this is bad--I mostly like the GNU environment. But compared to real UNIX, it's heavy.

  • by Alatar (227876) on Wednesday February 14, 2001 @02:15AM (#434139) Homepage
    When using real software like Oracle under Linux, you find out what the requirements are for the application you're going to run, and install a compatible setup. You don't just run out to the ftp site, burn a copy of the latest distro of Mandrake, and expect every application you install onto the new system to work flawlessly. Maybe you can run something like Apache on every machine everywhere, but Big Important things like Oracle generally have pretty specific system requirements, even under other unices.
  • by q000921 (235076) on Wednesday February 14, 2001 @05:12AM (#434140)
    Well, there are several related issues, and I probably didn't explain the differences well enough in such a short space. Dynamic languages avoid this problem, but I didn't mean to imply that statically typed languages can't also avoid it.

    Java, for example, couples libraries and user code much less tightly, yet uses statically type checked interfaces. Java's type checking is actually unnecessarily strict: classes are considered incompatible on dynamic linking even though only some aspects of their implementation changed. ML implementations could easily do the same thing.

    Also, the fact that languages like C++ and Java tie inheritance hierarchies to static type checking is an unnecessary and idiosyncratic restriction. You can have perfectly statically type-safe systems that do not have these kinds of inheritance constraints: as long as the compiler and/or linker determines that the aspects of the interfaces you are relying on are type-compatible, it can make the two ends fit together safely, no matter what other changes or additions have happened to the classes. The "signature" extension for GNU C++ did this at compile time, and something similar could be done by the dynamic linker when libraries are loaded.

    The efficiency issue is not significant. Even for a completely dynamic object system like Objective-C, a good runtime will have hardly more overhead for a dynamic method call than a regular function call. Any of the systems based on static type checking I mentioned above would do even better. And Java, of course, can actually do better than C/C++ when it comes to libraries because the Java runtime can (and does) inline library code as native code at load/execution time.

    Of course, sometimes, things just have to change incompatibly. But as far as I can tell, almost none of the changes in glibc (or most other C/C++ libraries I use regularly) should affect any user code. Almost any kind of library interface would be less problematic than what exists right now.

    So, I agree: statically typed languages will not go away. But "DLL hell" is avoidable whether you use statically or dynamically typed languages. In fact, as I mentioned, you could even make it go away in C/C++ by introducing a special library calling convention that has a bit more information available at load time. However, why beat a dead horse?

  • by Oestergaard (3005) on Wednesday February 14, 2001 @02:43AM (#434141) Homepage
    I work for a company building a network monitoring system available for FreeBSD, NT (and 2K), and both RedHat and Debian Linux. We're adding platforms as people request them.

    Really, RedHat 7.0 includes the libraries that shipped with 6.2, so while we only support RedHat 6.2 we still work out-of-the-box on RedHat 7.0. Why not use the compatibility libraries ? That's what they're there for - they're not performing worse or anything, they are just older versions of the library.

    On UNIX-like systems you actually have VERSIONING on your system libraries. So you can have a perfectly running system with ten different versions of the C library, and each application will use the version it requires.

    You're welcome to check out our beta-versions available from sysorb.com [sysorb.com], if you don't believe me :)
  • by Oestergaard (3005) on Wednesday February 14, 2001 @02:58AM (#434142) Homepage
    It is not usually an option for a vendor to link statically because of license restrictions.

    However, a vendor is allowed to ship a specific version of glibc and libstdc++ with the software, as long as they provide some reasonable access to the source code as well.

    As posted somewhere else, that is what we ended up doing for the RedHat 6.2 port of our network monitoring software [sysorb.com]. We ship a version of libstdc++ that matches our binary, it is installed without interfering with the other versions of libstdc++ that may be installed on the system, and everyone's happy.

    Really, I am surprised how well this stuff works, and I cannot understand why so many people keep complaining about how horrible the system is. I think it's brilliant. And programs can still share the shared libraries; it's not like the Win2K way of doing things, where each app ships its own set of so-called "shared libraries".

  • by BitMan (15055) on Wednesday February 14, 2001 @09:31AM (#434143) Homepage

    First off, If you get anything out of this post it is this: DO NOT RUN A REDHAT X.0 RELEASE IF YOU DON'T UNDERSTAND LIBRARY VERSIONING

    NO SUCH THING AS "GLIBC HELL"

    There is _no_such_thing_ as "glibc Hell". UNIX (including Linux) has versioning on libraries, right down to the filename. _Unlike_ Windows, you can have _multiple_library_versions_ installed. Even Microsoft still has NOT addressed this (and I run into it daily) simply by versioning the filenames of libraries. "glibc hell" is a _farce_ and the result of people not understanding the OS in front of them.

    UNIX v. WINDOWS ON SYSTEM LIBRARIES

    So, in a nutshell, you can have your library issues two ways:

    • UNIX: Versioning on system libraries, which gives you 2 options:
      1. Recompile for new libraries (if you have source)
      2. Install older libraries (especially if you don't have source)
    • WINDOWS: Only one system library can be installed, which gives you only 1 option:
      1. Recompile for new libraries (if you have source)
      Otherwise, no option if you don't have source -- total SOL!
      Especially on Win2K which makes some libraries untouchable (and quite incompatible with a lot of existing software).
      Microsoft does this for "stability", but it is a library-ignorant way of NOT addressing the _real_ issue, lib versioning.

    LIBRARY VERSIONING AND SYMBOLIC LINKS

    The main reason Windows cannot have UNIX-like library versioning on filenames is that it lacks symbolic links. With symbolic links, you can have multiple subrevisions of a library, with one subrevision being the "default" revision, and that (or another) being the default "main" version. E.g.:

    libmy.so -> libmy.so.1
    libmy.so.1 -> libmy.so.1.1
    libmy.so.1.0 -> libmy.so.1.0.7
    libmy.so.1.0.7
    libmy.so.1.1 -> libmy.so.1.1.3
    libmy.so.1.1.2
    libmy.so.1.1.3
    libmy.so.3 -> libmy.so.3.0
    libmy.so.3.0 -> libmy.so.3.0.1
    libmy.so.3.0.1

    In the preceding example, there are actually only 4 library versions: 1.0.7, 1.1.2, 1.1.3 and 3.0.1. We could easily introduce more versions if programs required them. Most libraries are "parent revisioned" (I don't know what the official term is, but that's what I'll call it), so the latest "x.y.z" "version.revision.subrevision" is symlinked as "x.y" "version.revision" as well as "x" "version". As far as compatibility between versions, anything goes (and is a per-library consideration), but the general rule is as follows:

    Most OSS projects, including GLibC, have good versioning schemes that change subrevisions (the "Z" in x.y.z) when updates, bugfixes, or non-structural changes are made -- meaning 1.1.2 and 1.1.3 are most likely header/function compatible. So, depending on the library, most programs are fine linking against x.y instead of x.y.z -- and do so to keep from requiring the user to have numerous different libraries installed. A simple x.y symlink to the latest x.y.z (latest being the max(Z)) is usually all it takes. Again, a "parent revision" symlink does the job.

    Now, different revisions (the "Y" in x.y.z) usually involve some header/function changes that _may_ be INcompatible. For that reason, vendors usually do not link against just the major version (the "X" in x.y.z). E.g., some programs work fine on any GLibC 2.y.z system (BTW, GLibC 2 is aka LibC 6), but most are tied to GLibC 2.0.z (RedHat 5.x), GLibC 2.1.z (RedHat 6.x) or GLibC 2.2.z (RedHat 7.x). Major version changes (again, the "X" in x.y.z) are reserved for radical, completely incompatible changes -- like LibC 4 (RedHat 3.x), LibC 5 (RedHat 4.x) and GLibC 2 (RedHat 5.x+ -- which caused a bigger stir than 7.0 did awhile back ;-PPP).

    DON'T RUN A REDHAT X.0 RELEASE UNLESS YOU KNOW WHAT YOU ARE DOING!

    98% of the "bitching and moaning" about RedHat 7.0 comes from user naivety about library versioning. Yes, I _do_agree_ that RedHat released 7.0 too early with unfinished components, but after the GLibC and GCC patches through December, RedHat 7.0 is a _solid_ release. Never, never adopt a RedHat X.0 release unless you are a seasoned Linux user! Please get the word out on that (although the RedHat 7.0 README on CD 1 *DOES* stress that point too!!!)

    [ Side note: The kernel is another matter though -- but understand RedHat cannot "time" the release of GLibC, GCC and the kernel since they are all independent development teams. ]

    RedHat gives you _full_warning_ of the all-important GLibC change in a new release. It's always been RedHat's model: introduce a new, and possibly INcompatible, GLibC in an X.0 release. All revisions in a major release have the same GLibC and GCC, and are quite interchangeable. I've said it before and I'll say it again, only RedHat seems to do this (although I haven't checked out Debian yet). So I know that going from 6.2 to 7.0 means issues, just like 4.2 to 5.0 did for me almost half a decade ago. NOTHING NEW!

    So, again, if you do NOT know what you are doing, stick with RedHat X.2 releases (or at least X.1)!!! At X.0 releases, most of the Linux world has NOT yet adopted the new GLibC versions -- hence the wealth of binary library incompatibilities. So if you are not familiar with how to deal with them, do NOT try to deal with a RedHat X.0 release!

    REDHAT MAINTAINS LIBRARY COMPATIBILITY

    Today's RedHat 7.0 (GLibC 2.2) release comes with full RedHat 6.x (GLibC 2.1) compat libs, even compat devel libs and a compat compiler/linker. And you can also install compat libs from the RedHat 6.2 release for 4.x (LibC 5) and 5.x (GLibC 2.0) compatibility.

    [ Side note: I *DO* have a "complaint" with RedHat for not including LibC5 and GLibC2.0 compatibility libraries with RedHat 7.0. And I've let Alan Cox know about them. They should _at_least_ be available on the Powertools CD. I know LibC4 shouldn't be included for security reasons, but LibC5 and, especially, GLibC2.0 should be! ]

    ANY LIBRARY ISSUES ARE ALMOST ALWAYS THE PACKAGER'S FAULT!

    Now that we have the user out of the way: as long as a vendor/packager dynamically links against the specific library version, the binary will use that version. Too many vendors are used to just linking against whatever they have, especially if it is an older library on their development system. So when a user has a newer version of the library, possibly with function name, parameter and other structural changes, core dumps will occur.

    If a developer is really worried, there is always the option of statically linking -- i.e. putting the library in the binary itself (so no external library references/dependencies). Of course, there could be licensing issues with GPL or other OSS code going to/from commercial code and vice versa. If people think this is limited to Linux, they are _gravely_mistaken_, as most commercial IDEs and development tools introduce their own issues. Under Windows, where only one system library can be installed (and that library may be "fixed" in Win2K), you're in for a world of hurt trying to sort them out.

    END-USER LIBRARY VERSION ADMINISTRATION

    As previously discussed, symlinks are often used to "parent revision" a full "x.y.z" "version.revision.subrevision". Most of the time, RPMs and tarball/make installs do this for you. But sometimes you'll have to administer and create them yourself. Again, doing this is usually easy (it's just a symlink -- "ln -s target linkname") and is the first thing to try when a program cannot find a library (e.g., libmy-2.0) where a library exists with a more complete version (e.g., libmy-2.0.1).

    -- Bryan "TheBS" Smith

  • by adubey (82183) on Wednesday February 14, 2001 @03:23AM (#434144)
    You have some interesting viewpoints, but I think you're avoiding the question rather than dealing with it.

    In the programming language research community, the feeling is that dynamic languages are very good for things like scripting and prototyping, but are not as good an idea for large software systems.

    The problem is twofold. First, as you mention, dynamic languages always take a performance hit. But the second reason -- which you miss -- might be more important: fewer errors can be detected at compile time... they only turn up at runtime, or worse, end up as hard-to-detect bugs. Moreover, the runtime may fail someplace other than where the error occurred. For example, let's say you have a bunch of "polygon" objects in a linked list, and you mistakenly put a "circle" object in that list as well. Much later, you're traversing the list and expect to find a polygon, but instead you find a circle. Type error! But the real error was where you put the circle into the linked list. In a dynamically typed language, you'd have to go look for where the circle was inserted -- and the bigger the software system, the harder that becomes. In a statically typed language, however, the compiler tells you right away: "hey buddy, you're putting a circle into a polygon list. Fix that, or you don't get object code!"

    I don't think that statically typed languages are going to go away. As is often the case in software development, the real problem is psychological rather than technological. If backwards compatibility across ".x" releases were a priority for the glibc team, perhaps we wouldn't have this problem. As it is, they are probably more driven to adding new features or fixing really bad old problems in ways that break compatibility... if there are people willing to work on the project who have different goals, perhaps it's time to fork libc again?
  • by The Pim (140414) on Wednesday February 14, 2001 @04:51AM (#434145)
    The glibc (and gcc) developers are so careful about binary backwards compatibility, it's not even funny. If you feel like getting thoroughly flamed by folks much smarter than the slashdot crowd, go suggest an incompatible change on the glibc mailing list (and if you're not such a masochist, read the list archives).

    However, they offer clear conditions. First, they don't guarantee upwards compatibility, that is, code compiled against glibc 2.2 working with 2.1. Second, C++ is currently off limits (which will change with gcc 3.0). Third, it applies only to shared versions of the library. Fourth, private internal interfaces are off limits.

    The Oracle problem is simple: they're using static libraries (i.e., ar archives of object files). This doesn't work because symbol versioning (the magic that enables compatibility in shared libraries) isn't implemented for object files. HJ Lu has a page [valinux.com] on this issue and possible resolutions.

    90% of other compatibility problems result from using private interfaces. This happened to Star Office a while back.

  • by ColdGrits (204506) on Wednesday February 14, 2001 @02:58AM (#434146)
    "And - try to search for 100% binary compatibility between say Windows 95 and Windows NT 4.0"

    Bzzzzzzt! Wrong answer, thanks for playing.

    Here's a clue for you: Win95 and NT4 are two TOTALLY SEPARATE PRODUCTS from separate code bases, whereas glibc is glibc -- the same (ha ha!) library, just different versions.

    Of course, what you OUGHT to have written was "try to search for 100% binary compatibility between say Windows 95 and Windows 98", or "try to search for 100% binary compatibility between say Windows 2000 and Windows NT 4.0", which is extremely easy to do. But then why let trivial things like facts get in the way of a good troll, eh? :(

    --

  • by q000921 (235076) on Wednesday February 14, 2001 @02:33AM (#434147)
    When you pass arguments or structures across the C ABI, each side has a lot of detailed, intricate knowledge of the layouts and sizes of data structures and other details. That means that even fairly minor changes, like adding another field to a structure, may mean that everything needs to be recompiled. Having that kind of detailed knowledge has efficiency advantages, but you pay a serious price in terms of software configuration problems. In the days of the PDP-11, it may have been worth making that tradeoff for most function calls, in the days of 1GHz P4's, it probably isn't except in rare cases.

    Are there alternatives? Plenty, actually:

    • COM was an attempt to address some of these issues in a C++ framework. Unfortunately, the road to hell is paved with good intentions. Trying to retrofit this infrastructure on top of C++ leaves you with a bad kludge on top of an already cumbersome object system.
    • Dynamic languages like Python, Common Lisp, Smalltalk, etc. generally don't suffer from this problem: as long as the objects you are passing around do roughly the right thing, it usually doesn't matter what you change behind the scenes: the code will still "link" and run.
    • This problem could have been addressed easily without straying much from traditional C if people had adopted Objective-C. Objective-C is a minimalistic extension of C that adds just these kinds of "flexible" and "loosely coupled" interfaces to the C language.
    • Java is halfway there: there are a lot more kinds of changes and upgrades you can do to libraries than in C, but it isn't quite as flexible as more dynamic languages.

    You could probably invent a new calling convention for C together with some changes to the dynamic linker that would address this problem for C libraries. While you are at it, you should probably also define a new ABI for C++, something that avoids "vtbl hell" using an approach to method invocation similar to Objective-C. These new calling conventions would be optional, so that you can pick one or the other, depending on whether you are calling within the same module or between different modules. Perhaps that's worth it given how much C/C++ code is out there, but it sure would be a lot of work to try and retrofit those languages. Why not use one of the dozens of languages that fix not just this problem but others as well?

    A related approach is to still write a lot of stuff in C/C++ but wrap it up in a dynamic language and handle most of the library interactions through that. That was the dream of Tcl/Tk (too bad that the language itself had some limitations).

    Altogether, I think the time to fix this in C/C++ has passed, and COM-like approaches didn't work out. My recommendation would be: write your code in a language that suffers less from these problems, Python and Java are my preference, and add C/C++ code to those when needed for efficiency or system calls.
