Math Cloud Virtualization Hardware Science

Ask Slashdot: How Reproducible Is Arithmetic In the Cloud? 226

goodminton writes "I'm researching the long-term consistency and reproducibility of math results in the cloud and have questions about floating point calculations. For example, say I create a virtual OS instance on a cloud provider (doesn't matter which one) and install Mathematica to run a precise calculation. Mathematica generates the result based on the combination of software version, operating system, hypervisor, firmware and hardware that are running at that time. In the cloud, hardware, firmware and hypervisors are invisible to the users but could still impact the implementation/operation of floating point math. Say I archive the virtual instance and in 5 or 10 years I fire it up on another cloud provider and run the same calculation. What's the likelihood that the results would be the same? What can be done to adjust for this? Currently, I know people who 'archive' hardware just for the purpose of ensuring reproducibility and I'm wondering how this translates to the world of cloud and virtualization across multiple hardware types."
This discussion has been archived. No new comments can be posted.

  • by mkremer ( 66885 ) * on Thursday November 21, 2013 @08:01PM (#45486365)

    Use Fixed-point arithmetic.
    In Mathematica make sure to specify your precision.
    Look at 'Arbitrary-Precision Numbers' and 'Machine-Precision Numbers' for more information on how Mathematica does this.
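
    To see the same idea outside Mathematica, here is a minimal Python sketch using the standard decimal module (the choice of 50 digits is arbitrary):

        from decimal import Decimal, getcontext

        getcontext().prec = 50            # pin the working precision to 50 significant digits
        x = Decimal(1) / Decimal(7)       # arithmetic now follows the decimal spec, not the FPU
        print(x)                          # same digits on any machine that sets the same precision

    Once the precision is pinned like this, the result is defined by the arithmetic specification rather than by whatever hardware happens to sit underneath.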

  • by shutdown -p now ( 807394 ) on Thursday November 21, 2013 @08:17PM (#45486479) Journal

    What the title says - e.g. bignum for Python etc. It will be significantly slower, but the result is going to be stable at least for a given library version, and that is far easier to archive.
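
    A minimal sketch of the exact-arithmetic route in Python (Fraction never rounds, so the trade-off really is just speed and memory):

        from fractions import Fraction

        # Exact rational arithmetic: no rounding anywhere, so the value is
        # identical on any machine and any Python version.
        total = sum(Fraction(1, n) for n in range(1, 101))
        print(total)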

  • by Red Jesus ( 962106 ) on Thursday November 21, 2013 @08:18PM (#45486489)

    Mathematica in particular uses adaptive precision; if you ask it to compute some quantity to fifty decimal places, it will do so.

    In general, if you want bit-for-bit reproducible calculations to arbitrary precision, the MPFR [mpfr.org] library may be right for you. It computes correctly-rounded special functions to arbitrary accuracy. If you write a program that calls MPFR routines, then even if your own approximations are not correctly-rounded, they will at least be reproducible.
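
    For instance, via the gmpy2 bindings, which expose MPFR from Python (a sketch; it assumes gmpy2 is installed, and the 200-bit precision is arbitrary):

        import gmpy2

        gmpy2.get_context().precision = 200    # working precision in bits
        x = gmpy2.sin(gmpy2.mpfr(1))           # MPFR's correctly-rounded sin
        print(x)                               # reproducible for a given precision and MPFR version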

    If you want to do your calculations to machine precision, you can probably rely on C to behave reproducibly if you do two things: use a compiler flag like -mpc64 on GCC to force the elementary floating point operations (addition, subtraction, multiplication, division, and square root) to behave predictably, and use a correctly-rounded floating point library like crlibm [ens-lyon.fr] (Sun also released a version of this at one point) to make the transcendental functions behave predictably.
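
    Whichever toolchain you settle on, it also helps to archive exact bit patterns alongside the results so a rerun years later can be compared bit-for-bit. A small Python sketch (the helper name is just illustrative):

        import struct

        def double_bits(x):
            """Return the IEEE 754 binary64 encoding of x as a hex string."""
            return struct.pack('<d', x).hex()

        print(double_bits(0.1 + 0.2))          # store this string; diff it against the rerun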

  • Re:WTF? (Score:3, Informative)

    by larry bagina ( 561269 ) on Thursday November 21, 2013 @08:25PM (#45486553) Journal

    Let's say you're using C on x86. float (32-bit) and double (64-bit) are well defined. However, the legacy x87 FPU internally uses long double (80-bit).

    So if you do some math on a float or a double, the results can vary depending on whether it was done at 80 bits or whether the intermediate values were spilled to memory and truncated back to 64/32 bits.

  • by Giant Electronic Bra ( 1229876 ) on Thursday November 21, 2013 @08:44PM (#45486691)

    Yes, you can do this, but it's not feasible for all calculations. Things like trig functions are implemented on FP numbers, and once you start using FP it's better to just keep using it; converting back and forth is just bad and defeats the whole purpose anyway. So in reality you end up with applications that DO use FP (believe me, as an old FORTH programmer I can attest to the benefits of scaled integer arithmetic!). We're stuck with FP, and once we accept that, the question becomes one of small differences in the results of machine-level instructions and minor differences in libraries on different platforms. In practice you will probably find that arbitrary VMs won't produce exactly identical results when you run on different platforms (AWS, KVM, VMware, some new thing).

    Is it a huge problem though? The results produced should be similar; the parameters being varied were never controlled for anyway. It comes down to how often the rounding errors of two FPUs happen to be identical. Neither the new nor the old results should be considered 'better', and they should generally be about the same if the result is robust. A climate sim, for example, run on two different systems for an ensemble of runs with similar inputs should produce statistically indistinguishable results. If they don't, then you should find out what the differences are by comparison. In reality I doubt very many experiments will be called into doubt over this.
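
    A toy version of the ensemble argument in Python (the logistic map stands in for a chaotic simulation; all the numbers here are illustrative):

        import random, statistics

        def run(x, steps=1000):
            for _ in range(steps):
                x = 4 * x * (1 - x)            # stand-in for one chaotic simulation run
            return x

        random.seed(1)
        starts = [random.uniform(0.01, 0.99) for _ in range(1000)]
        a = [run(s) for s in starts]           # "old platform"
        b = [run(s + 1e-12) for s in starts]   # "new platform": inputs off by a hair

        # Individual runs end up in completely different places, but the
        # ensemble statistics are statistically indistinguishable.
        print(statistics.mean(a), statistics.mean(b))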

  • by s.petry ( 762400 ) on Thursday November 21, 2013 @08:52PM (#45486745)

    My first thought on seeing "tranlate" and "I'm research" in the original submission was that it's only a language problem, but then I read invalid and incorrect statements about how precision is defined in Mathematica. So now I'm not quite sure it's just language.

    Archiving a whole virtual machine as opposed to the code being compiled and run is baffling to me.

    Now if you are trying to archive the machine to run your old version of Mathematica and see if you get the same result, you may want to check your license agreement with Wolfram first. Second, you should be able to export the code and run the same code on new versions.

    I'm really, really confused about why you would want this to begin with, though. Precision has increased quite a bit with the advent of 64-bit hardware. I'd be more interested in taking some theoretical code, changing "double" to "uberlong", and seeing whether I get the same results as what I solved today on today's hardware.

    Unless this is some type of Government work which requires you to maintain the whole system, I simply fail to see any benefit.

    Having "Cloud" does not change how precision works in Math languages.

  • by Joce640k ( 829181 ) on Thursday November 21, 2013 @10:05PM (#45487205) Homepage

    Submitter is entirely ignorant of floating point issues in general. Other than the buzzword "cloud" this is no different from any other clueless question about numerical issues in computing. "Help me, I don't know anything about the problem, but I just realized it exists!"

    Wrong.

    In IEEE floating point math, "(a+b)+c" might not be the same as "a+(b+c)".

    The exact results of a calculation can depend on how a compiler optimized the code. Change the compiler and all bets are off. Different versions of the same software can produce different results.

    If you want the exact same results across all compilers you need to write your own math routines which guarantee the order of evaluation of expressions.

    OTOH, operating system, hardware, firmware and hypervisors shouldn't make any difference if they're running the same code. IEEE math *is* deterministic.
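
    A quick Python illustration of that order-of-evaluation point (the same holds for IEEE doubles in any language):

        a, b, c = 0.1, 0.2, 0.3

        print((a + b) + c)   # 0.6000000000000001
        print(a + (b + c))   # 0.6 -- deterministic, but not associative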

  • by Giant Electronic Bra ( 1229876 ) on Thursday November 21, 2013 @10:40PM (#45487417)

    Trust me, it's a subject I've studied. The problem here is that your system is unstable: tiny differences in inputs generate huge differences in output. You cannot simply take one set of inputs that produces what you think is the 'right answer' from that system and ignore all the rest! You have to explore the ensemble behavior of many different sets of inputs, and the overall set of responses of the system is your output, not any one specific run with specific inputs that would produce a totally different result if one was off by a tiny bit.

    Of course Lorenz realized this. Simple experiments with an LDE will show you this kind of result. You simply cannot treat these systems the way you would ones which exhibit function-like behavior (at least within some bounds). Lorenz of course also realized THAT, but sadly not everyone has got the memo yet! lol.
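
    Easy to demonstrate with a crude forward-Euler integration of the Lorenz system in Python (step size and run length are arbitrary; this is a sketch, not a careful integrator):

        def step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0/3.0):
            x, y, z = state
            return (x + dt * sigma * (y - x),
                    y + dt * (x * (rho - z) - y),
                    z + dt * (x * y - beta * z))

        a = (1.0, 1.0, 1.0)
        b = (1.0 + 1e-12, 1.0, 1.0)       # perturb one coordinate in the 12th decimal place
        for _ in range(5000):
            a, b = step(a), step(b)
        print(a[0], b[0])                 # after ~50 time units the trajectories are unrelated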

  • False assumption (Score:5, Informative)

    by bertok ( 226922 ) on Thursday November 21, 2013 @11:26PM (#45487699)

    This assumption by the OP:

    Mathematica generates the result based on the combination of software version, operating system, hypervisor, firmware and hardware that are running at that time.

    ... is entirely wrong. One of the defining features of Mathematica is symbolic expression rewriting and arbitrary-precision computation to avoid all of those specific issues. For example, the expression:

    N[Sin[1], 50]

    Will always evaluate to exactly:

    0.84147098480789650665250232163029899962256306079837

    And, as expected, evaluating to 51 digits yields:

    0.841470984807896506652502321630298999622563060798371

    Notice how the last digit in the first case remains unchanged, as expected.

    This is explained at length in the documentation, and also in numerous Wolfram blog articles that go on about the details of the algorithms used to achieve this on a range of processors and operating systems. The (rare) exceptions are marked as such in the help and usually have (slower) arbitrary-precision or symbolic variants. For research purposes, Mathematica comes with an entire bag of tools that can be used to implement numerical algorithms to any precision reliably.

    Conclusion: The author of the post didn't even bother to flip through the manual, despite having strict requirements spanning decades. He does however have the spare time to post on Slashdot and waste everybody else's time.

  • by RightwingNutjob ( 1302813 ) on Thursday November 21, 2013 @11:41PM (#45487761)
    If you want exact results from a fixed number of significant bits, you want magic.

    Whatever calculation you're making, be aware of the dynamic range of the intermediate results. Structure your calculations so that all intermediate results stay well within the dynamic range of the datatype. If you want to compute the standard deviation of 2048x2048 32-bit integers, use a 64-bit or 128-bit integer to compute the intermediate sum(x^2). If you try to accumulate in an IEEE double, you'll blow past the 53 bits of exact precision it gives you: even the plain sum can reach 2^11 * 2^11 * 2^32 = 2^54, and sum(x^2) can reach 2^86.

    If you can, reformulate your calculation steps so as to minimize the sensitivity to random errors on the order of a machine epsilon.

    An electronic computer manual from UNIVAC/Burroughs/IBM written for pure mathematicians in ~1953 will tell you the same thing.
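
    The same point in executable form (a Python sketch; Python's unbounded int plays the role of the wide accumulator):

        import random

        random.seed(0)
        data = [random.getrandbits(32) for _ in range(2048 * 2048)]

        exact = sum(x * x for x in data)      # arbitrary-precision integer accumulator
        naive = 0.0
        for x in data:
            naive += float(x) * float(x)      # 53-bit double accumulator drops low-order bits

        print(exact - int(naive))             # how far the double accumulator drifted
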
  • Re:WTF? (Score:4, Informative)

    by gweihir ( 88907 ) on Friday November 22, 2013 @01:18AM (#45488191)

    They do not. IEEE754 has no "grey area". The results must match bit-exact or you are not IEEE754.

    Of course, there can be implementation bugs. For example, QEMU does coprocessor emulation only with 64-bit floats instead of the required 80 bits. Nobody seems to really care, however. The other thing is, of course, that if reproducibility is more important than correctness, I suspect the math is done wrong.

  • by tlhIngan ( 30335 ) <[ten.frow] [ta] [todhsals]> on Friday November 22, 2013 @02:09AM (#45488367)

    Don't use floating point if you can avoid it.

    If you can't, and the results are EXTREMELY important (remember, floating point is an APPROXIMATION of numbers), then you have to read What Every Computer Scientist Should Know About Floating-Point Arithmetic [oracle.com]. (Yes, it's an Oracle link, but if you google it, most of the links are PDFs while the Oracle one is HTML.)

    If you're worried about your cloud provider screwing with your results, then you're definitely doing it wrong (read that article).

    And yes, lots of people, even scientists, do it wrong, because the idealized notion of what a floating point type is and how it actually works in hardware are two very different things. Floating point numbers are tricky - they're VERY easy to use, but they're also VERY easy to use wrongly, and only if you know how the actual hardware does the calculations can you structure your programs and algorithms to do it right.

    And no actual hardware FPU or VPU (vector unit - some do floating point) implements the full IEEE spec. Many come close, but none implement it exactly - there's always an omission or two. Especially since a lot of FPUs provide extended precision that goes beyond IEEE spec.
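
    A classic instance of the "easy to use wrongly" point above (a Python sketch; the two formulas are algebraically identical):

        import math

        x = 1e-8
        naive  = 1.0 - math.cos(x)              # catastrophic cancellation: prints 0.0
        stable = 2.0 * math.sin(x / 2.0) ** 2   # same quantity, rewritten to avoid the cancellation
        print(naive, stable)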

  • by Chalnoth ( 1334923 ) on Friday November 22, 2013 @02:19AM (#45488397)

    Yup. And if you want to use any kind of parallelism to compute the final result, you're going to have quite a hard time ensuring that the order of operations is always the same.

    That said, there are libraries around that make use of IEEE's reproducibility guarantees to ensure reproducible results. That will likely correct any reproducibility issues that would otherwise be introduced by the compiler, but you still have the order of operations issue (which is a fundamental problem).

    Personally, I think a better solution is to simply assume that you're never going to get reproducible floating-point results, and design the system to handle small, inconsistent rounding errors. I think that's a much easier problem to deal with than making floating-point reproducible in any modestly-complex system.
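
    The order-of-operations problem mentioned above, in miniature (a Python sketch; a parallel reduction regroups the sum in exactly this way):

        vals = [0.1] * 10 + [1e16, -1e16]

        serial  = sum(vals)                        # one left-to-right pass      -> 0.0
        chunked = sum(vals[:6]) + sum(vals[6:])    # two partial sums, combined  -> 0.6
        print(serial, chunked)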

  • by goodminton ( 825605 ) on Friday November 22, 2013 @03:04AM (#45488529)
    Awesome link! I'm the OP and I really appreciate your response. The reason I'm looking into this is that I work with many scientists who use commercial software packages where they don't control the code or compiler and their results are archived and can be reanalyzed years later. I was recently helping someone revive an old server to perform just such a reanalysis and we had so much trouble getting the machine going again I started planning to clone/virtualize it. That got me thinking about where to put the virtual machine (dedicated hardware, cloud, etc) and it also got me curious about hypervisors. I found some papers indicating that commercial hypervisors can have variability in their floating point math performance and all of that culminated in my post. Thanks again.
