Floating Point Quality: Less Floaty, More Pointed

Years ago I sat next to the Numerics Test Team at Apple Computer. I teased them one day about how they had it easy: no user interface to worry about; a stateless world; perfectly predictable outcomes. The test lead just heaved a sigh and launched into a rant about how numerics testing is actually rather complicated and brimming with unexpected ambiguities. Apparently, there are many ways to interpret the IEEE floating point standard and learned people are not in agreement about how to do it. Implementing floating point arithmetic on a digital platform is a matter of tradeoffs between accuracy and performance. And don’t get them started about HP… apparently HP calculators had certain calculation bugs that the scientific community had grown used to. So the Apple guys had to duplicate the bugs in order to be considered “correct.”

One reason floating point is a problem for digital systems is that digital arithmetic is discrete and finite, whereas the real numbers are not. As my colleague Alan Jorgensen says, “This problem arises because computers do not represent some real numbers accurately. Just as we need a special notation to record one divided by three as a decimal fraction: 0.33333…., computers do not accurately represent one divided by ten. This has caused serious financial problems and, in at least one documented instance, death.”
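To make Alan's point about one divided by ten concrete, here is a small sketch using only the Python standard library. It assumes the usual IEEE 754 double that Python's float uses on mainstream platforms; the variable names are mine, for illustration only.

```python
from decimal import Decimal
from fractions import Fraction

# Decimal(0.1) shows the exact value of the binary double nearest to 0.1.
stored = Decimal(0.1)
print(stored)

# Exact difference between the stored value and the true one tenth.
representation_error = Fraction(0.1) - Fraction(1, 10)
print(float(representation_error))

# The error accumulates: adding 0.1 ten times does not land exactly on 1.0.
total = 0.0
for _ in range(10):
    total += 0.1
print(total == 1.0)
```

The error in a single value is tiny, but it is never zero, and repeated arithmetic lets it grow.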

Anyway, Alan just patented a process that addresses this problem “by computing two limits (bounds) containing the represented real number that are carried through successive calculations.  When the result is no longer sufficiently accurate the result is so marked, as are further calculations using that value.  It is fail-safe and performs in real time.  It can operate in conjunction with existing hardware and software.  Conversion between existing standardized floating point and this new bounded floating point format are simple operations.”
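I can't show the patented format itself here, but the general idea of carrying two bounds through a calculation can be sketched with classical interval arithmetic. The class below is my own hypothetical illustration (the name Bounded and its methods are made up), not Alan's method. It needs Python 3.9 or later for math.nextafter, and it widens each result outward by one representable step so that the true real value stays enclosed.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounded:
    """Hypothetical pair of bounds enclosing an unknown real value."""
    lo: float
    hi: float

    @classmethod
    def from_float(cls, x: float) -> "Bounded":
        # The real number the programmer meant lies within one step of x.
        return cls(math.nextafter(x, -math.inf), math.nextafter(x, math.inf))

    def __add__(self, other: "Bounded") -> "Bounded":
        # Round the lower bound down and the upper bound up after adding.
        return Bounded(
            math.nextafter(self.lo + other.lo, -math.inf),
            math.nextafter(self.hi + other.hi, math.inf),
        )

    def width(self) -> float:
        return self.hi - self.lo

    def is_accurate(self, tolerance: float) -> bool:
        # A result wider than the tolerance would be marked as untrustworthy.
        return self.width() <= tolerance


tenth = Bounded.from_float(0.1)
total = Bounded(0.0, 0.0)
for _ in range(10):
    total = total + tenth

print(total.lo <= 1.0 <= total.hi)
print(total.width())
```

The bounds get wider with every operation, which is exactly the information a plain float throws away. A check like is_accurate is where a result would be flagged once it is no longer good enough.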

If you are working with systems that must do extremely accurate and safe floating point calculations, you might want to check out the patent.

10 thoughts on “Floating Point Quality: Less Floaty, More Pointed”

  1. I have worked on some systems that used floating point numbers when the numbers really mattered. In our case it was dollars.

    Our best trick was to use integers and printf in a way that inserted a period between the 1000’s and 100’s place.

    • Matt,
      They used that method to compute time in the Patriot missile system, keeping track of time as an integer in tenths of a second. The problem came when they converted that value to floating point for positioning calculations after accumulating time for 100 hours. The result had insufficient precision to accurately place the Scud missile, and the Patriot missile missed.

      • Why was it necessary to convert it to float in the first place? Would it eliminate the issue if they used only decimal, as the missile most likely has its precision somewhere around 0.1 m anyway?

        I see great value in the suggested approach anyway, but I am also interested in alternatives.
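The clock drift described in this thread can be reproduced with exact rational arithmetic. This sketch assumes the 0.1 constant was truncated to 23 fractional bits, the figure usually quoted in published analyses of the incident. That width is an assumption on my part and is not stated in the comments above.

```python
from fractions import Fraction

FRACTION_BITS = 23  # assumed width of the stored 0.1 constant

# Number of tenth-of-a-second ticks in 100 hours: an exact integer.
ticks = 100 * 3600 * 10

# One tenth, chopped (not rounded) to the assumed number of binary places.
scale = 2 ** FRACTION_BITS
tenth_chopped = Fraction(int(Fraction(1, 10) * scale), scale)

true_seconds = Fraction(ticks, 10)
computed_seconds = ticks * tenth_chopped

drift = float(true_seconds - computed_seconds)
print(drift)
```

The integer tick count is exact the whole time. The error only appears when it is multiplied by an inexact tenth, which is why staying in integers (or decimal) for as long as possible helps.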

  2. The solution I used in my IBM PL/1 and Cobol days was fixed decimal, for currency in particular, with intermediate results held to an increased precision. So a dollar value might be packed decimal 5.2 (5 digits before the decimal point and 2 after), but if I was multiplying or dividing two packed decimal numbers of that format, the intermediate result was held as 10.4, adding the precisions together, and then converted “safely” according to accounting rules to the lower precision. This worked easily in Cobol but was harder to construct in PL/1, as the compiler was liable to create its own intermediate result. Particular care always had to be taken around dividing a small number by a large number to ensure correctness.

    Inventory management was interesting where pack sizes had to be managed: after all, if you are filling packs from individual units, you cannot have 0.1 left in the count of individual units. So multiplication and division were all modular arithmetic.

    The other problem was time. Computers don’t handle time well at all, as per the example of the fatality. The process control software that I worked on at around the same time was all based on ticks since midnight, with a great deal of processing to synchronize all the controllers at midnight.
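The fixed decimal approach in comment 2 maps roughly onto Python's decimal module. This is only a sketch: the sample amounts are invented, and ROUND_HALF_UP stands in for whatever the applicable accounting rules actually require.

```python
from decimal import Decimal, ROUND_HALF_UP

# Two values with 2 decimal places each, like packed decimal 5.2 fields.
amount = Decimal("123.45")
rate = Decimal("0.07")

# The product keeps 2 + 2 = 4 decimal places, like the 10.4 intermediate.
intermediate = amount * rate
print(intermediate)

# Convert back to 2 places with an explicit, stated rounding rule.
rounded = intermediate.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(rounded)

# Pack sizes: integer division and remainder, never a fractional unit.
units_on_hand = 1003
pack_size = 12
full_packs, loose_units = divmod(units_on_hand, pack_size)
print(full_packs, loose_units)
```

The point is that the rounding happens once, at a place you choose, rather than silently inside every operation.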

  3. Hi James. The link to the patent images is not loading any images for me within a reasonable amount of time. Tested on Mozilla Firefox version 54.0 (32-bit) (Windows 10 Enterprise 64 bit).
    It does seem to work however on Microsoft Edge.

    Is there any way to notify the authority managing the site of this issue?

    [James’ Reply: Your guess is as good as mine.]

  4. oooops:


    John Gustafson says:
    January 17, 2018 at 6:50 pm

    Absolutely amazing that the US Patent Office would grant a patent for an idea I first publicly presented in 2013, and published in a very well-received book (The End of Error: Unum Arithmetic) in February 2015. All three forms of unum arithmetic are open source and free of patent restrictions (MIT Open Source license). For Jorgensen to claim to be the inventor of this concept is pretty outrageous.

  5. W. Heisenberg & Gustafson, if you have a real challenge, why not take it up with the US Patent Office? It’s free.
