## digitalmars.D.learn - float.max + 1.0 does not overflow

- rumbu (8/8) Dec 27 2017 Is that normal?
- Benjamin Thaut (23/31) Dec 27 2017 This is actually correct floating point behavior. Consider the
- Dave Jones (12/25) Dec 28 2017 The float with the lower exponent would have to be shifted to

Is that normal?

    import std.math;

    float f = float.max;
    f += 1.0;
    assert(ieeeFlags.overflow);   // fails
    assert(f == float.infinity);  // fails, f is in fact float.max

On the contrary, float.max + float.max will overflow. The behavior is the same for double and real.

Dec 27 2017

On Wednesday, 27 December 2017 at 13:40:28 UTC, rumbu wrote:

> Is that normal? [...] On the contrary, float.max + float.max will overflow. The behavior is the same for double and real.

This is actually correct floating point behavior. Consider the following program:

    import std.stdio;

    void main()
    {
        float nextRepresentableToMax = float.max;
        // find the next smaller representable floating point number
        (*cast(int*)&nextRepresentableToMax)--;
        writefln("%f", float.max - nextRepresentableToMax);
    }

It computes the difference between float.max and the next smaller representable floating point number. The difference printed by the program is:

    20282409603651670423947251286016.0

As you might notice, this is significantly bigger than 1.0. Floating point operations work like this: they perform the operation and then round to the nearest representable floating point number. So adding 1.0 to float.max and then rounding to the nearest representable number will just give you back float.max. If, however, you add float.max and float.max, the nearest representable result is float.infinity.

When trying to understand how floating point works, I would highly recommend that you read these articles (oldest first): https://randomascii.wordpress.com/category/floating-point/

Kind Regards
Benjamin Thaut

Dec 27 2017
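The rounding argument in the reply above can be checked directly. The following is a minimal sketch, not part of the original thread: the helper names `ulpBelowMax` and `addToMax` are made up for illustration, and it assumes `nextDown` from `std.math` in place of the pointer cast. Each sum is stored back into a `float` so that it is rounded to single precision.

```d
import std.math : nextDown;
import std.stdio : writefln;

/// Gap between float.max and the next smaller float
/// (hypothetical helper, named here for illustration only).
float ulpBelowMax()
{
    float top = float.max;
    float below = nextDown(top); // next representable value toward -infinity
    float gap = top - below;
    return gap;
}

/// Adds x to float.max and stores the sum in a float, so the
/// result is rounded to single precision (hypothetical helper).
float addToMax(float x)
{
    float f = float.max;
    f += x;
    return f;
}

void main()
{
    writefln("gap below float.max: %g", ulpBelowMax());
    writefln("float.max + 1.0 == float.max: %s",
             addToMax(1.0f) == float.max);
    writefln("float.max + float.max == infinity: %s",
             addToMax(float.max) == float.infinity);
}
```

Under IEEE 754 single precision the gap is 2^104, which is the value printed by the program in the reply, so an addend of 1.0 is far too small to move the result away from float.max.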

On Wednesday, 27 December 2017 at 14:14:42 UTC, Benjamin Thaut wrote:

> On Wednesday, 27 December 2017 at 13:40:28 UTC, rumbu wrote:
>
> > Is that normal?
>
> It computes the difference between float.max and the next smaller representable floating point number. The difference printed by the program is: 20282409603651670423947251286016.0 As you might notice, this is significantly bigger than 1.0. Floating point operations work like this: they perform the operation and then round to the nearest representable floating point number. So adding 1.0 to float.max and then rounding to the nearest representable number will just give you back float.max. If, however, you add float.max and float.max, the nearest representable result is float.infinity.

The float with the lower exponent has to be shifted to match the higher one, which means 1.0 would be shifted 127 bits to the right (the difference between the two exponents) before the addition can be done. If you shift right by more bits than there are in the mantissa, the value gets rounded to zero. Hence, once the two values are lined up for the actual operation, it becomes float.max + 0.0.

That said, I suspect the OP was expecting the FPU to catch that, in theory, it should overflow. Not that the actual operation would overflow, but that the FPU would be checking the values on input. Maybe.

Dec 28 2017
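The alignment argument in the last reply can be sketched in a similar way. This is an illustration only, assuming finite non-zero inputs and `ilogb` from `std.math`; the function name `alignmentShift` is hypothetical. It compares the exponent difference of the two operands with the width of the float significand (`float.mant_dig`).

```d
import std.math : ilogb;
import std.stdio : writefln;

/// Number of bit positions the smaller operand has to be shifted right
/// to line up with the larger one (hypothetical helper; assumes both
/// arguments are finite and non-zero).
int alignmentShift(float a, float b)
{
    int ea = ilogb(a); // unbiased binary exponent of a
    int eb = ilogb(b); // unbiased binary exponent of b
    return ea > eb ? ea - eb : eb - ea;
}

void main()
{
    int shift = alignmentShift(float.max, 1.0f);
    writefln("shift: %d bits, significand: %d bits", shift, float.mant_dig);
    // When the shift exceeds the significand width, the smaller operand is
    // below half a unit in the last place of the larger one and is rounded away.
    writefln("1.0 is lost in the sum: %s", shift > float.mant_dig);
}
```

For float.max and 1.0 the exponents are 127 and 0, so the shift is well beyond the 24 bits of a float significand, which matches the observation that the sum is float.max again.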