001/* 002 * Licensed to the Apache Software Foundation (ASF) under one or more 003 * contributor license agreements. See the NOTICE file distributed with 004 * this work for additional information regarding copyright ownership. 005 * The ASF licenses this file to You under the Apache License, Version 2.0 006 * (the "License"); you may not use this file except in compliance with 007 * the License. You may obtain a copy of the License at 008 * 009 * http://www.apache.org/licenses/LICENSE-2.0 010 * 011 * Unless required by applicable law or agreed to in writing, software 012 * distributed under the License is distributed on an "AS IS" BASIS, 013 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 014 * See the License for the specific language governing permissions and 015 * limitations under the License. 016 */ 017package org.apache.commons.numbers.core; 018 019import java.io.Serializable; 020import java.math.BigDecimal; 021import java.util.function.DoubleUnaryOperator; 022 023/** 024 * Computes double-double floating-point operations. 025 * 026 * <p>A double-double is an unevaluated sum of two IEEE double precision numbers capable of 027 * representing at least 106 bits of significand. A normalized double-double number {@code (x, xx)} 028 * satisfies the condition that the parts are non-overlapping in magnitude such that: 029 * <pre> 030 * |x| > |xx| 031 * x == x + xx 032 * </pre> 033 * 034 * <p>This implementation assumes a normalized representation during operations on a {@code DD} 035 * number and computes results as a normalized representation. Any double-double number 036 * can be normalized by summation of the parts (see {@link #ofSum(double, double) ofSum}). 037 * Note that the number {@code (x, xx)} may also be referred to using the labels high and low 038 * to indicate the magnitude of the parts as 039 * {@code (x}<sub>hi</sub>{@code , x}<sub>lo</sub>{@code )}, or using a numerical suffix for the 040 * parts as {@code (x}<sub>0</sub>{@code , x}<sub>1</sub>{@code )}. The numerical suffix is 041 * typically used when the number has an arbitrary number of parts. 042 * 043 * <p>The double-double class is immutable. 044 * 045 * <p><b>Construction</b> 046 * 047 * <p>Factory methods to create a {@code DD} that are exact use the prefix {@code of}. Methods 048 * that create the closest possible representation use the prefix {@code from}. These methods 049 * may suffer a possible loss of precision during conversion. 050 * 051 * <p>Primitive values of type {@code double}, {@code int} and {@code long} are 052 * converted exactly to a {@code DD}. 053 * 054 * <p>The {@code DD} class can also be created as the result of an arithmetic operation on a pair 055 * of {@code double} operands. The resulting {@code DD} has the IEEE754 {@code double} result 056 * of the operation in the first part, and the second part contains the round-off lost from the 057 * operation due to rounding. Construction using add ({@code +}), subtract ({@code -}) and 058 * multiply ({@code *}) operators are exact. Construction using division ({@code /}) may be 059 * inexact if the quotient is not representable. 060 * 061 * <p>Note that it is more efficient to create a {@code DD} from a {@code double} operation than 062 * to create two {@code DD} values and combine them with the same operation. The result will be 063 * the same for add, subtract and multiply but may lose precision for divide. 064 * <pre>{@code 065 * // Inefficient 066 * DD a = DD.of(1.23).add(DD.of(4.56)); 067 * // Optimal 068 * DD b = DD.ofSum(1.23, 4.56); 069 * 070 * // Inefficient and may lose precision 071 * DD c = DD.of(1.23).divide(DD.of(4.56)); 072 * // Optimal 073 * DD d = DD.fromQuotient(1.23, 4.56); 074 * }</pre> 075 * 076 * <p>It is not possible to directly specify the two parts of the number. 077 * The two parts must be added using {@link #ofSum(double, double) ofSum}. 078 * If the two parts already represent a number {@code (x, xx)} such that {@code x == x + xx} 079 * then the magnitudes of the parts will be unchanged; any signed zeros may be subject to a sign 080 * change. 081 * 082 * <p><b>Primitive operands</b> 083 * 084 * <p>Operations are provided using a {@code DD} operand or a {@code double} operand. 085 * Implicit type conversion allows methods with a {@code double} operand to be used 086 * with other primitives such as {@code int} or {@code long}. Note that casting of a {@code long} 087 * to a {@code double} may result in loss of precision. 088 * To maintain the full precision of a {@code long} first convert the value to a {@code DD} using 089 * {@link #of(long)} and use the same arithmetic operation using the {@code DD} operand. 090 * 091 * <p><b>Accuracy</b> 092 * 093 * <p>Add and multiply operations using two {@code double} values operands are computed to an 094 * exact {@code DD} result (see {@link #ofSum(double, double) ofSum} and 095 * {@link #ofProduct(double, double) ofProduct}). Operations involving a {@code DD} and another 096 * operand, either {@code double} or {@code DD}, are not exact. 097 * 098 * <p>This class is not intended to perform exact arithmetic. Arbitrary precision arithmetic is 099 * available using {@link BigDecimal}. Single operations will compute the {@code DD} result within 100 * a tolerance of the 106-bit exact result. This far exceeds the accuracy of {@code double} 101 * arithmetic. The reduced accuracy is a compromise to deliver increased performance. 102 * The class is intended to reduce error in equivalent {@code double} arithmetic operations where 103 * the {@code double} valued result is required to high accuracy. Although it 104 * is possible to reduce error to 2<sup>-106</sup> for all operations, the additional computation 105 * would impact performance and would require multiple chained operations to potentially 106 * observe a different result when the final {@code DD} is converted to a {@code double}. 107 * 108 * <p><b>Canonical representation</b> 109 * 110 * <p>The double-double number is the sum of its parts. The canonical representation of the 111 * number is the explicit value of the parts. The {@link #toString()} method is provided to 112 * convert to a String representation of the parts formatted as a tuple. 113 * 114 * <p>The class implements {@link #equals(Object)} and {@link #hashCode()} and allows usage as 115 * a key in a Set or Map. Equality requires <em>binary</em> equivalence of the parts. Note that 116 * representations of zero using different combinations of +/- 0.0 are not considered equal. 117 * Also note that many non-normalized double-double numbers can represent the same number. 118 * Double-double numbers can be normalized before operations that involve {@link #equals(Object)} 119 * by {@link #ofSum(double, double) adding} the parts; this is exact for a finite sum 120 * and provides equality support for non-zero numbers. Alternatively exact numerical equality 121 * and comparisons are supported by conversion to a {@link #bigDecimalValue() BigDecimal} 122 * representation. Note that {@link BigDecimal} does not support non-finite values. 123 * 124 * <p><b>Overflow, underflow and non-finite support</b> 125 * 126 * <p>A double-double number is limited to the same finite range as a {@code double} 127 * ({@value Double#MIN_VALUE} to {@value Double#MAX_VALUE}). This class is intended for use when 128 * the ultimate result is finite and intermediate values do not approach infinity or zero. 129 * 130 * <p>This implementation does not support IEEE standards for handling infinite and NaN when used 131 * in arithmetic operations. Computations may split a 64-bit double into two parts and/or use 132 * subtraction of intermediate terms to compute round-off parts. These operations may generate 133 * infinite values due to overflow which then propagate through further operations to NaN, 134 * for example computing the round-off using {@code Inf - Inf = NaN}. 135 * 136 * <p>Operations that involve splitting a double (multiply, divide) are safe 137 * when the base 2 exponent is below 996. This puts an upper limit of approximately +/-6.7e299 on 138 * any values to be split; in practice the arguments to multiply and divide operations are further 139 * constrained by the expected finite value of the product or quotient. 140 * 141 * <p>Likewise the smallest value that can be represented is {@link Double#MIN_VALUE}. The full 142 * 106-bit accuracy will be lost when intermediates are within 2<sup>53</sup> of 143 * {@link Double#MIN_NORMAL}. 144 * 145 * <p>The {@code DD} result can be verified by checking it is a {@link #isFinite() finite} 146 * evaluated sum. Computations expecting to approach over or underflow must use scaling of 147 * intermediate terms (see {@link #frexp(int[]) frexp} and {@link #scalb(int) scalb}) and 148 * appropriate management of the current base 2 scale. 149 * 150 * <p>References: 151 * <ol> 152 * <li> 153 * Dekker, T.J. (1971) 154 * <a href="https://doi.org/10.1007/BF01397083"> 155 * A floating-point technique for extending the available precision</a> 156 * Numerische Mathematik, 18:224–242.</li> 157 * <li> 158 * Shewchuk, J.R. (1997) 159 * <a href="https://www-2.cs.cmu.edu/afs/cs/project/quake/public/papers/robust-arithmetic.ps"> 160 * Arbitrary Precision Floating-Point Arithmetic</a>.</li> 161 * <li> 162 * Hide, Y, Li, X.S. and Bailey, D.H. (2008) 163 * <a href="https://www.davidhbailey.com/dhbpapers/qd.pdf"> 164 * Library for Double-Double and Quad-Double Arithmetic</a>.</li> 165 * </ol> 166 * 167 * @since 1.2 168 */ 169public final class DD 170 extends Number 171 implements NativeOperators<DD>, 172 Serializable { 173 // Caveat: 174 // 175 // The code below uses many additions/subtractions that may 176 // appear redundant. However, they should NOT be simplified, as they 177 // do use IEEE754 floating point arithmetic rounding properties. 178 // 179 // Algorithms are based on computing the product or sum of two values x and y in 180 // extended precision. The standard result is stored using a double (high part z) and 181 // the round-off error (or low part zz) is stored in a second double, e.g: 182 // x * y = (z, zz); z + zz = x * y 183 // x + y = (z, zz); z + zz = x + y 184 // 185 // The building blocks for double-double arithmetic are: 186 // 187 // Fast-Two-Sum: Addition of two doubles (ordered |x| > |y|) to a double-double 188 // Two-Sum: Addition of two doubles (unordered) to a double-double 189 // Two-Prod: Multiplication of two doubles to a double-double 190 // 191 // These are used to create functions operating on double and double-double numbers. 192 // 193 // To sum multiple (z, zz) results ideally the parts are sorted in order of 194 // non-decreasing magnitude and summed. This is exact if each number's most significant 195 // bit is below the least significant bit of the next (i.e. does not 196 // overlap). Creating non-overlapping parts requires a rebalancing 197 // of adjacent pairs using a summation z + zz = (z1, zz1) iteratively through the parts 198 // (see Shewchuk (1997) Grow-Expansion and Expansion-Sum [2]). 199 // 200 // Accurate summation of an expansion (more than one double value) to a double-double 201 // performs a two-sum through the expansion e (length m). 202 // The single pass with two-sum ensures that the final term e_m is a good approximation 203 // for e: |e - e_m| < ulp(e_m); and the sum of the parts to 204 // e_(m-1) is within 1 ULP of the round-off ulp(|e - e_m|). 205 // These final two terms create the double-double result using two-sum. 206 // 207 // Maintenance of 1 ULP precision in the round-off component for all double-double 208 // operations is a performance burden. This class avoids this requirement to provide 209 // a compromise between accuracy and performance. 210 211 /** 212 * A double-double number representing one. 213 */ 214 public static final DD ONE = new DD(1, 0); 215 /** 216 * A double-double number representing zero. 217 */ 218 public static final DD ZERO = new DD(0, 0); 219 220 /** 221 * The multiplier used to split the double value into high and low parts. From 222 * Dekker (1971): "The constant should be chosen equal to 2^(p - p/2) + 1, 223 * where p is the number of binary digits in the mantissa". Here p is 53 224 * and the multiplier is {@code 2^27 + 1}. 225 */ 226 private static final double MULTIPLIER = 1.0 + 0x1.0p27; 227 /** The mask to extract the raw 11-bit exponent. 228 * The value must be shifted 52-bits to remove the mantissa bits. */ 229 private static final int EXP_MASK = 0x7ff; 230 /** The value 2046 converted for use if using {@link Integer#compareUnsigned(int, int)}. 231 * This requires adding {@link Integer#MIN_VALUE} to 2046. */ 232 private static final int CMP_UNSIGNED_2046 = Integer.MIN_VALUE + 2046; 233 /** The value -1 converted for use if using {@link Integer#compareUnsigned(int, int)}. 234 * This requires adding {@link Integer#MIN_VALUE} to -1. */ 235 private static final int CMP_UNSIGNED_MINUS_1 = Integer.MIN_VALUE - 1; 236 /** The value 1022 converted for use if using {@link Integer#compareUnsigned(int, int)}. 237 * This requires adding {@link Integer#MIN_VALUE} to 1022. */ 238 private static final int CMP_UNSIGNED_1022 = Integer.MIN_VALUE + 1022; 239 /** 2^512. */ 240 private static final double TWO_POW_512 = 0x1.0p512; 241 /** 2^-512. */ 242 private static final double TWO_POW_M512 = 0x1.0p-512; 243 /** 2^53. Any double with a magnitude above this is an even integer. */ 244 private static final double TWO_POW_53 = 0x1.0p53; 245 /** Mask to extract the high 32-bits from a long. */ 246 private static final long HIGH32_MASK = 0xffff_ffff_0000_0000L; 247 /** Mask to remove the sign bit from a long. */ 248 private static final long UNSIGN_MASK = 0x7fff_ffff_ffff_ffffL; 249 /** Mask to extract the 52-bit mantissa from a long representation of a double. */ 250 private static final long MANTISSA_MASK = 0x000f_ffff_ffff_ffffL; 251 /** Exponent offset in IEEE754 representation. */ 252 private static final int EXPONENT_OFFSET = 1023; 253 /** 0.5. */ 254 private static final double HALF = 0.5; 255 /** The limit for safe multiplication of {@code x*y}, assuming values above 1. 256 * Used to maintain positive values during the power computation. */ 257 private static final double SAFE_MULTIPLY = 0x1.0p500; 258 259 /** 260 * The size of the buffer for {@link #toString()}. 261 * 262 * <p>The longest double will require a sign, a maximum of 17 digits, the decimal place 263 * and the exponent, e.g. for max value this is 24 chars: -1.7976931348623157e+308. 264 * Set the buffer size to twice this and round up to a power of 2 thus 265 * allowing for formatting characters. The size is 64. 266 */ 267 private static final int TO_STRING_SIZE = 64; 268 /** {@link #toString() String representation}. */ 269 private static final char FORMAT_START = '('; 270 /** {@link #toString() String representation}. */ 271 private static final char FORMAT_END = ')'; 272 /** {@link #toString() String representation}. */ 273 private static final char FORMAT_SEP = ','; 274 275 /** Serializable version identifier. */ 276 private static final long serialVersionUID = 20230701L; 277 278 /** The high part of the double-double number. */ 279 private final double x; 280 /** The low part of the double-double number. */ 281 private final double xx; 282 283 /** 284 * Create a double-double number {@code (x, xx)}. 285 * 286 * @param x High part. 287 * @param xx Low part. 288 */ 289 private DD(double x, double xx) { 290 this.x = x; 291 this.xx = xx; 292 } 293 294 // Conversion constructors 295 296 /** 297 * Creates the double-double number as the value {@code (x, 0)}. 298 * 299 * @param x Value. 300 * @return the double-double 301 */ 302 public static DD of(double x) { 303 return new DD(x, 0); 304 } 305 306 /** 307 * Creates the double-double number as the value {@code (x, xx)}. 308 * 309 * <p><strong>Warning</strong> 310 * 311 * <p>The arguments are used directly. No checks are made that they represent 312 * a normalized double-double number: {@code x == x + xx}. 313 * 314 * <p>This method is exposed for testing. 315 * 316 * @param x High part. 317 * @param xx Low part. 318 * @return the double-double 319 * @see #twoSum(double, double) 320 */ 321 static DD of(double x, double xx) { 322 return new DD(x, xx); 323 } 324 325 /** 326 * Creates the double-double number as the value {@code (x, 0)}. 327 * 328 * <p>Note this method exists to avoid using {@link #of(long)} for {@code int} 329 * arguments; the {@code long} variation is slower as it preserves all 64-bits 330 * of information. 331 * 332 * @param x Value. 333 * @return the double-double 334 * @see #of(long) 335 */ 336 public static DD of(int x) { 337 return new DD(x, 0); 338 } 339 340 /** 341 * Creates the double-double number with the high part equal to {@code (double) x} 342 * and the low part equal to any remaining bits. 343 * 344 * <p>Note this method preserves all 64-bits of precision. Faster construction can be 345 * achieved using up to 53-bits of precision using {@code of((double) x)}. 346 * 347 * @param x Value. 348 * @return the double-double 349 * @see #of(double) 350 */ 351 public static DD of(long x) { 352 // Note: Casting the long to a double can lose bits due to rounding. 353 // These are not recoverable using lo = x - (long)((double) x) 354 // if the double is rounded outside the range of a long (i.e. 2^53). 355 // Split the long into two 32-bit numbers that are exactly representable 356 // and add them. 357 final long a = x & HIGH32_MASK; 358 final long b = x - a; 359 // When x is positive: a > b or a == 0 360 // When x is negative: |a| > |b| 361 return fastTwoSum(a, b); 362 } 363 364 /** 365 * Creates the double-double number using the argument {@code x} as an unsigned 366 * 32-bit integer. Equivalent to: 367 * 368 * <pre> 369 * DD.of((double) Integer.toUnsignedLong(x)) 370 * </pre> 371 * 372 * <p>Note this method exists to avoid using {@link #ofUnsigned(long)} for {@code int} 373 * arguments as sign extension on negative values will generate an incorrect result. 374 * 375 * @param x Value. 376 * @return the double-double 377 * @since 1.3 378 */ 379 public static DD ofUnsigned(int x) { 380 return new DD(Integer.toUnsignedLong(x), 0); 381 } 382 383 /** 384 * Creates the double-double number using the argument {@code x} as an unsigned 64-bit 385 * integer. 386 * 387 * <p>Zero and positive values are mapped to a numerically equal double-double value; 388 * and negative values are mapped to a value equal to the positive {@code long} value 389 * of the low order 63-bits plus 2<sup>63</sup>. 390 * 391 * <p>Note this method preserves all 64-bits of precision. It can be used to represent 392 * the positive difference between two ordered {@code long} values without using 393 * {@link DD#subtract(DD)}. 394 * 395 * <p>Given two {@code long} values {@code a <= b} the following are equivalent: 396 * 397 * <pre> 398 * DD.of(b).subtract(DD.of(a)) 399 * 400 * DD.ofUnsigned(b - a) 401 * </pre> 402 * 403 * @param x Value. 404 * @return the double-double 405 * @since 1.3 406 */ 407 public static DD ofUnsigned(long x) { 408 // Similar to of(long) but the high part is composed as an unsigned double 409 final long a = x & HIGH32_MASK; 410 final long b = x - a; 411 // Convert the unsigned magnitude to a double 412 final double aa = (a >>> 1) * 2.0; 413 return fastTwoSum(aa, b); 414 } 415 416 /** 417 * Creates the double-double number {@code (z, zz)} using the {@code double} representation 418 * of the argument {@code x}; the low part is the {@code double} representation of the 419 * round-off error. 420 * <pre> 421 * double z = x.doubleValue(); 422 * double zz = x.subtract(new BigDecimal(z)).doubleValue(); 423 * </pre> 424 * <p>If the value cannot be represented as a finite value the result will have an 425 * infinite high part and the low part is undefined. 426 * 427 * <p>Note: This conversion can lose information about the precision of the BigDecimal value. 428 * The result is the closest double-double representation to the value. 429 * 430 * @param x Value. 431 * @return the double-double 432 */ 433 public static DD from(BigDecimal x) { 434 final double z = x.doubleValue(); 435 // Guard against an infinite throwing a exception 436 final double zz = Double.isInfinite(z) ? 0 : x.subtract(new BigDecimal(z)).doubleValue(); 437 // No normalisation here 438 return new DD(z, zz); 439 } 440 441 // Arithmetic constructors: 442 443 /** 444 * Returns a {@code DD} whose value is {@code (x + y)}. 445 * The values are not required to be ordered by magnitude, 446 * i.e. the result is commutative: {@code x + y == y + x}. 447 * 448 * <p>This method ignores special handling of non-normal numbers and 449 * overflow within the extended precision computation. 450 * This creates the following special cases: 451 * 452 * <ul> 453 * <li>If {@code x + y} is infinite then the low part is NaN.</li> 454 * <li>If {@code x} or {@code y} is infinite or NaN then the low part is NaN.</li> 455 * <li>If {@code x + y} is sub-normal or zero then the low part is +/-0.0.</li> 456 * </ul> 457 * 458 * <p>An invalid result can be identified using {@link #isFinite()}. 459 * 460 * <p>The result is the exact double-double representation of the sum. 461 * 462 * @param x Addend. 463 * @param y Addend. 464 * @return the sum {@code x + y}. 465 * @see #ofDifference(double, double) 466 */ 467 public static DD ofSum(double x, double y) { 468 return twoSum(x, y); 469 } 470 471 /** 472 * Returns a {@code DD} whose value is {@code (x - y)}. 473 * The values are not required to be ordered by magnitude, 474 * i.e. the result matches a negation and addition: {@code x - y == -y + x}. 475 * 476 * <p>Computes the same results as {@link #ofSum(double, double) ofSum(a, -b)}. 477 * See that method for details of special cases. 478 * 479 * <p>An invalid result can be identified using {@link #isFinite()}. 480 * 481 * <p>The result is the exact double-double representation of the difference. 482 * 483 * @param x Minuend. 484 * @param y Subtrahend. 485 * @return {@code x - y}. 486 * @see #ofSum(double, double) 487 */ 488 public static DD ofDifference(double x, double y) { 489 return twoDiff(x, y); 490 } 491 492 /** 493 * Returns a {@code DD} whose value is {@code (x * y)}. 494 * 495 * <p>This method ignores special handling of non-normal numbers and intermediate 496 * overflow within the extended precision computation. 497 * This creates the following special cases: 498 * 499 * <ul> 500 * <li>If either {@code |x|} or {@code |y|} multiplied by {@code 1 + 2^27} 501 * is infinite (intermediate overflow) then the low part is NaN.</li> 502 * <li>If {@code x * y} is infinite then the low part is NaN.</li> 503 * <li>If {@code x} or {@code y} is infinite or NaN then the low part is NaN.</li> 504 * <li>If {@code x * y} is sub-normal or zero then the low part is +/-0.0.</li> 505 * </ul> 506 * 507 * <p>An invalid result can be identified using {@link #isFinite()}. 508 * 509 * <p>Note: Ignoring special cases is a design choice for performance. The 510 * method is therefore not a drop-in replacement for 511 * {@code roundOff = Math.fma(x, y, -x * y)}. 512 * 513 * <p>The result is the exact double-double representation of the product. 514 * 515 * @param x Factor. 516 * @param y Factor. 517 * @return the product {@code x * y}. 518 */ 519 public static DD ofProduct(double x, double y) { 520 return twoProd(x, y); 521 } 522 523 /** 524 * Returns a {@code DD} whose value is {@code (x * x)}. 525 * 526 * <p>This method is an optimisation of {@link #ofProduct(double, double) multiply(x, x)}. 527 * See that method for details of special cases. 528 * 529 * <p>An invalid result can be identified using {@link #isFinite()}. 530 * 531 * <p>The result is the exact double-double representation of the square. 532 * 533 * @param x Factor. 534 * @return the square {@code x * x}. 535 * @see #ofProduct(double, double) 536 */ 537 public static DD ofSquare(double x) { 538 return twoSquare(x); 539 } 540 541 /** 542 * Returns a {@code DD} whose value is {@code (x / y)}. 543 * If {@code y = 0} the result is undefined. 544 * 545 * <p>This method ignores special handling of non-normal numbers and intermediate 546 * overflow within the extended precision computation. 547 * This creates the following special cases: 548 * 549 * <ul> 550 * <li>If either {@code |x / y|} or {@code |y|} multiplied by {@code 1 + 2^27} 551 * is infinite (intermediate overflow) then the low part is NaN.</li> 552 * <li>If {@code x / y} is infinite then the low part is NaN.</li> 553 * <li>If {@code x} or {@code y} is infinite or NaN then the low part is NaN.</li> 554 * <li>If {@code x / y} is sub-normal or zero, excluding the previous cases, 555 * then the low part is +/-0.0.</li> 556 * </ul> 557 * 558 * <p>An invalid result can be identified using {@link #isFinite()}. 559 * 560 * <p>The result is the closest double-double representation to the quotient. 561 * 562 * @param x Dividend. 563 * @param y Divisor. 564 * @return the quotient {@code x / y}. 565 */ 566 public static DD fromQuotient(double x, double y) { 567 // Long division 568 // quotient q0 = x / y 569 final double q0 = x / y; 570 // remainder r = x - q0 * y 571 final double p0 = q0 * y; 572 final double p1 = twoProductLow(q0, y, p0); 573 final double r0 = x - p0; 574 final double r1 = twoDiffLow(x, p0, r0) - p1; 575 // correction term q1 = r0 / y 576 final double q1 = (r0 + r1) / y; 577 return new DD(q0, q1); 578 } 579 580 // Properties 581 582 /** 583 * Gets the first part {@code x} of the double-double number {@code (x, xx)}. 584 * In a normalized double-double number this part will have the greatest magnitude. 585 * 586 * <p>This is equivalent to returning the high-part {@code x}<sub>hi</sub> for the number 587 * {@code (x}<sub>hi</sub>{@code , x}<sub>lo</sub>{@code )}. 588 * 589 * @return the first part 590 */ 591 public double hi() { 592 return x; 593 } 594 595 /** 596 * Gets the second part {@code xx} of the double-double number {@code (x, xx)}. 597 * In a normalized double-double number this part will have the smallest magnitude. 598 * 599 * <p>This is equivalent to returning the low part {@code x}<sub>lo</sub> for the number 600 * {@code (x}<sub>hi</sub>{@code , x}<sub>lo</sub>{@code )}. 601 * 602 * @return the second part 603 */ 604 public double lo() { 605 return xx; 606 } 607 608 /** 609 * Returns {@code true} if the evaluated sum of the parts is finite. 610 * 611 * <p>This method is provided as a utility to check the result of a {@code DD} computation. 612 * Note that for performance the {@code DD} class does not follow IEEE754 arithmetic 613 * for infinite and NaN, and does not protect from overflow of intermediate values in 614 * multiply and divide operations. If this method returns {@code false} following 615 * {@code DD} arithmetic then the computation is not supported to extended precision. 616 * 617 * <p>Note: Any number that returns {@code true} may be converted to the exact 618 * {@link BigDecimal} value. 619 * 620 * @return {@code true} if this instance represents a finite {@code double} value. 621 * @see Double#isFinite(double) 622 * @see #bigDecimalValue() 623 */ 624 public boolean isFinite() { 625 return Double.isFinite(x + xx); 626 } 627 628 // Number conversions 629 630 /** 631 * Get the value as a {@code double}. This is the evaluated sum of the parts. 632 * 633 * <p>Note that even when the return value is finite, this conversion can lose 634 * information about the precision of the {@code DD} value. 635 * 636 * <p>Conversion of a finite {@code DD} can also be performed using the 637 * {@link #bigDecimalValue() BigDecimal} representation. 638 * 639 * @return the value converted to a {@code double} 640 * @see #bigDecimalValue() 641 */ 642 @Override 643 public double doubleValue() { 644 return x + xx; 645 } 646 647 /** 648 * Get the value as a {@code float}. This is the narrowing primitive conversion of the 649 * {@link #doubleValue()}. This conversion can lose range, resulting in a 650 * {@code float} zero from a nonzero {@code double} and a {@code float} infinity from 651 * a finite {@code double}. A {@code double} NaN is converted to a {@code float} NaN 652 * and a {@code double} infinity is converted to the same-signed {@code float} 653 * infinity. 654 * 655 * <p>Note that even when the return value is finite, this conversion can lose 656 * information about the precision of the {@code DD} value. 657 * 658 * <p>Conversion of a finite {@code DD} can also be performed using the 659 * {@link #bigDecimalValue() BigDecimal} representation. 660 * 661 * @return the value converted to a {@code float} 662 * @see #bigDecimalValue() 663 */ 664 @Override 665 public float floatValue() { 666 return (float) doubleValue(); 667 } 668 669 /** 670 * Get the value as an {@code int}. This conversion discards the fractional part of the 671 * number and effectively rounds the value to the closest whole number in the direction 672 * of zero. This is the equivalent of a cast of a floating-point number to an integer, for 673 * example {@code (int) -2.75 => -2}. 674 * 675 * <p>Note that this conversion can lose information about the precision of the 676 * {@code DD} value. 677 * 678 * <p>Special cases: 679 * <ul> 680 * <li>If the {@code DD} value is infinite the result is {@link Integer#MAX_VALUE}.</li> 681 * <li>If the {@code DD} value is -infinite the result is {@link Integer#MIN_VALUE}.</li> 682 * <li>If the {@code DD} value is NaN the result is 0.</li> 683 * </ul> 684 * 685 * <p>Conversion of a finite {@code DD} can also be performed using the 686 * {@link #bigDecimalValue() BigDecimal} representation. Note that {@link BigDecimal} 687 * conversion rounds to the {@link java.math.BigInteger BigInteger} whole number 688 * representation and returns the low-order 32-bits. Numbers too large for an {@code int} 689 * may change sign. This method ensures the sign is correct by directly rounding to 690 * an {@code int} and returning the respective upper or lower limit for numbers too 691 * large for an {@code int}. 692 * 693 * @return the value converted to an {@code int} 694 * @see #bigDecimalValue() 695 */ 696 @Override 697 public int intValue() { 698 // Clip the long value 699 return (int) Math.max(Integer.MIN_VALUE, Math.min(Integer.MAX_VALUE, longValue())); 700 } 701 702 /** 703 * Get the value as a {@code long}. This conversion discards the fractional part of the 704 * number and effectively rounds the value to the closest whole number in the direction 705 * of zero. This is the equivalent of a cast of a floating-point number to an integer, for 706 * example {@code (long) -2.75 => -2}. 707 * 708 * <p>Note that this conversion can lose information about the precision of the 709 * {@code DD} value. 710 * 711 * <p>Special cases: 712 * <ul> 713 * <li>If the {@code DD} value is infinite the result is {@link Long#MAX_VALUE}.</li> 714 * <li>If the {@code DD} value is -infinite the result is {@link Long#MIN_VALUE}.</li> 715 * <li>If the {@code DD} value is NaN the result is 0.</li> 716 * </ul> 717 * 718 * <p>Conversion of a finite {@code DD} can also be performed using the 719 * {@link #bigDecimalValue() BigDecimal} representation. Note that {@link BigDecimal} 720 * conversion rounds to the {@link java.math.BigInteger BigInteger} whole number 721 * representation and returns the low-order 64-bits. Numbers too large for a {@code long} 722 * may change sign. This method ensures the sign is correct by directly rounding to 723 * a {@code long} and returning the respective upper or lower limit for numbers too 724 * large for a {@code long}. 725 * 726 * @return the value converted to a {@code long} 727 * @see #bigDecimalValue() 728 */ 729 @Override 730 public long longValue() { 731 // Assume |hi| > |lo|, i.e. the low part is the round-off 732 final long a = (long) x; 733 // The cast will truncate the value to the range [Long.MIN_VALUE, Long.MAX_VALUE]. 734 // If the long converted back to a double is the same value then the high part 735 // was a representable integer and we must use the low part. 736 // Note: The floating-point comparison is intentional. 737 if (a == x) { 738 // Edge case: Any double value above 2^53 is even. To workaround representation 739 // of 2^63 as Long.MAX_VALUE (which is 2^63-1) we can split a into two parts. 740 final long a1; 741 final long a2; 742 if (Math.abs(x) > TWO_POW_53) { 743 a1 = (long) (x * 0.5); 744 a2 = a1; 745 } else { 746 a1 = a; 747 a2 = 0; 748 } 749 750 // To truncate the fractional part of the double-double towards zero we 751 // convert the low part to a whole number. This must be rounded towards zero 752 // with respect to the sign of the high part. 753 final long b = (long) (a < 0 ? Math.ceil(xx) : Math.floor(xx)); 754 755 final long sum = a1 + b + a2; 756 // Avoid overflow. If the sum has changed sign then an overflow occurred. 757 // This happens when high == 2^63 and the low part is additional magnitude. 758 // The xor operation creates a negative if the signs are different. 759 if ((sum ^ a) >= 0) { 760 return sum; 761 } 762 } 763 // Here the high part had a fractional part, was non-finite or was 2^63. 764 // Ignore the low part. 765 return a; 766 } 767 768 /** 769 * Get the value as a {@code BigDecimal}. This is the evaluated sum of the parts; 770 * the conversion is exact. 771 * 772 * <p>The conversion will raise a {@link NumberFormatException} if the number 773 * is non-finite. 774 * 775 * @return the double-double as a {@code BigDecimal}. 776 * @throws NumberFormatException if any part of the number is {@code infinite} or {@code NaN} 777 * @see BigDecimal 778 */ 779 public BigDecimal bigDecimalValue() { 780 return new BigDecimal(x).add(new BigDecimal(xx)); 781 } 782 783 // Static extended precision methods for computing the round-off component 784 // for double addition and multiplication 785 786 /** 787 * Compute the sum of two numbers {@code a} and {@code b} using 788 * Dekker's two-sum algorithm. The values are required to be ordered by magnitude: 789 * {@code |a| >= |b|}. 790 * 791 * <p>If {@code a} is zero and {@code b} is non-zero the returned value is {@code (b, 0)}. 792 * 793 * @param a First part of sum. 794 * @param b Second part of sum. 795 * @return the sum 796 * @see #fastTwoDiff(double, double) 797 * @see <a href="https://www-2.cs.cmu.edu/afs/cs/project/quake/public/papers/robust-arithmetic.ps"> 798 * Shewchuk (1997) Theorum 6</a> 799 */ 800 static DD fastTwoSum(double a, double b) { 801 final double x = a + b; 802 return new DD(x, fastTwoSumLow(a, b, x)); 803 } 804 805 /** 806 * Compute the round-off of the sum of two numbers {@code a} and {@code b} using 807 * Dekker's two-sum algorithm. The values are required to be ordered by magnitude: 808 * {@code |a| >= |b|}. 809 * 810 * <p>If {@code a} is zero and {@code b} is non-zero the returned value is zero. 811 * 812 * @param a First part of sum. 813 * @param b Second part of sum. 814 * @param x Sum. 815 * @return the sum round-off 816 * @see #fastTwoSum(double, double) 817 */ 818 static double fastTwoSumLow(double a, double b, double x) { 819 // (x, xx) = a + b 820 // bVirtual = x - a 821 // xx = b - bVirtual 822 return b - (x - a); 823 } 824 825 /** 826 * Compute the difference of two numbers {@code a} and {@code b} using 827 * Dekker's two-sum algorithm. The values are required to be ordered by magnitude: 828 * {@code |a| >= |b|}. 829 * 830 * <p>Computes the same results as {@link #fastTwoSum(double, double) fastTwoSum(a, -b)}. 831 * 832 * @param a Minuend. 833 * @param b Subtrahend. 834 * @return the difference 835 * @see #fastTwoSum(double, double) 836 * @see <a href="https://www-2.cs.cmu.edu/afs/cs/project/quake/public/papers/robust-arithmetic.ps"> 837 * Shewchuk (1997) Theorum 6</a> 838 */ 839 static DD fastTwoDiff(double a, double b) { 840 final double x = a - b; 841 return new DD(x, fastTwoDiffLow(a, b, x)); 842 } 843 844 /** 845 * Compute the round-off of the difference of two numbers {@code a} and {@code b} using 846 * Dekker's two-sum algorithm. The values are required to be ordered by magnitude: 847 * {@code |a| >= |b|}. 848 * 849 * @param a Minuend. 850 * @param b Subtrahend. 851 * @param x Difference. 852 * @return the difference round-off 853 * @see #fastTwoDiff(double, double) 854 */ 855 private static double fastTwoDiffLow(double a, double b, double x) { 856 // (x, xx) = a - b 857 // bVirtual = a - x 858 // xx = bVirtual - b 859 return (a - x) - b; 860 } 861 862 /** 863 * Compute the sum of two numbers {@code a} and {@code b} using 864 * Knuth's two-sum algorithm. The values are not required to be ordered by magnitude, 865 * i.e. the result is commutative {@code s = a + b == b + a}. 866 * 867 * @param a First part of sum. 868 * @param b Second part of sum. 869 * @return the sum 870 * @see #twoDiff(double, double) 871 * @see <a href="https://www-2.cs.cmu.edu/afs/cs/project/quake/public/papers/robust-arithmetic.ps"> 872 * Shewchuk (1997) Theorum 7</a> 873 */ 874 static DD twoSum(double a, double b) { 875 final double x = a + b; 876 return new DD(x, twoSumLow(a, b, x)); 877 } 878 879 /** 880 * Compute the round-off of the sum of two numbers {@code a} and {@code b} using 881 * Knuth two-sum algorithm. The values are not required to be ordered by magnitude, 882 * i.e. the result is commutative {@code s = a + b == b + a}. 883 * 884 * @param a First part of sum. 885 * @param b Second part of sum. 886 * @param x Sum. 887 * @return the sum round-off 888 * @see #twoSum(double, double) 889 */ 890 static double twoSumLow(double a, double b, double x) { 891 // (x, xx) = a + b 892 // bVirtual = x - a 893 // aVirtual = x - bVirtual 894 // bRoundoff = b - bVirtual 895 // aRoundoff = a - aVirtual 896 // xx = aRoundoff + bRoundoff 897 final double bVirtual = x - a; 898 return (a - (x - bVirtual)) + (b - bVirtual); 899 } 900 901 /** 902 * Compute the difference of two numbers {@code a} and {@code b} using 903 * Knuth's two-sum algorithm. The values are not required to be ordered by magnitude. 904 * 905 * <p>Computes the same results as {@link #twoSum(double, double) twoSum(a, -b)}. 906 * 907 * @param a Minuend. 908 * @param b Subtrahend. 909 * @return the difference 910 * @see #twoSum(double, double) 911 */ 912 static DD twoDiff(double a, double b) { 913 final double x = a - b; 914 return new DD(x, twoDiffLow(a, b, x)); 915 } 916 917 /** 918 * Compute the round-off of the difference of two numbers {@code a} and {@code b} using 919 * Knuth two-sum algorithm. The values are not required to be ordered by magnitude, 920 * 921 * @param a Minuend. 922 * @param b Subtrahend. 923 * @param x Difference. 924 * @return the difference round-off 925 * @see #twoDiff(double, double) 926 */ 927 private static double twoDiffLow(double a, double b, double x) { 928 // (x, xx) = a - b 929 // bVirtual = a - x 930 // aVirtual = x + bVirtual 931 // bRoundoff = b - bVirtual 932 // aRoundoff = a - aVirtual 933 // xx = aRoundoff - bRoundoff 934 final double bVirtual = a - x; 935 return (a - (x + bVirtual)) - (b - bVirtual); 936 } 937 938 /** 939 * Compute the double-double number {@code (z,zz)} for the exact 940 * product of {@code x} and {@code y}. 941 * 942 * <p>The high part of the number is equal to the product {@code z = x * y}. 943 * The low part is set to the round-off of the {@code double} product. 944 * 945 * <p>This method ignores special handling of non-normal numbers and intermediate 946 * overflow within the extended precision computation. 947 * This creates the following special cases: 948 * 949 * <ul> 950 * <li>If {@code x * y} is sub-normal or zero then the low part is +/-0.0.</li> 951 * <li>If {@code x * y} is infinite then the low part is NaN.</li> 952 * <li>If {@code x} or {@code y} is infinite or NaN then the low part is NaN.</li> 953 * <li>If either {@code |x|} or {@code |y|} multiplied by {@code 1 + 2^27} 954 * is infinite (intermediate overflow) then the low part is NaN.</li> 955 * </ul> 956 * 957 * <p>Note: Ignoring special cases is a design choice for performance. The 958 * method is therefore not a drop-in replacement for 959 * {@code round_off = Math.fma(x, y, -x * y)}. 960 * 961 * @param x First factor. 962 * @param y Second factor. 963 * @return the product 964 */ 965 static DD twoProd(double x, double y) { 966 final double xy = x * y; 967 // No checks for non-normal xy, or overflow during the split of the arguments 968 return new DD(xy, twoProductLow(x, y, xy)); 969 } 970 971 /** 972 * Compute the low part of the double length number {@code (z,zz)} for the exact 973 * product of {@code x} and {@code y} using Dekker's mult12 algorithm. The standard 974 * precision product {@code x*y} must be provided. The numbers {@code x} and {@code y} 975 * are split into high and low parts using Dekker's algorithm. 976 * 977 * <p>Warning: This method does not perform scaling in Dekker's split and large 978 * finite numbers can create NaN results. 979 * 980 * @param x First factor. 981 * @param y Second factor. 982 * @param xy Product of the factors (x * y). 983 * @return the low part of the product double length number 984 * @see #highPart(double) 985 */ 986 static double twoProductLow(double x, double y, double xy) { 987 // Split the numbers using Dekker's algorithm without scaling 988 final double hx = highPart(x); 989 final double lx = x - hx; 990 final double hy = highPart(y); 991 final double ly = y - hy; 992 return twoProductLow(hx, lx, hy, ly, xy); 993 } 994 995 /** 996 * Compute the low part of the double length number {@code (z,zz)} for the exact 997 * product of {@code x} and {@code y} using Dekker's mult12 algorithm. The standard 998 * precision product {@code x*y}, and the high and low parts of the factors must be 999 * provided. 1000 * 1001 * @param hx High-part of first factor. 1002 * @param lx Low-part of first factor. 1003 * @param hy High-part of second factor. 1004 * @param ly Low-part of second factor. 1005 * @param xy Product of the factors (x * y). 1006 * @return the low part of the product double length number 1007 */ 1008 static double twoProductLow(double hx, double lx, double hy, double ly, double xy) { 1009 // Compute the multiply low part: 1010 // err1 = xy - hx * hy 1011 // err2 = err1 - lx * hy 1012 // err3 = err2 - hx * ly 1013 // low = lx * ly - err3 1014 return lx * ly - (((xy - hx * hy) - lx * hy) - hx * ly); 1015 } 1016 1017 /** 1018 * Compute the double-double number {@code (z,zz)} for the exact 1019 * square of {@code x}. 1020 * 1021 * <p>The high part of the number is equal to the square {@code z = x * x}. 1022 * The low part is set to the round-off of the {@code double} square. 1023 * 1024 * <p>This method is an optimisation of {@link #twoProd(double, double) twoProd(x, x)}. 1025 * See that method for details of special cases. 1026 * 1027 * @param x Factor. 1028 * @return the square 1029 * @see #twoProd(double, double) 1030 */ 1031 static DD twoSquare(double x) { 1032 final double xx = x * x; 1033 // No checks for non-normal xy, or overflow during the split of the arguments 1034 return new DD(xx, twoSquareLow(x, xx)); 1035 } 1036 1037 /** 1038 * Compute the low part of the double length number {@code (z,zz)} for the exact 1039 * square of {@code x} using Dekker's mult12 algorithm. The standard 1040 * precision square {@code x*x} must be provided. The number {@code x} 1041 * is split into high and low parts using Dekker's algorithm. 1042 * 1043 * <p>Warning: This method does not perform scaling in Dekker's split and large 1044 * finite numbers can create NaN results. 1045 * 1046 * @param x Factor. 1047 * @param x2 Square of the factor (x * x). 1048 * @return the low part of the square double length number 1049 * @see #highPart(double) 1050 * @see #twoProductLow(double, double, double) 1051 */ 1052 static double twoSquareLow(double x, double x2) { 1053 // See productLowUnscaled 1054 final double hx = highPart(x); 1055 final double lx = x - hx; 1056 return twoSquareLow(hx, lx, x2); 1057 } 1058 1059 /** 1060 * Compute the low part of the double length number {@code (z,zz)} for the exact 1061 * square of {@code x} using Dekker's mult12 algorithm. The standard 1062 * precision square {@code x*x}, and the high and low parts of the factors must be 1063 * provided. 1064 * 1065 * @param hx High-part of factor. 1066 * @param lx Low-part of factor. 1067 * @param x2 Square of the factor (x * x). 1068 * @return the low part of the square double length number 1069 */ 1070 static double twoSquareLow(double hx, double lx, double x2) { 1071 return lx * lx - ((x2 - hx * hx) - 2 * lx * hx); 1072 } 1073 1074 /** 1075 * Implement Dekker's method to split a value into two parts. Multiplying by (2^s + 1) creates 1076 * a big value from which to derive the two split parts. 1077 * <pre> 1078 * c = (2^s + 1) * a 1079 * a_big = c - a 1080 * a_hi = c - a_big 1081 * a_lo = a - a_hi 1082 * a = a_hi + a_lo 1083 * </pre> 1084 * 1085 * <p>The multiplicand allows a p-bit value to be split into 1086 * (p-s)-bit value {@code a_hi} and a non-overlapping (s-1)-bit value {@code a_lo}. 1087 * Combined they have (p-1) bits of significand but the sign bit of {@code a_lo} 1088 * contains a bit of information. The constant is chosen so that s is ceil(p/2) where 1089 * the precision p for a double is 53-bits (1-bit of the mantissa is assumed to be 1090 * 1 for a non sub-normal number) and s is 27. 1091 * 1092 * <p>This conversion does not use scaling and the result of overflow is NaN. Overflow 1093 * may occur when the exponent of the input value is above 996. 1094 * 1095 * <p>Splitting a NaN or infinite value will return NaN. 1096 * 1097 * @param value Value. 1098 * @return the high part of the value. 1099 * @see Math#getExponent(double) 1100 */ 1101 static double highPart(double value) { 1102 final double c = MULTIPLIER * value; 1103 return c - (c - value); 1104 } 1105 1106 // Public API operations 1107 1108 /** 1109 * Returns a {@code DD} whose value is the negation of both parts of double-double number. 1110 * 1111 * @return the negation 1112 */ 1113 @Override 1114 public DD negate() { 1115 return new DD(-x, -xx); 1116 } 1117 1118 /** 1119 * Returns a {@code DD} whose value is the absolute value of the number {@code (x, xx)} 1120 * This method assumes that the low part {@code xx} is the smaller magnitude. 1121 * 1122 * <p>Cases: 1123 * <ul> 1124 * <li>If the {@code x} value is negative the result is {@code (-x, -xx)}.</li> 1125 * <li>If the {@code x} value is +/- 0.0 the result is {@code (0.0, 0.0)}; this 1126 * will remove sign information from the round-off component assumed to be zero.</li> 1127 * <li>Otherwise the result is {@code this}.</li> 1128 * </ul> 1129 * 1130 * @return the absolute value 1131 * @see #negate() 1132 * @see #ZERO 1133 */ 1134 public DD abs() { 1135 // Assume |hi| > |lo|, i.e. the low part is the round-off 1136 if (x < 0) { 1137 return negate(); 1138 } 1139 // NaN, positive or zero 1140 // return a canonical absolute of zero 1141 return x == 0 ? ZERO : this; 1142 } 1143 1144 /** 1145 * Returns the largest (closest to positive infinity) {@code DD} value that is less 1146 * than or equal to {@code this} number {@code (x, xx)} and is equal to a mathematical integer. 1147 * 1148 * <p>This method may change the representation of zero and non-finite values; the 1149 * result is equivalent to {@code Math.floor(x)} and the {@code xx} part is ignored. 1150 * 1151 * <p>Cases: 1152 * <ul> 1153 * <li>If {@code x} is NaN, then the result is {@code (NaN, 0)}.</li> 1154 * <li>If {@code x} is infinite, then the result is {@code (x, 0)}.</li> 1155 * <li>If {@code x} is +/-0.0, then the result is {@code (x, 0)}.</li> 1156 * <li>If {@code x != Math.floor(x)}, then the result is {@code (Math.floor(x), 0)}.</li> 1157 * <li>Otherwise the result is the {@code DD} value equal to the sum 1158 * {@code Math.floor(x) + Math.floor(xx)}.</li> 1159 * </ul> 1160 * 1161 * <p>The result may generate a high part smaller (closer to negative infinity) than 1162 * {@code Math.floor(x)} if {@code x} is a representable integer and the {@code xx} value 1163 * is negative. 1164 * 1165 * @return the largest (closest to positive infinity) value that is less than or equal 1166 * to {@code this} and is equal to a mathematical integer 1167 * @see Math#floor(double) 1168 * @see #isFinite() 1169 */ 1170 public DD floor() { 1171 return floorOrCeil(x, xx, Math::floor); 1172 } 1173 1174 /** 1175 * Returns the smallest (closest to negative infinity) {@code DD} value that is greater 1176 * than or equal to {@code this} number {@code (x, xx)} and is equal to a mathematical integer. 1177 * 1178 * <p>This method may change the representation of zero and non-finite values; the 1179 * result is equivalent to {@code Math.ceil(x)} and the {@code xx} part is ignored. 1180 * 1181 * <p>Cases: 1182 * <ul> 1183 * <li>If {@code x} is NaN, then the result is {@code (NaN, 0)}.</li> 1184 * <li>If {@code x} is infinite, then the result is {@code (x, 0)}.</li> 1185 * <li>If {@code x} is +/-0.0, then the result is {@code (x, 0)}.</li> 1186 * <li>If {@code x != Math.ceil(x)}, then the result is {@code (Math.ceil(x), 0)}.</li> 1187 * <li>Otherwise the result is the {@code DD} value equal to the sum 1188 * {@code Math.ceil(x) + Math.ceil(xx)}.</li> 1189 * </ul> 1190 * 1191 * <p>The result may generate a high part larger (closer to positive infinity) than 1192 * {@code Math.ceil(x)} if {@code x} is a representable integer and the {@code xx} value 1193 * is positive. 1194 * 1195 * @return the smallest (closest to negative infinity) value that is greater than or equal 1196 * to {@code this} and is equal to a mathematical integer 1197 * @see Math#ceil(double) 1198 * @see #isFinite() 1199 */ 1200 public DD ceil() { 1201 return floorOrCeil(x, xx, Math::ceil); 1202 } 1203 1204 /** 1205 * Implementation of the floor and ceiling functions. 1206 * 1207 * <p>Cases: 1208 * <ul> 1209 * <li>If {@code x} is non-finite or zero, then the result is {@code (x, 0)}.</li> 1210 * <li>If {@code x} is rounded by the operator to a new value {@code y}, then the 1211 * result is {@code (y, 0)}.</li> 1212 * <li>Otherwise the result is the {@code DD} value equal to the sum {@code op(x) + op(xx)}.</li> 1213 * </ul> 1214 * 1215 * @param x High part of x. 1216 * @param xx Low part of x. 1217 * @param op Floor or ceiling operator. 1218 * @return the result 1219 */ 1220 private static DD floorOrCeil(double x, double xx, DoubleUnaryOperator op) { 1221 // Assume |hi| > |lo|, i.e. the low part is the round-off 1222 final double y = op.applyAsDouble(x); 1223 // Note: The floating-point comparison is intentional 1224 if (y == x) { 1225 // Handle non-finite and zero by ignoring the low part 1226 if (isNotNormal(y)) { 1227 return new DD(y, 0); 1228 } 1229 // High part is an integer, use the low part. 1230 // Any rounding must propagate to the high part. 1231 // Note: add 0.0 to convert -0.0 to 0.0. This is required to ensure 1232 // the round-off component of the fastTwoSum result is always 0.0 1233 // when yy == 0. This only applies in the ceiling operator when 1234 // xx is in (-1, 0] and will be converted to -0.0. 1235 final double yy = op.applyAsDouble(xx) + 0; 1236 return fastTwoSum(y, yy); 1237 } 1238 // NaN or already rounded. 1239 // xx has no effect on the rounding. 1240 return new DD(y, 0); 1241 } 1242 1243 /** 1244 * Returns a {@code DD} whose value is {@code (this + y)}. 1245 * 1246 * <p>This computes the same result as 1247 * {@link #add(DD) add(DD.of(y))}. 1248 * 1249 * <p>The computed result is within 2 eps of the exact result where eps is 2<sup>-106</sup>. 1250 * 1251 * @param y Value to be added to this number. 1252 * @return {@code this + y}. 1253 * @see #add(DD) 1254 */ 1255 public DD add(double y) { 1256 // (s0, s1) = x + y 1257 final double s0 = x + y; 1258 final double s1 = twoSumLow(x, y, s0); 1259 // Note: if x + y cancel to a non-zero result then s0 is >= 1 ulp of x. 1260 // This is larger than xx so fast-two-sum can be used. 1261 return fastTwoSum(s0, s1 + xx); 1262 } 1263 1264 /** 1265 * Returns a {@code DD} whose value is {@code (this + y)}. 1266 * 1267 * <p>The computed result is within 4 eps of the exact result where eps is 2<sup>-106</sup>. 1268 * 1269 * @param y Value to be added to this number. 1270 * @return {@code this + y}. 1271 */ 1272 @Override 1273 public DD add(DD y) { 1274 return add(x, xx, y.x, y.xx); 1275 } 1276 1277 /** 1278 * Compute the sum of {@code (x, xx)} and {@code (y, yy)}. 1279 * 1280 * <p>The computed result is within 4 eps of the exact result where eps is 2<sup>-106</sup>. 1281 * 1282 * @param x High part of x. 1283 * @param xx Low part of x. 1284 * @param y High part of y. 1285 * @param yy Low part of y. 1286 * @return the sum 1287 * @see #accurateAdd(double, double, double, double) 1288 */ 1289 static DD add(double x, double xx, double y, double yy) { 1290 // Sum parts and save 1291 // (s0, s1) = x + y 1292 final double s0 = x + y; 1293 final double s1 = twoSumLow(x, y, s0); 1294 // (t0, t1) = xx + yy 1295 final double t0 = xx + yy; 1296 final double t1 = twoSumLow(xx, yy, t0); 1297 // result = s + t 1298 // |s1| is >= 1 ulp of max(|x|, |y|) 1299 // |t0| is >= 1 ulp of max(|xx|, |yy|) 1300 final DD zz = fastTwoSum(s0, s1 + t0); 1301 return fastTwoSum(zz.x, zz.xx + t1); 1302 } 1303 1304 /** 1305 * Compute the sum of {@code (x, xx)} and {@code y}. 1306 * 1307 * <p>This computes the same result as 1308 * {@link #accurateAdd(double, double, double, double) accurateAdd(x, xx, y, 0)}. 1309 * 1310 * <p>Note: This is an internal helper method used when accuracy is required. 1311 * The computed result is within 1 eps of the exact result where eps is 2<sup>-106</sup>. 1312 * The performance is approximately 1.5-fold slower than {@link #add(double)}. 1313 * 1314 * @param x High part of x. 1315 * @param xx Low part of x. 1316 * @param y y. 1317 * @return the sum 1318 */ 1319 static DD accurateAdd(double x, double xx, double y) { 1320 // Grow expansion (Schewchuk): (x, xx) + y -> (s0, s1, s2) 1321 DD s = twoSum(xx, y); 1322 double s2 = s.xx; 1323 s = twoSum(x, s.x); 1324 final double s0 = s.x; 1325 final double s1 = s.xx; 1326 // Compress (Schewchuk Fig. 15): (s0, s1, s2) -> (s0, s1) 1327 s = fastTwoSum(s1, s2); 1328 s2 = s.xx; 1329 s = fastTwoSum(s0, s.x); 1330 // Here (s0, s1) = s 1331 // e = exact 159-bit result 1332 // |e - s0| <= ulp(s0) 1333 // |s1 + s2| <= ulp(e - s0) 1334 return fastTwoSum(s.x, s2 + s.xx); 1335 } 1336 1337 /** 1338 * Compute the sum of {@code (x, xx)} and {@code (y, yy)}. 1339 * 1340 * <p>The high-part of the result is within 1 ulp of the true sum {@code e}. 1341 * The low-part of the result is within 1 ulp of the result of the high-part 1342 * subtracted from the true sum {@code e - hi}. 1343 * 1344 * <p>Note: This is an internal helper method used when accuracy is required. 1345 * The computed result is within 1 eps of the exact result where eps is 2<sup>-106</sup>. 1346 * The performance is approximately 2-fold slower than {@link #add(DD)}. 1347 * 1348 * @param x High part of x. 1349 * @param xx Low part of x. 1350 * @param y High part of y. 1351 * @param yy Low part of y. 1352 * @return the sum 1353 */ 1354 static DD accurateAdd(double x, double xx, double y, double yy) { 1355 // Expansion sum (Schewchuk Fig 7): (x, xx) + (x, yy) -> (s0, s1, s2, s3) 1356 DD s = twoSum(xx, yy); 1357 double s3 = s.xx; 1358 s = twoSum(x, s.x); 1359 // (s0, s1, s2) == (s.x, s.xx, s3) 1360 double s0 = s.x; 1361 s = twoSum(s.xx, y); 1362 double s2 = s.xx; 1363 s = twoSum(s0, s.x); 1364 // s1 = s.xx 1365 s0 = s.x; 1366 // Compress (Schewchuk Fig. 15) (s0, s1, s2, s3) -> (s0, s1) 1367 s = fastTwoSum(s.xx, s2); 1368 final double s1 = s.x; 1369 s = fastTwoSum(s.xx, s3); 1370 // s2 = s.x 1371 s3 = s.xx; 1372 s = fastTwoSum(s1, s.x); 1373 s2 = s.xx; 1374 s = fastTwoSum(s0, s.x); 1375 // Here (s0, s1) = s 1376 // e = exact 212-bit result 1377 // |e - s0| <= ulp(s0) 1378 // |s1 + s2 + s3| <= ulp(e - s0) (Sum magnitudes small to high) 1379 return fastTwoSum(s.x, s3 + s2 + s.xx); 1380 } 1381 1382 /** 1383 * Returns a {@code DD} whose value is {@code (this - y)}. 1384 * 1385 * <p>This computes the same result as {@link #add(DD) add(-y)}. 1386 * 1387 * <p>The computed result is within 2 eps of the exact result where eps is 2<sup>-106</sup>. 1388 * 1389 * @param y Value to be subtracted from this number. 1390 * @return {@code this - y}. 1391 * @see #subtract(DD) 1392 */ 1393 public DD subtract(double y) { 1394 return add(-y); 1395 } 1396 1397 /** 1398 * Returns a {@code DD} whose value is {@code (this - y)}. 1399 * 1400 * <p>This computes the same result as {@link #add(DD) add(y.negate())}. 1401 * 1402 * <p>The computed result is within 4 eps of the exact result where eps is 2<sup>-106</sup>. 1403 * 1404 * @param y Value to be subtracted from this number. 1405 * @return {@code this - y}. 1406 */ 1407 @Override 1408 public DD subtract(DD y) { 1409 return add(x, xx, -y.x, -y.xx); 1410 } 1411 1412 /** 1413 * Returns a {@code DD} whose value is {@code this * y}. 1414 * 1415 * <p>This computes the same result as 1416 * {@link #multiply(DD) multiply(DD.of(y))}. 1417 * 1418 * <p>The computed result is within 4 eps of the exact result where eps is 2<sup>-106</sup>. 1419 * 1420 * @param y Factor. 1421 * @return {@code this * y}. 1422 * @see #multiply(DD) 1423 */ 1424 public DD multiply(double y) { 1425 return multiply(x, xx, y); 1426 } 1427 1428 /** 1429 * Compute the multiplication product of {@code (x, xx)} and {@code y}. 1430 * 1431 * <p>This computes the same result as 1432 * {@link #multiply(double, double, double, double) multiply(x, xx, y, 0)}. 1433 * 1434 * <p>The computed result is within 4 eps of the exact result where eps is 2<sup>-106</sup>. 1435 * 1436 * @param x High part of x. 1437 * @param xx Low part of x. 1438 * @param y High part of y. 1439 * @return the product 1440 * @see #multiply(double, double, double, double) 1441 */ 1442 private static DD multiply(double x, double xx, double y) { 1443 // Dekker mul2 with yy=0 1444 // (Alternative: Scale expansion (Schewchuk Fig 13)) 1445 final double hi = x * y; 1446 final double lo = twoProductLow(x, y, hi); 1447 // Save 2 FLOPS compared to multiply(x, xx, y, 0). 1448 // This is reused in divide to save more FLOPS so worth the optimisation. 1449 return fastTwoSum(hi, lo + xx * y); 1450 } 1451 1452 /** 1453 * Returns a {@code DD} whose value is {@code this * y}. 1454 * 1455 * <p>The computed result is within 4 eps of the exact result where eps is 2<sup>-106</sup>. 1456 * 1457 * @param y Factor. 1458 * @return {@code this * y}. 1459 */ 1460 @Override 1461 public DD multiply(DD y) { 1462 return multiply(x, xx, y.x, y.xx); 1463 } 1464 1465 /** 1466 * Compute the multiplication product of {@code (x, xx)} and {@code (y, yy)}. 1467 * 1468 * <p>The computed result is within 4 eps of the exact result where eps is 2<sup>-106</sup>. 1469 * 1470 * @param x High part of x. 1471 * @param xx Low part of x. 1472 * @param y High part of y. 1473 * @param yy Low part of y. 1474 * @return the product 1475 */ 1476 private static DD multiply(double x, double xx, double y, double yy) { 1477 // Dekker mul2 1478 // (Alternative: Scale expansion (Schewchuk Fig 13)) 1479 final double hi = x * y; 1480 final double lo = twoProductLow(x, y, hi); 1481 return fastTwoSum(hi, lo + (x * yy + xx * y)); 1482 } 1483 1484 /** 1485 * Returns a {@code DD} whose value is {@code this * this}. 1486 * 1487 * <p>This method is an optimisation of {@link #multiply(DD) multiply(this)}. 1488 * 1489 * <p>The computed result is within 4 eps of the exact result where eps is 2<sup>-106</sup>. 1490 * 1491 * @return {@code this}<sup>2</sup> 1492 * @see #multiply(DD) 1493 */ 1494 public DD square() { 1495 return square(x, xx); 1496 } 1497 1498 /** 1499 * Compute the square of {@code (x, xx)}. 1500 * 1501 * @param x High part of x. 1502 * @param xx Low part of x. 1503 * @return the square 1504 */ 1505 private static DD square(double x, double xx) { 1506 // Dekker mul2 1507 final double hi = x * x; 1508 final double lo = twoSquareLow(x, hi); 1509 return fastTwoSum(hi, lo + (2 * x * xx)); 1510 } 1511 1512 /** 1513 * Returns a {@code DD} whose value is {@code (this / y)}. 1514 * If {@code y = 0} the result is undefined. 1515 * 1516 * <p>The computed result is within 1 eps of the exact result where eps is 2<sup>-106</sup>. 1517 * 1518 * @param y Divisor. 1519 * @return {@code this / y}. 1520 */ 1521 public DD divide(double y) { 1522 return divide(x, xx, y); 1523 } 1524 1525 /** 1526 * Compute the division of {@code (x, xx)} by {@code y}. 1527 * If {@code y = 0} the result is undefined. 1528 * 1529 * <p>The computed result is within 1 eps of the exact result where eps is 2<sup>-106</sup>. 1530 * 1531 * @param x High part of x. 1532 * @param xx Low part of x. 1533 * @param y High part of y. 1534 * @return the quotient 1535 */ 1536 private static DD divide(double x, double xx, double y) { 1537 // Long division 1538 // quotient q0 = x / y 1539 final double q0 = x / y; 1540 // remainder r0 = x - q0 * y 1541 DD p = twoProd(y, q0); 1542 // High accuracy add required 1543 DD r = accurateAdd(x, xx, -p.x, -p.xx); 1544 // next quotient q1 = r0 / y 1545 final double q1 = r.x / y; 1546 // remainder r1 = r0 - q1 * y 1547 p = twoProd(y, q1); 1548 // accurateAdd not used as we do not need r1.xx 1549 r = add(r.x, r.xx, -p.x, -p.xx); 1550 // next quotient q2 = r1 / y 1551 final double q2 = r.x / y; 1552 // Collect (q0, q1, q2) 1553 final DD q = fastTwoSum(q0, q1); 1554 return twoSum(q.x, q.xx + q2); 1555 } 1556 1557 /** 1558 * Returns a {@code DD} whose value is {@code (this / y)}. 1559 * If {@code y = 0} the result is undefined. 1560 * 1561 * <p>The computed result is within 4 eps of the exact result where eps is 2<sup>-106</sup>. 1562 * 1563 * @param y Divisor. 1564 * @return {@code this / y}. 1565 */ 1566 @Override 1567 public DD divide(DD y) { 1568 return divide(x, xx, y.x, y.xx); 1569 } 1570 1571 /** 1572 * Compute the division of {@code (x, xx)} by {@code (y, yy)}. 1573 * If {@code y = 0} the result is undefined. 1574 * 1575 * <p>The computed result is within 4 eps of the exact result where eps is 2<sup>-106</sup>. 1576 * 1577 * @param x High part of x. 1578 * @param xx Low part of x. 1579 * @param y High part of y. 1580 * @param yy Low part of y. 1581 * @return the quotient 1582 */ 1583 private static DD divide(double x, double xx, double y, double yy) { 1584 // Long division 1585 // quotient q0 = x / y 1586 final double q0 = x / y; 1587 // remainder r0 = x - q0 * y 1588 DD p = multiply(y, yy, q0); 1589 // High accuracy add required 1590 DD r = accurateAdd(x, xx, -p.x, -p.xx); 1591 // next quotient q1 = r0 / y 1592 final double q1 = r.x / y; 1593 // remainder r1 = r0 - q1 * y 1594 p = multiply(y, yy, q1); 1595 // accurateAdd not used as we do not need r1.xx 1596 r = add(r.x, r.xx, -p.x, -p.xx); 1597 // next quotient q2 = r1 / y 1598 final double q2 = r.x / y; 1599 // Collect (q0, q1, q2) 1600 final DD q = fastTwoSum(q0, q1); 1601 return twoSum(q.x, q.xx + q2); 1602 } 1603 1604 /** 1605 * Compute the reciprocal of {@code this}. 1606 * If {@code this} value is zero the result is undefined. 1607 * 1608 * <p>The computed result is within 4 eps of the exact result where eps is 2<sup>-106</sup>. 1609 * 1610 * @return {@code this}<sup>-1</sup> 1611 */ 1612 @Override 1613 public DD reciprocal() { 1614 return reciprocal(x, xx); 1615 } 1616 1617 /** 1618 * Compute the inverse of {@code (y, yy)}. 1619 * If {@code y = 0} the result is undefined. 1620 * 1621 * <p>The computed result is within 4 eps of the exact result where eps is 2<sup>-106</sup>. 1622 * 1623 * @param y High part of y. 1624 * @param yy Low part of y. 1625 * @return the inverse 1626 */ 1627 private static DD reciprocal(double y, double yy) { 1628 // As per divide using (x, xx) = (1, 0) 1629 // quotient q0 = x / y 1630 final double q0 = 1 / y; 1631 // remainder r0 = x - q0 * y 1632 DD p = multiply(y, yy, q0); 1633 // High accuracy add required 1634 // This add saves 2 twoSum and 3 fastTwoSum (24 FLOPS) by ignoring the zero low part 1635 DD r = accurateAdd(-p.x, -p.xx, 1); 1636 // next quotient q1 = r0 / y 1637 final double q1 = r.x / y; 1638 // remainder r1 = r0 - q1 * y 1639 p = multiply(y, yy, q1); 1640 // accurateAdd not used as we do not need r1.xx 1641 r = add(r.x, r.xx, -p.x, -p.xx); 1642 // next quotient q2 = r1 / y 1643 final double q2 = r.x / y; 1644 // Collect (q0, q1, q2) 1645 final DD q = fastTwoSum(q0, q1); 1646 return twoSum(q.x, q.xx + q2); 1647 } 1648 1649 /** 1650 * Compute the square root of {@code this} number {@code (x, xx)}. 1651 * 1652 * <p>Uses the result {@code Math.sqrt(x)} 1653 * if that result is not a finite normalized {@code double}. 1654 * 1655 * <p>Special cases: 1656 * <ul> 1657 * <li>If {@code x} is NaN or less than zero, then the result is {@code (NaN, 0)}.</li> 1658 * <li>If {@code x} is positive infinity, then the result is {@code (+infinity, 0)}.</li> 1659 * <li>If {@code x} is positive zero or negative zero, then the result is {@code (x, 0)}.</li> 1660 * </ul> 1661 * 1662 * <p>The computed result is within 4 eps of the exact result where eps is 2<sup>-106</sup>. 1663 * 1664 * @return {@code sqrt(this)} 1665 * @see Math#sqrt(double) 1666 * @see Double#MIN_NORMAL 1667 */ 1668 public DD sqrt() { 1669 // Standard sqrt 1670 final double c = Math.sqrt(x); 1671 1672 // Here we support {negative, +infinity, nan and zero} edge cases. 1673 // This is required to avoid a divide by zero in the following 1674 // computation, otherwise (0, 0).sqrt() = (NaN, NaN). 1675 if (isNotNormal(c)) { 1676 return new DD(c, 0); 1677 } 1678 1679 // Here hi is positive, non-zero and finite; assume lo is also finite 1680 1681 // Dekker's double precision sqrt2 algorithm. 1682 // See Dekker, 1971, pp 242. 1683 final double hc = highPart(c); 1684 final double lc = c - hc; 1685 final double u = c * c; 1686 final double uu = twoSquareLow(hc, lc, u); 1687 final double cc = (x - u - uu + xx) * 0.5 / c; 1688 1689 // Extended precision result: 1690 // y = c + cc 1691 // yy = c - y + cc 1692 return fastTwoSum(c, cc); 1693 } 1694 1695 /** 1696 * Checks if the number is not normal. This is functionally equivalent to: 1697 * <pre>{@code 1698 * final double abs = Math.abs(a); 1699 * return (abs <= Double.MIN_NORMAL || !(abs <= Double.MAX_VALUE)); 1700 * }</pre> 1701 * 1702 * @param a The value. 1703 * @return true if the value is not normal 1704 */ 1705 static boolean isNotNormal(double a) { 1706 // Sub-normal numbers have a biased exponent of 0. 1707 // Inf/NaN numbers have a biased exponent of 2047. 1708 // Catch both cases by extracting the raw exponent, subtracting 1 1709 // and compare unsigned (so 0 underflows to a unsigned large value). 1710 final int baisedExponent = ((int) (Double.doubleToRawLongBits(a) >>> 52)) & EXP_MASK; 1711 // Pre-compute the additions used by Integer.compareUnsigned 1712 return baisedExponent + CMP_UNSIGNED_MINUS_1 >= CMP_UNSIGNED_2046; 1713 } 1714 1715 /** 1716 * Multiply {@code this} number {@code (x, xx)} by an integral power of two. 1717 * <pre> 1718 * (y, yy) = (x, xx) * 2^exp 1719 * </pre> 1720 * 1721 * <p>The result is rounded as if performed by a single correctly rounded floating-point 1722 * multiply. This performs the same result as: 1723 * <pre> 1724 * y = Math.scalb(x, exp); 1725 * yy = Math.scalb(xx, exp); 1726 * </pre> 1727 * 1728 * <p>The implementation computes using a single multiplication if {@code exp} 1729 * is in {@code [-1022, 1023]}. Otherwise the parts {@code (x, xx)} are scaled by 1730 * repeated multiplication by power-of-two factors. The result is exact unless the scaling 1731 * generates sub-normal parts; in this case precision may be lost by a single rounding. 1732 * 1733 * @param exp Power of two scale factor. 1734 * @return the result 1735 * @see Math#scalb(double, int) 1736 * @see #frexp(int[]) 1737 */ 1738 public DD scalb(int exp) { 1739 // Handle scaling when 2^n can be represented with a single normal number 1740 // n >= -1022 && n <= 1023 1741 // Using unsigned compare => n + 1022 <= 1023 + 1022 1742 if (exp + CMP_UNSIGNED_1022 < CMP_UNSIGNED_2046) { 1743 final double s = twoPow(exp); 1744 return new DD(x * s, xx * s); 1745 } 1746 1747 // Scale by multiples of 2^512 (largest representable power of 2). 1748 // Scaling requires max 5 multiplications to under/overflow any normal value. 1749 // Break this down into e.g.: 2^512^(exp / 512) * 2^(exp % 512) 1750 // Number of multiples n = exp / 512 : exp >>> 9 1751 // Remainder m = exp % 512 : exp & 511 (exp must be positive) 1752 final int n; 1753 final int m; 1754 double p; 1755 if (exp < 0) { 1756 // Downscaling 1757 // (Note: Using an unsigned shift handles negation of min value: -2^31) 1758 n = -exp >>> 9; 1759 // m = exp % 512 1760 m = -(-exp & 511); 1761 p = TWO_POW_M512; 1762 } else { 1763 // Upscaling 1764 n = exp >>> 9; 1765 m = exp & 511; 1766 p = TWO_POW_512; 1767 } 1768 1769 // Multiply by the remainder scaling factor first. The remaining multiplications 1770 // are either 2^512 or 2^-512. 1771 // Down-scaling to sub-normal will use the final multiplication into a sub-normal result. 1772 // Note here that n >= 1 as the n in [-1022, 1023] case has been handled. 1773 1774 final double z0; 1775 final double z1; 1776 1777 // Handle n : 1, 2, 3, 4, 5 1778 if (n >= 5) { 1779 // n >= 5 will be over/underflow. Use an extreme scale factor. 1780 // Do not use +/- infinity as this creates NaN if x = 0. 1781 // p -> 2^1023 or 2^-1025 1782 p *= p * 0.5; 1783 z0 = x * p * p * p; 1784 z1 = xx * p * p * p; 1785 return new DD(z0, z1); 1786 } 1787 1788 final double s = twoPow(m); 1789 if (n == 4) { 1790 z0 = x * s * p * p * p * p; 1791 z1 = xx * s * p * p * p * p; 1792 } else if (n == 3) { 1793 z0 = x * s * p * p * p; 1794 z1 = xx * s * p * p * p; 1795 } else if (n == 2) { 1796 z0 = x * s * p * p; 1797 z1 = xx * s * p * p; 1798 } else { 1799 // n = 1. Occurs only if exp = -1023. 1800 z0 = x * s * p; 1801 z1 = xx * s * p; 1802 } 1803 return new DD(z0, z1); 1804 } 1805 1806 /** 1807 * Create a normalized double with the value {@code 2^n}. 1808 * 1809 * <p>Warning: Do not call with {@code n = -1023}. This will create zero. 1810 * 1811 * @param n Exponent (in the range [-1022, 1023]). 1812 * @return the double 1813 */ 1814 static double twoPow(int n) { 1815 return Double.longBitsToDouble(((long) (n + 1023)) << 52); 1816 } 1817 1818 /** 1819 * Convert {@code this} number {@code x} to fractional {@code f} and integral 1820 * {@code 2^exp} components. 1821 * <pre> 1822 * x = f * 2^exp 1823 * </pre> 1824 * 1825 * <p>The combined fractional part (f, ff) is in the range {@code [0.5, 1)}. 1826 * 1827 * <p>Special cases: 1828 * <ul> 1829 * <li>If {@code x} is zero, then the normalized fraction is zero and the exponent is zero.</li> 1830 * <li>If {@code x} is NaN, then the normalized fraction is NaN and the exponent is unspecified.</li> 1831 * <li>If {@code x} is infinite, then the normalized fraction is infinite and the exponent is unspecified.</li> 1832 * <li>If high-part {@code x} is an exact power of 2 and the low-part {@code xx} has an opposite 1833 * signed non-zero magnitude then fraction high-part {@code f} will be {@code +/-1} such that 1834 * the double-double number is in the range {@code [0.5, 1)}.</li> 1835 * </ul> 1836 * 1837 * <p>This is named using the equivalent function in the standard C math.h library. 1838 * 1839 * @param exp Power of two scale factor (integral exponent). 1840 * @return Fraction part. 1841 * @see Math#getExponent(double) 1842 * @see #scalb(int) 1843 * @see <a href="https://www.cplusplus.com/reference/cmath/frexp/">C math.h frexp</a> 1844 */ 1845 public DD frexp(int[] exp) { 1846 exp[0] = getScale(x); 1847 // Handle non-scalable numbers 1848 if (exp[0] == Double.MAX_EXPONENT + 1) { 1849 // Returns +/-0.0, inf or nan 1850 // Maintain the fractional part unchanged. 1851 // Do not change the fractional part of inf/nan, and assume 1852 // |xx| < |x| thus if x == 0 then xx == 0 (otherwise the double-double is invalid) 1853 // Unspecified for NaN/inf so just return zero exponent. 1854 exp[0] = 0; 1855 return this; 1856 } 1857 // The scale will create the fraction in [1, 2) so increase by 1 for [0.5, 1) 1858 exp[0] += 1; 1859 DD f = scalb(-exp[0]); 1860 // Return |(hi, lo)| = (1, -eps) if required. 1861 // f.x * f.xx < 0 detects sign change unless the product underflows. 1862 // Handle extreme case of |f.xx| being min value by doubling f.x to 1. 1863 if (Math.abs(f.x) == HALF && 2 * f.x * f.xx < 0) { 1864 f = new DD(f.x * 2, f.xx * 2); 1865 exp[0] -= 1; 1866 } 1867 return f; 1868 } 1869 1870 /** 1871 * Returns a scale suitable for use with {@link Math#scalb(double, int)} to normalise 1872 * the number to the interval {@code [1, 2)}. 1873 * 1874 * <p>In contrast to {@link Math#getExponent(double)} this handles 1875 * sub-normal numbers by computing the number of leading zeros in the mantissa 1876 * and shifting the unbiased exponent. The result is that for all finite, non-zero, 1877 * numbers, the magnitude of {@code scalb(x, -getScale(x))} is 1878 * always in the range {@code [1, 2)}. 1879 * 1880 * <p>This method is a functional equivalent of the c function ilogb(double). 1881 * 1882 * <p>The result is to be used to scale a number using {@link Math#scalb(double, int)}. 1883 * Hence the special case of a zero argument is handled using the return value for NaN 1884 * as zero cannot be scaled. This is different from {@link Math#getExponent(double)}. 1885 * 1886 * <p>Special cases: 1887 * <ul> 1888 * <li>If the argument is NaN or infinite, then the result is {@link Double#MAX_EXPONENT} + 1.</li> 1889 * <li>If the argument is zero, then the result is {@link Double#MAX_EXPONENT} + 1.</li> 1890 * </ul> 1891 * 1892 * @param a Value. 1893 * @return The unbiased exponent of the value to be used for scaling, or 1024 for 0, NaN or Inf 1894 * @see Math#getExponent(double) 1895 * @see Math#scalb(double, int) 1896 * @see <a href="https://www.cplusplus.com/reference/cmath/ilogb/">ilogb</a> 1897 */ 1898 private static int getScale(double a) { 1899 // Only interested in the exponent and mantissa so remove the sign bit 1900 final long bits = Double.doubleToRawLongBits(a) & UNSIGN_MASK; 1901 // Get the unbiased exponent 1902 int exp = ((int) (bits >>> 52)) - EXPONENT_OFFSET; 1903 1904 // No case to distinguish nan/inf (exp == 1024). 1905 // Handle sub-normal numbers 1906 if (exp == Double.MIN_EXPONENT - 1) { 1907 // Special case for zero, return as nan/inf to indicate scaling is not possible 1908 if (bits == 0) { 1909 return Double.MAX_EXPONENT + 1; 1910 } 1911 // A sub-normal number has an exponent below -1022. The amount below 1912 // is defined by the number of shifts of the most significant bit in 1913 // the mantissa that is required to get a 1 at position 53 (i.e. as 1914 // if it were a normal number with assumed leading bit) 1915 final long mantissa = bits & MANTISSA_MASK; 1916 exp -= Long.numberOfLeadingZeros(mantissa << 12); 1917 } 1918 return exp; 1919 } 1920 1921 /** 1922 * Compute {@code this} number {@code (x, xx)} raised to the power {@code n}. 1923 * 1924 * <p>Special cases: 1925 * <ul> 1926 * <li>If {@code x} is not a finite normalized {@code double}, the low part {@code xx} 1927 * is ignored and the result is {@link Math#pow(double, double) Math.pow(x, n)}.</li> 1928 * <li>If {@code n = 0} the result is {@code (1, 0)}.</li> 1929 * <li>If {@code n = 1} the result is {@code (x, xx)}.</li> 1930 * <li>If {@code n = -1} the result is the {@link #reciprocal() reciprocal}.</li> 1931 * <li>If the computation overflows the result is undefined.</li> 1932 * </ul> 1933 * 1934 * <p>Computation uses multiplication by factors generated by repeat squaring of the value. 1935 * These multiplications have no special case handling for overflow; in the event of overflow 1936 * the result is undefined. The {@link #pow(int, long[])} method can be used to 1937 * generate a scaled fraction result for any finite {@code DD} number and exponent. 1938 * 1939 * <p>The computed result is approximately {@code 16 * (n - 1) * eps} of the exact result 1940 * where eps is 2<sup>-106</sup>. 1941 * 1942 * @param n Exponent. 1943 * @return {@code this}<sup>n</sup> 1944 * @see Math#pow(double, double) 1945 * @see #pow(int, long[]) 1946 * @see #isFinite() 1947 */ 1948 @Override 1949 public DD pow(int n) { 1950 // Edge cases. 1951 if (n == 1) { 1952 return this; 1953 } 1954 if (n == 0) { 1955 return ONE; 1956 } 1957 1958 // Handles {infinity, nan and zero} cases 1959 if (isNotNormal(x)) { 1960 // Assume the high part has the greatest magnitude 1961 // so here the low part is irrelevant 1962 return new DD(Math.pow(x, n), 0); 1963 } 1964 1965 // Here hi is finite; assume lo is also finite 1966 if (n == -1) { 1967 return reciprocal(); 1968 } 1969 1970 // Extended precision computation is required. 1971 // No checks for overflow. 1972 if (n < 0) { 1973 // Note: Correctly handles negating -2^31 1974 return computePow(x, xx, -n).reciprocal(); 1975 } 1976 return computePow(x, xx, n); 1977 } 1978 1979 /** 1980 * Compute the number {@code x} (non-zero finite) raised to the power {@code n}. 1981 * 1982 * <p>The input power is treated as an unsigned integer. Thus the negative value 1983 * {@link Integer#MIN_VALUE} is 2^31. 1984 * 1985 * @param x Fractional high part of x. 1986 * @param xx Fractional low part of x. 1987 * @param n Power (in [2, 2^31]). 1988 * @return x^n. 1989 */ 1990 private static DD computePow(double x, double xx, int n) { 1991 // Compute the power by multiplication (keeping track of the scale): 1992 // 13 = 1101 1993 // x^13 = x^8 * x^4 * x^1 1994 // = ((x^2 * x)^2)^2 * x 1995 // 21 = 10101 1996 // x^21 = x^16 * x^4 * x^1 1997 // = (((x^2)^2 * x)^2)^2 * x 1998 // 1. Find highest set bit in n (assume n != 0) 1999 // 2. Initialise result as x 2000 // 3. For remaining bits (0 or 1) below the highest set bit: 2001 // - square the current result 2002 // - if the current bit is 1 then multiply by x 2003 // In this scheme the factors to multiply by x can be pre-computed. 2004 2005 // Split b 2006 final double xh = highPart(x); 2007 final double xl = x - xh; 2008 2009 // Initialise the result as x^1 2010 double f0 = x; 2011 double f1 = xx; 2012 2013 double u; 2014 double v; 2015 double w; 2016 2017 // Shift the highest set bit off the top. 2018 // Any remaining bits are detected in the sign bit. 2019 final int shift = Integer.numberOfLeadingZeros(n) + 1; 2020 int bits = n << shift; 2021 2022 // Multiplication is done without object allocation of DD intermediates. 2023 // The square can be optimised. 2024 // Process remaining bits below highest set bit. 2025 for (int i = 32 - shift; i != 0; i--, bits <<= 1) { 2026 // Square the result 2027 // Inline multiply(f0, f1, f0, f1), adapted for squaring 2028 u = f0 * f0; 2029 v = twoSquareLow(f0, u); 2030 // Inline (f0, f1) = fastTwoSum(hi, lo + (2 * f0 * f1)) 2031 w = v + (2 * f0 * f1); 2032 f0 = u + w; 2033 f1 = fastTwoSumLow(u, w, f0); 2034 if (bits < 0) { 2035 // Inline multiply(f0, f1, x, xx) 2036 u = highPart(f0); 2037 v = f0 - u; 2038 w = f0 * x; 2039 v = twoProductLow(u, v, xh, xl, w); 2040 // Inline (f0, f1) = fastTwoSum(w, v + (f0 * xx + f1 * x)) 2041 u = v + (f0 * xx + f1 * x); 2042 f0 = w + u; 2043 f1 = fastTwoSumLow(w, u, f0); 2044 } 2045 } 2046 2047 return new DD(f0, f1); 2048 } 2049 2050 /** 2051 * Compute {@code this} number {@code x} raised to the power {@code n}. 2052 * 2053 * <p>The value is returned as fractional {@code f} and integral 2054 * {@code 2^exp} components. 2055 * <pre> 2056 * (x+xx)^n = (f+ff) * 2^exp 2057 * </pre> 2058 * 2059 * <p>The combined fractional part (f, ff) is in the range {@code [0.5, 1)}. 2060 * 2061 * <p>Special cases: 2062 * <ul> 2063 * <li>If {@code (x, xx)} is zero the high part of the fractional part is 2064 * computed using {@link Math#pow(double, double) Math.pow(x, n)} and the exponent is 0.</li> 2065 * <li>If {@code n = 0} the fractional part is 0.5 and the exponent is 1.</li> 2066 * <li>If {@code (x, xx)} is an exact power of 2 the fractional part is 0.5 and the exponent 2067 * is the power of 2 minus 1.</li> 2068 * <li>If the result high-part is an exact power of 2 and the low-part has an opposite 2069 * signed non-zero magnitude then the fraction high-part {@code f} will be {@code +/-1} such that 2070 * the double-double number is in the range {@code [0.5, 1)}.</li> 2071 * <li>If the argument is not finite then a fractional representation is not possible. 2072 * In this case the fraction and the scale factor is undefined.</li> 2073 * </ul> 2074 * 2075 * <p>The computed result is approximately {@code 16 * (n - 1) * eps} of the exact result 2076 * where eps is 2<sup>-106</sup>. 2077 * 2078 * @param n Power. 2079 * @param exp Result power of two scale factor (integral exponent). 2080 * @return Fraction part. 2081 * @see #frexp(int[]) 2082 */ 2083 public DD pow(int n, long[] exp) { 2084 // Edge cases. 2085 if (n == 0) { 2086 exp[0] = 1; 2087 return new DD(0.5, 0); 2088 } 2089 // IEEE result for non-finite or zero 2090 if (!Double.isFinite(x) || x == 0) { 2091 exp[0] = 0; 2092 return new DD(Math.pow(x, n), 0); 2093 } 2094 // Here the number is non-zero finite 2095 final int[] ie = {0}; 2096 DD f = frexp(ie); 2097 final long b = ie[0]; 2098 // Handle exact powers of 2 2099 if (Math.abs(f.x) == HALF && f.xx == 0) { 2100 // (f * 2^b)^n = (2f)^n * 2^(b-1)^n 2101 // Use Math.pow to create the sign. 2102 // Note the result must be scaled to the fractional representation 2103 // by multiplication by 0.5 and addition of 1 to the exponent. 2104 final double y0 = 0.5 * Math.pow(2 * f.x, n); 2105 // Propagate sign change (y0*f.x) to the original zero (this.xx) 2106 final double y1 = Math.copySign(0.0, y0 * f.x * this.xx); 2107 exp[0] = 1 + (b - 1) * n; 2108 return new DD(y0, y1); 2109 } 2110 if (n < 0) { 2111 f = computePowScaled(b, f.x, f.xx, -n, exp); 2112 // Result is a non-zero fraction part so inversion is safe 2113 f = reciprocal(f.x, f.xx); 2114 // Rescale to [0.5, 1.0) 2115 f = f.frexp(ie); 2116 exp[0] = ie[0] - exp[0]; 2117 return f; 2118 } 2119 return computePowScaled(b, f.x, f.xx, n, exp); 2120 } 2121 2122 /** 2123 * Compute the number {@code x} (non-zero finite) raised to the power {@code n}. 2124 * 2125 * <p>The input power is treated as an unsigned integer. Thus the negative value 2126 * {@link Integer#MIN_VALUE} is 2^31. 2127 * 2128 * @param b Integral component 2^exp of x. 2129 * @param x Fractional high part of x. 2130 * @param xx Fractional low part of x. 2131 * @param n Power (in [2, 2^31]). 2132 * @param exp Result power of two scale factor (integral exponent). 2133 * @return Fraction part. 2134 */ 2135 private static DD computePowScaled(long b, double x, double xx, int n, long[] exp) { 2136 // Compute the power by multiplication (keeping track of the scale): 2137 // 13 = 1101 2138 // x^13 = x^8 * x^4 * x^1 2139 // = ((x^2 * x)^2)^2 * x 2140 // 21 = 10101 2141 // x^21 = x^16 * x^4 * x^1 2142 // = (((x^2)^2 * x)^2)^2 * x 2143 // 1. Find highest set bit in n (assume n != 0) 2144 // 2. Initialise result as x 2145 // 3. For remaining bits (0 or 1) below the highest set bit: 2146 // - square the current result 2147 // - if the current bit is 1 then multiply by x 2148 // In this scheme the factors to multiply by x can be pre-computed. 2149 2150 // Scale the input in [0.5, 1) to be above 1. Represented as 2^be * b. 2151 final long be = b - 1; 2152 final double b0 = x * 2; 2153 final double b1 = xx * 2; 2154 // Split b 2155 final double b0h = highPart(b0); 2156 final double b0l = b0 - b0h; 2157 2158 // Initialise the result as x^1. Represented as 2^fe * f. 2159 long fe = be; 2160 double f0 = b0; 2161 double f1 = b1; 2162 2163 double u; 2164 double v; 2165 double w; 2166 2167 // Shift the highest set bit off the top. 2168 // Any remaining bits are detected in the sign bit. 2169 final int shift = Integer.numberOfLeadingZeros(n) + 1; 2170 int bits = n << shift; 2171 2172 // Multiplication is done without using DD.multiply as the arguments 2173 // are always finite and the product will not overflow. The square can be optimised. 2174 // Process remaining bits below highest set bit. 2175 for (int i = 32 - shift; i != 0; i--, bits <<= 1) { 2176 // Square the result 2177 // Inline multiply(f0, f1, f0, f1, f), adapted for squaring 2178 fe <<= 1; 2179 u = f0 * f0; 2180 v = twoSquareLow(f0, u); 2181 // Inline fastTwoSum(hi, lo + (2 * f0 * f1), f) 2182 w = v + (2 * f0 * f1); 2183 f0 = u + w; 2184 f1 = fastTwoSumLow(u, w, f0); 2185 // Rescale 2186 if (Math.abs(f0) > SAFE_MULTIPLY) { 2187 // Scale back to the [1, 2) range. As safe multiply is 2^500 2188 // the exponent should be < 1001 so the twoPow scaling factor is supported. 2189 final int e = Math.getExponent(f0); 2190 final double s = twoPow(-e); 2191 fe += e; 2192 f0 *= s; 2193 f1 *= s; 2194 } 2195 if (bits < 0) { 2196 // Multiply by b 2197 fe += be; 2198 // Inline multiply(f0, f1, b0, b1, f) 2199 u = highPart(f0); 2200 v = f0 - u; 2201 w = f0 * b0; 2202 v = twoProductLow(u, v, b0h, b0l, w); 2203 // Inline fastTwoSum(w, v + (f0 * b1 + f1 * b0), f) 2204 u = v + (f0 * b1 + f1 * b0); 2205 f0 = w + u; 2206 f1 = fastTwoSumLow(w, u, f0); 2207 // Avoid rescale as x2 is in [1, 2) 2208 } 2209 } 2210 2211 final int[] e = {0}; 2212 final DD f = new DD(f0, f1).frexp(e); 2213 exp[0] = fe + e[0]; 2214 return f; 2215 } 2216 2217 /** 2218 * Test for equality with another object. If the other object is a {@code DD} then a 2219 * comparison is made of the parts; otherwise {@code false} is returned. 2220 * 2221 * <p>If both parts of two double-double numbers 2222 * are numerically equivalent the two {@code DD} objects are considered to be equal. 2223 * For this purpose, two {@code double} values are considered to be 2224 * the same if and only if the method call 2225 * {@link Double#doubleToLongBits(double) Double.doubleToLongBits(value + 0.0)} 2226 * returns the identical {@code long} when applied to each value. This provides 2227 * numeric equality of different representations of zero as per {@code -0.0 == 0.0}, 2228 * and equality of {@code NaN} values. 2229 * 2230 * <p>Note that in most cases, for two instances of class 2231 * {@code DD}, {@code x} and {@code y}, the 2232 * value of {@code x.equals(y)} is {@code true} if and only if 2233 * 2234 * <pre> 2235 * {@code x.hi() == y.hi() && x.lo() == y.lo()}</pre> 2236 * 2237 * <p>also has the value {@code true}. However, there are exceptions: 2238 * 2239 * <ul> 2240 * <li>Instances that contain {@code NaN} values in the same part 2241 * are considered to be equal for that part, even though {@code Double.NaN == Double.NaN} 2242 * has the value {@code false}.</li> 2243 * <li>Instances that share a {@code NaN} value in one part 2244 * but have different values in the other part are <em>not</em> considered equal.</li> 2245 * </ul> 2246 * 2247 * <p>The behavior is the same as if the components of the two double-double numbers were passed 2248 * to {@link java.util.Arrays#equals(double[], double[]) Arrays.equals(double[], double[])}: 2249 * 2250 * <pre> 2251 * Arrays.equals(new double[]{x.hi() + 0.0, x.lo() + 0.0}, 2252 * new double[]{y.hi() + 0.0, y.lo() + 0.0}); </pre> 2253 * 2254 * <p>Note: Addition of {@code 0.0} converts signed representations of zero values 2255 * {@code -0.0} and {@code 0.0} to a canonical {@code 0.0}. 2256 * 2257 * @param other Object to test for equality with this instance. 2258 * @return {@code true} if the objects are equal, {@code false} if object 2259 * is {@code null}, not an instance of {@code DD}, or not equal to 2260 * this instance. 2261 * @see Double#doubleToLongBits(double) 2262 * @see java.util.Arrays#equals(double[], double[]) 2263 */ 2264 @Override 2265 public boolean equals(Object other) { 2266 if (this == other) { 2267 return true; 2268 } 2269 if (other instanceof DD) { 2270 final DD c = (DD) other; 2271 return equals(x, c.x) && equals(xx, c.xx); 2272 } 2273 return false; 2274 } 2275 2276 /** 2277 * Gets a hash code for the double-double number. 2278 * 2279 * <p>The behavior is the same as if the parts of the double-double number were passed 2280 * to {@link java.util.Arrays#hashCode(double[]) Arrays.hashCode(double[])}: 2281 * 2282 * <pre> 2283 * {@code Arrays.hashCode(new double[] {hi() + 0.0, lo() + 0.0})}</pre> 2284 * 2285 * <p>Note: Addition of {@code 0.0} provides the same hash code for different 2286 * signed representations of zero values {@code -0.0} and {@code 0.0}. 2287 * 2288 * @return A hash code value for this object. 2289 * @see java.util.Arrays#hashCode(double[]) Arrays.hashCode(double[]) 2290 */ 2291 @Override 2292 public int hashCode() { 2293 return 31 * (31 + Double.hashCode(x + 0.0)) + Double.hashCode(xx + 0.0); 2294 } 2295 2296 /** 2297 * Returns {@code true} if the values are numerically equal. 2298 * 2299 * <p>Two {@code double} values are considered to be 2300 * the same if and only if the method call 2301 * {@link Double#doubleToLongBits(double) Double.doubleToLongBits(value + 0.0)} 2302 * returns the identical {@code long} when applied to each value. This provides 2303 * numeric equality of different representations of zero as per {@code -0.0 == 0.0}, 2304 * and equality of {@code NaN} values. 2305 * 2306 * @param x Value 2307 * @param y Value 2308 * @return {@code true} if the values are numerically equal 2309 */ 2310 private static boolean equals(double x, double y) { 2311 return Double.doubleToLongBits(x + 0.0) == Double.doubleToLongBits(y + 0.0); 2312 } 2313 2314 /** 2315 * Returns a string representation of the double-double number. 2316 * 2317 * <p>The string will represent the numeric values of the parts. 2318 * The values are split by a separator and surrounded by parentheses. 2319 * 2320 * <p>The format for a double-double number is {@code "(x,xx)"}, with {@code x} and 2321 * {@code xx} converted as if using {@link Double#toString(double)}. 2322 * 2323 * <p>Note: A numerical string representation of a finite double-double number can be 2324 * generated by conversion to a {@link BigDecimal} before formatting. 2325 * 2326 * @return A string representation of the double-double number. 2327 * @see Double#toString(double) 2328 * @see #bigDecimalValue() 2329 */ 2330 @Override 2331 public String toString() { 2332 return new StringBuilder(TO_STRING_SIZE) 2333 .append(FORMAT_START) 2334 .append(x).append(FORMAT_SEP) 2335 .append(xx) 2336 .append(FORMAT_END) 2337 .toString(); 2338 } 2339 2340 /** 2341 * {@inheritDoc} 2342 * 2343 * <p>Note: Addition of this value with any element {@code a} may not create an 2344 * element equal to {@code a} if the element contains sign zeros. In this case the 2345 * magnitude of the result will be identical. 2346 */ 2347 @Override 2348 public DD zero() { 2349 return ZERO; 2350 } 2351 2352 /** {@inheritDoc} */ 2353 @Override 2354 public boolean isZero() { 2355 // we keep |x| > |xx| and Java provides 0.0 == -0.0 2356 return x == 0.0; 2357 } 2358 2359 /** 2360 * {@inheritDoc} 2361 * 2362 * <p>Note: Multiplication of this value with any element {@code a} may not create an 2363 * element equal to {@code a} if the element contains sign zeros. In this case the 2364 * magnitude of the result will be identical. 2365 */ 2366 @Override 2367 public DD one() { 2368 return ONE; 2369 } 2370 2371 /** {@inheritDoc} */ 2372 @Override 2373 public boolean isOne() { 2374 return x == 1.0 && xx == 0.0; 2375 } 2376 2377 /** 2378 * {@inheritDoc} 2379 * 2380 * <p>This computes the same result as {@link #multiply(double) multiply((double) y)}. 2381 * 2382 * @see #multiply(double) 2383 */ 2384 @Override 2385 public DD multiply(int n) { 2386 // Note: This method exists to support the NativeOperators interface 2387 return multiply(x, xx, n); 2388 } 2389}