Applied Numerical Methods Using MATLAB. Won Y. Yang
(3) Basic normalized range (with the value of hidden bit b_h = 1, Exp = 1, E = Exp − 1023 = −1022)
+2^−1022 ≤ x ≤ +(2 − 2^−52) × 2^−1022 (1.2.6.3a)
−(2 − 2^−52) × 2^−1022 ≤ x ≤ −2^−1022 (1.2.6.3b)
(4) The largest normalized range (with the value of hidden bit b_h = 1, Exp = 2046, E = 1023)
+2^1023 ≤ x ≤ +(2 − 2^−52) × 2^1023 (1.2.6.4a)
−(2 − 2^−52) × 2^1023 ≤ x ≤ −2^1023 (1.2.6.4b)
(5) ±∞ (inf) with Exp = 2^11 − 1 = 2047, E = Exp − 1023 = 1024 (meaningless)
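These bit-field ranges can be inspected directly. The book works in MATLAB, but Python's `float` is the same IEEE 64-bit format, so a quick sketch in Python (the helper `fields` is ours, not from the book) shows the sign/Exp/fraction fields of the boundary values above:

```python
import struct

def fields(x):
    """Split a double into its IEEE 754 sign, 11-bit Exp, and 52-bit fraction."""
    bits = struct.unpack('>Q', struct.pack('>d', x))[0]  # raw 64 bits, big-endian
    sign = bits >> 63
    Exp = (bits >> 52) & 0x7FF           # biased exponent, bias = 1023
    frac = bits & ((1 << 52) - 1)        # mantissa; hidden bit b_h is not stored
    return sign, Exp, frac

print(fields(2.0**-1022))                # smallest normalized: Exp = 1
print(fields((2 - 2**-52) * 2.0**1023)) # largest normalized: Exp = 2046, fraction all ones
print(fields(float('inf')))              # Exp = 2^11 - 1 = 2047, fraction = 0
```

The three printed tuples confirm the exponent values Exp = 1, 2046, and 2047 used in ranges (3)–(5).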
From what has been mentioned earlier, we know that the minimum and maximum positive numbers are, respectively,
f_min = 2^−1022 = realmin (1.2.7a)
f_max = (2 − 2^−52) × 2^1023 = realmax (1.2.7b)
where the three MATLAB constants eps, realmin, and realmax represent 2^−52, 2^−1022, and (2 − 2^−52) × 2^1023, respectively. This can be checked by running the script “nm109.m” in Section 1.1.
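The same check can be mimicked outside MATLAB; since Python's `float` is the identical IEEE 64-bit format, `sys.float_info` plays the role of eps, realmin, and realmax (a sketch, not the book's “nm109.m”):

```python
import sys

# Python counterparts of MATLAB's eps, realmin, realmax
assert sys.float_info.epsilon == 2**-52               # eps
assert sys.float_info.min == 2**-1022                 # realmin
assert sys.float_info.max == (2 - 2**-52) * 2**1023   # realmax

print(sys.float_info.epsilon, sys.float_info.min, sys.float_info.max)
```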
Now, in order to gain some idea about the arithmetic computational mechanism, let us see how the addition of two numbers, 3 and 14, represented in the IEEE 64-bit floating-point number system, is performed.
In the process of adding the two numbers illustrated in Figure 1.6, an alignment is made so that the two exponents in their 64-bit representations become equal; this discards any part of the smaller operand lying more than 52 bits below the larger one, causing some numerical error. For example, adding 2^−23 to 2^30 does not make any difference, while adding 2^−22 does, as we can see by typing the following statements into the MATLAB Command window.
Figure 1.6 Process of adding two numbers, 3 and 14, in MATLAB.
>> x=2^30; x+2^-22==x, x+2^-23==x
ans = 0 (false)
ans = 1 (true)
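The same experiment carries over verbatim to any IEEE double implementation; here is a Python analogue with the reasoning spelled out in comments (the round-to-even detail is our addition, not stated in the text):

```python
x = 2.0**30
# With a 52-bit mantissa, the LSB of 2^30 is worth 2^(30-52) = 2^-22.
print(x + 2**-22 == x)  # False: 2^-22 is exactly one LSB, so the sum changes x
print(x + 2**-23 == x)  # True: 2^-23 is half an LSB and x's mantissa is even,
                        # so round-to-even leaves x unchanged
```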
(cf) Each range has a different minimum unit (LSB value), described by Eq. (1.2.5), which implies that the numbers are uniformly distributed within each range. The closer a range is to 0, the denser the numbers in it are. Such a number representation makes the absolute quantization error large/small for large/small numbers, decreasing the possibility of a large relative quantization error.
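This density pattern can be observed with Python's `math.ulp`, which returns the local LSB value (gap to the next representable double) — a quick illustration of the remark above, under the assumption of the same IEEE binary64 format:

```python
import math

# The absolute gap (LSB value) between adjacent doubles grows with magnitude:
print(math.ulp(1.0))      # 2^-52, i.e. eps
print(math.ulp(2.0**30))  # 2^-22: much coarser spacing at larger magnitudes

# ...but the RELATIVE gap stays bounded by eps in every normalized range:
for x in (1.0, 2.0**30, 2.0**300):
    assert math.ulp(x) / x <= 2**-52
```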
1.2.2 Various Kinds of Computing Errors
There are various kinds of errors that we encounter when using a computer for computation.
Truncation error: Caused by adding only a finite number of terms, while we should add infinitely many terms to get the exact answer in theory.
Round-off error: Caused by representing/storing numeric data in finite bits.
Overflow/underflow: Caused by numbers too large or too small to be represented/stored properly in finite bits; more specifically, by numbers whose absolute values are larger/smaller than the maximum (fmax)/minimum (fmin) number that can be represented in MATLAB.
Negligible addition: Caused by adding two numbers whose magnitudes differ by more than 52 bits, as seen in the previous section.
Loss of significance: Caused by a ‘bad subtraction’, i.e. a subtraction of a number from another one that is almost equal in value.
Error magnification: Caused and magnified/propagated by multiplying a number containing a small error by a large number, or dividing it by a small number.
Errors depending on the numerical algorithms, step size, and so on.
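Three of the error kinds in this list can be reproduced in a few lines; a Python sketch (the choice of 10 series terms and the sample values are ours, purely illustrative):

```python
import math

# Truncation error: summing only 10 terms of e = sum(1/k!) instead of infinitely many
e_approx = sum(1 / math.factorial(k) for k in range(10))
print(abs(e_approx - math.e))   # small but nonzero truncation error (~1/10!)

# Overflow: exceeding fmax (realmax) produces inf
print(1e308 * 10)               # inf

# Loss of significance: subtracting nearly equal numbers
print((1 + 1e-15) - 1)          # not exactly 1e-15: low bits were already lost
```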
Although we cannot be entirely free from these kinds of inevitable errors to some degree, it is not computers, but we human beings, who must be responsible for the computing errors. While our computer may insist on its innocence for an unintended lie, we programmers and users cannot escape the responsibility of taking measures against the errors, and would have to pay for being careless enough to be deceived by a machine. We should therefore try to decrease the errors and minimize their impact on the final results. In order to do so, we must know the sources of computing errors and grasp the computational properties of numerical algorithms.
For instance, consider the following two formulas:
f1(x) = √x (√(x+1) − √x),  f2(x) = √x / (√(x+1) + √x) (1.2.8)
These are theoretically equivalent, whence we expect them to give exactly the same value. However, running the following MATLAB script “nm122.m” to compute the values of the two formulas, we see the surprising result that, as x increases, f1(x) wanders incoherently hither and thither, while f2(x) approaches 1/2 at a steady pace. We might feel betrayed by the computer and doubt its reliability. Why does such a flustering thing happen with f1(x)? It is because the number of significant bits abruptly decreases when the subtraction √(x+1) − √x is performed for large x, where the two square roots are almost equal.
For x = 10^15, the two numbers √(x+1) and √x each have 52 significant bits, or equivalently about 16 significant digits (2^52 ≈ 10^(52×3/10) ≈ 10^15.6), so that their significant digits range from about 10^7 down to 10^−8. Accordingly, the least significant digit (LSD) of their sum and difference is also the eighth digit after the decimal point (10^−8).
Note that the number of significant digits of the difference decreased to 1 from 16. Could you imagine that a single subtraction may kill most of the significant digits? This is the very ‘loss of significance’, which is often called ‘catastrophic cancellation’.
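This digit-counting argument can be verified numerically. In the Python sketch below (same IEEE doubles as MATLAB; the reference value comes from the rationalized form, which is our choice of yardstick), the computed difference turns out to be an exact multiple of the local LSB value 2^−28, which is all the resolution the subtraction has left:

```python
import math

x = 1e15
sx1, sx = math.sqrt(x + 1), math.sqrt(x)
d = sx1 - sx               # computed difference, ruined by cancellation
true_d = 1 / (sx1 + sx)    # accurate value of sqrt(x+1) - sqrt(x) (no cancellation)

print(sx1, sx)             # agree in nearly all of their ~16 digits
print(d, true_d)           # the computed d has only a digit or two correct
print(d / 2**-28)          # a small integer: d is quantized to multiples of 2^-28
```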
%nm122.m
f1=@(x)sqrt(x)*(sqrt(x+1)-sqrt(x));
f2=@(x)sqrt(x)./(sqrt(x+1)+sqrt(x));
x=1; format long e
for k=1:15
   fprintf('At x=%15.0f, f1(x)=%20.18f, f2(x)=%20.18f\n', x,f1(x),f2(x));
   x = 10*x;
end
sx1=sqrt(x+1); sx=sqrt(x); d=sx1-sx; s=sx1+sx;
fprintf('sqrt(x+1)=%25.13f, sqrt(x)=%25.13f\n',sx1,sx);
fprintf(' diff=%25.23f, sum=%25.23f\n',d,s);
>> nm122
At x=               1, f1(x)=0.414213562373095150, f2(x)=0.414213562373095090
At x=              10, f1(x)=0.488088481701514750, f2(x)=0.488088481701515480
At x=             100, f1(x)=0.498756211208899460, f2(x)=0.498756211208902730
At x=            1000, f1(x)=0.499875062461021870, f2(x)=0.499875062460964860
At x=           10000, f1(x)=0.499987500624854420, f2(x)=0.499987500624960890
At x=          100000, f1(x)=0.499998750005928860, f2(x)=0.499998750006249940
At x=         1000000, f1(x)=0.499999875046341910, f2(x)=0.499999875000062490
At x=        10000000, f1(x)=0.499999987401150920, f2(x)=0.499999987500000580
At x=       100000000, f1(x)=0.500000005558831620, f2(x)=0.499999998749999950
At x=      1000000000, f1(x)=0.500000077997506340, f2(x)=0.499999999874999990
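The gap between the two formulas can also be quantified against an independent reference. A Python sketch, using the series expansion f(x) ≈ 1/2 − 1/(8x) (our choice of reference; its neglected terms are O(1/x²), utterly negligible at x = 10^9):

```python
import math

f1 = lambda x: math.sqrt(x) * (math.sqrt(x + 1) - math.sqrt(x))  # suffers cancellation
f2 = lambda x: math.sqrt(x) / (math.sqrt(x + 1) + math.sqrt(x))  # algebraically equal, stable

x = 1e9
ref = 0.5 - 1 / (8 * x)   # series value 1/2 - 1/(8x) + O(1/x^2)

print(abs(f1(x) - ref))   # large error: several of f1's leading digits are wrong
print(abs(f2(x) - ref))   # tiny error: f2 is accurate to nearly full precision
```

Rationalizing the subtraction into a sum, as f2 does, is the standard cure for this kind of catastrophic cancellation.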