Computing Reviews
Modern regression methods (2nd ed.)
Ryan T., Wiley-Interscience, New York, NY, 2008. 642 pp. Type: Book (9780470081860)
Date Reviewed: Jul 13 2009

One of the main objectives of science is to predict a future value y of a physical quantity. For this prediction, we must know how y depends on the current and past values, x, of this quantity (and of other related quantities).

In some situations, we know the exact equations that relate these quantities. For example, Newton’s equations enable us to predict the future position of a celestial body, based on its current position and velocity. However, in many other practical situations, we do not know these equations. In these situations, it is desirable to determine the equations based on previous observations of all the quantities. In this determination, we must take into account that the observations are never absolutely accurate--there is some measurement uncertainty. Determining the dependence between variables, based on their imprecise observations, is what constitutes regression.

Most engineers and scientists are familiar with linear regression y = a + b × x + ... Usually, it is assumed that the measurement errors are independent and normally distributed, with zero mean and the same standard deviation s. Under these assumptions, the standard least squares method--minimizing the sum of the squares of the differences y - (a + b × x + ...)--provides the best estimate for the regression coefficients (a, b, ...). Equating all partial derivatives of the corresponding quadratic expression to zero results in an easy-to-solve system of linear equations for the unknowns (a, b, ...). Standard statistical techniques help to estimate the accuracy of these coefficients and of the resulting predictions. These basic techniques are covered in chapters 1 and 3.
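For the single-predictor case, the resulting system of linear equations has a well-known closed-form solution. The following minimal sketch illustrates it; the helper name `fit_line` and the data points are made up for illustration and do not come from the book:

```python
# Least squares fit of y = a + b*x by solving the normal equations.
# Setting the partial derivatives of sum((y - a - b*x)^2) with respect
# to a and b to zero gives the 2x2 linear system solved below.

def fit_line(xs, ys):
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Normal equations:  n*a + sx*b = sy   and   sx*a + sxx*b = sxy
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]   # roughly y = 1 + 2*x, with noise
a, b = fit_line(xs, ys)           # a is close to 1, b is close to 2
```

With more predictors, the same idea leads to a larger linear system, which statistical packages solve in matrix form.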

In practice, the assumptions of independence and normality are not always satisfied. Chapters 2 and 4 describe how to check these assumptions, including what to do if they are not satisfied. Checking is reasonably easy--once we find the parameters (a, b, ...), we can then apply the standard statistical tests for checking whether the differences e(t) = y(t) - (a + b × x(t) + ...) are independent and normally distributed. Handling the situations where these assumptions are not satisfied is more difficult. For example, the measurement errors e(t) and e(t-1) at different moments in time are often correlated; a reasonable way to deal with this correlation is to assume that e(t) can be predicted by a linear regression e(t) = c × e(t-1) + n(t), with independent normally distributed errors n(t). Similarly, an economically important heteroscedastic case--when the standard deviation s(t) changes with time--can also be handled by assuming a regression formula for s(t).
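The coefficient c in e(t) = c × e(t-1) + n(t) can itself be estimated by least squares, by regressing each residual on the previous one. A minimal sketch on synthetic residuals (the seed, sample size, and true value c = 0.6 are arbitrary choices for illustration):

```python
import random

# Estimate c in e(t) = c*e(t-1) + n(t) by regressing e(t) on e(t-1)
# (no intercept): c_hat = sum(e_t * e_{t-1}) / sum(e_{t-1}^2).

def lag1_coefficient(e):
    num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
    den = sum(e[t - 1] ** 2 for t in range(1, len(e)))
    return num / den

random.seed(0)                      # fixed seed, for reproducibility
c_true = 0.6
e = [random.gauss(0, 1)]
for _ in range(4999):               # generate 5,000 correlated residuals
    e.append(c_true * e[-1] + random.gauss(0, 1))

c_hat = lag1_coefficient(e)         # close to the true value 0.6
```

A value of c_hat far from zero is exactly the signal that the independence assumption fails and a correlated-errors model is needed.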

An important practical problem is the presence of outliers--points at which, for example, the measurement instrument malfunctioned. Outliers can spoil the resulting statistics; for example, the average of 1,000 values of size one is distorted if we add an outlier of size 10,000. To deal with such situations, it is necessary to make statistical methods robust, relative to such outliers (chapter 11).
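The distortion described above is easy to reproduce, and it also shows why robust alternatives (such as the median) are attractive. A small sketch with made-up numbers:

```python
from statistics import mean, median

values = [1.0] * 1000                       # 1,000 observations of size one
mean_clean = mean(values)                   # exactly 1.0

values_with_outlier = values + [10000.0]    # one malfunctioning measurement
mean_dirty = mean(values_with_outlier)      # jumps to about 11: distorted
median_dirty = median(values_with_outlier)  # still 1.0: robust to the outlier
```

Robust regression methods (chapter 11) apply the same idea to the fitting procedure itself, downweighting points with unusually large residuals.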

Another important problem is nonlinearity. In a small neighborhood, an arbitrary smooth dependence can be approximated well by the linear terms in a Taylor expansion. However, in general, nonlinearity is important. In many cases, this nonlinearity can be reduced to a linear dependence if we apply an appropriate nonlinear rescaling to x and/or y. For example, a power law is equivalent to a linear dependence between ln(y) and ln(x). In other cases, we need polynomial terms of higher order (chapter 8). For an important case of approximately periodic processes, trigonometric terms are more reasonable. In many practical situations, logistic dependence works well; corresponding methods are described in chapter 9. Nonsmooth dependencies require piecewise regression techniques, such as splines (chapter 10). The separating points in piecewise techniques are an important example of parameters relative to which the model is nonlinear. For these parameters, the least squares method is no longer reducible to a system of linear equations, so more sophisticated optimization techniques are needed. General methods are described in chapter 13, and specific methods for frequent models (such as probit) in chapter 15, a chapter specifically written for this new edition.
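The rescaling trick for power laws can be sketched in a few lines: fitting a straight line to (ln(x), ln(y)) recovers both the exponent and the prefactor. The data below are generated exactly from y = 2 × x^1.5, purely for illustration:

```python
import math

# Power law y = 2 * x**1.5 becomes linear after taking logarithms:
#   ln(y) = ln(2) + 1.5 * ln(x)
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [2.0 * x ** 1.5 for x in xs]

lx = [math.log(x) for x in xs]
ly = [math.log(y) for y in ys]

# Ordinary least squares on the log-log data.
n = len(lx)
sx, sy = sum(lx), sum(ly)
sxx = sum(v * v for v in lx)
sxy = sum(u * v for u, v in zip(lx, ly))
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n

exponent = slope                    # recovers 1.5
prefactor = math.exp(intercept)     # recovers 2.0
```

With noisy data the recovery is only approximate, but the same linear machinery (and its diagnostics) applies unchanged after the transformation.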

To predict y, we consider all possible variables (x, ...) that may affect y. Often, the contributions of some of them turn out to be negligible, so we need to select a subset of these parameters (chapter 7). Another frequent problem is that the variables (x, z, ...) turn out to be dependent on each other; for example, x = f(z, ...).
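One simple way to detect such near-dependence between two candidate predictors is to compute their correlation; a value close to 1 means they carry almost the same information, so one of them can usually be dropped. A sketch with hypothetical predictor values (the helper `pearson` is written out for self-containment):

```python
import math

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
z = [2.1, 4.0, 6.2, 7.9, 10.1]   # z is approximately 2*x: nearly dependent
r = pearson(x, z)                # very close to 1
```

More systematic subset-selection and multicollinearity diagnostics go beyond pairwise correlations, but this is the basic warning sign they build on.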

Often, by using some of these methods, we can predict y with the desired accuracy. However, sometimes, the model accuracy is not sufficient; in this case, to improve the accuracy, we must perform additional measurements. Chapter 14 describes the best way to design the corresponding experiments.
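The intuition behind experiment design can be made concrete for the simplest case of fitting a line: under the standard assumptions, the variance of the slope estimate is s^2 / Σ(x - x̄)^2, so spreading the measurement points apart makes the slope more accurate. A sketch comparing two hypothetical designs (the specific point placements are made up):

```python
# For y = a + b*x with independent errors of variance s^2,
# Var(b_hat) = s^2 / sum((x - xbar)^2).  This helper returns the
# design-dependent factor 1 / sum((x - xbar)^2): smaller is better.

def slope_variance_factor(xs):
    xbar = sum(xs) / len(xs)
    return 1.0 / sum((x - xbar) ** 2 for x in xs)

clustered = [4.9, 5.0, 5.1, 5.2]   # measurements bunched together
spread = [0.0, 3.0, 7.0, 10.0]     # same number of points, spread out

var_clustered = slope_variance_factor(clustered)  # large: poor design
var_spread = slope_variance_factor(spread)        # much smaller: good design
```

Optimal design methods generalize this comparison to many predictors and to criteria beyond the slope variance.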

Finally, chapter 16 applies all these techniques to several real-life examples--some of which were partly described in earlier chapters--such as water quality, predicting lifespan, and leukemia data.

In his descriptions, Ryan emphasizes the computational aspects by describing the corresponding algorithms and software packages.

The style is reasonably informal, with just enough rigor to be successful in applications. The book is well written and has many exercises. It can serve as a very good textbook for scientists and engineers, with only basic statistics as a prerequisite. I also highly recommend it to practitioners who want to solve real-life prediction problems.

Reviewer: V. Kreinovich
Review #: CR137089 (1006-0554)
Categories: Correlation And Regression Analysis (G.3); Robust Regression (G.3); Numerical Algorithms And Problems (F.2.1)
