Discovering Partial Least Squares with JMP. Marie A. Gaudard

      This book uses JMP Pro 11.0 in its screenshots, instructions, and discussions. Although JMP’s PLS capabilities will continue to be developed, the major features and design shown here will persist. However, in future versions you may notice slight differences from the specific instruction sequences and screenshots presented in this book.

      Ideally, you will have JMP Pro 11 available as you work through this book. A fully functional version of JMP Pro 11 that runs for 30 days can be requested at http://www.jmp.com/webforms/jmp_pro_eval.shtml.

      The standard version of JMP enables you to run some partial least squares analyses through a simplified interface. With this version, you will be able to work through some, but not all, of the examples, and many of the scripts linked to in the book will not function correctly. Even so, the book should still deepen your understanding of partial least squares and help you decide whether you need the Pro version of JMP.

      The data tables and scripts associated with the book can be accessed at either http://support.sas.com/cox or http://support.sas.com/gaudard, each of which provides a single ZIP file. After downloading it, unzip the contents to a convenient location on your hard disk. This creates a master JMP journal file, Discovering Partial Least Squares with JMP.jrn, along with a folder for each chapter containing scripts. Data tables are created by running these scripts from the links in the master journal. The master journal provides a convenient way to access all of the supplementary content, and the instructions in the text assume that you use it.

      The data tables themselves contain saved scripts that are referred to in the chapters. When working through an example, we often show the steps that you can follow to generate a report in JMP. We also give, either parenthetically or directly, the name of a script saved to the data table that generates the same analysis.

      This way, if you want to see the report without stepping through the selections to create it, you can simply run that script.

      The scripts are used to illustrate concepts and to help you develop understanding. Because many of the scripts have an element of randomness built in, it is usually worth running the same script more than once to see the effect of different random choices. Also, be aware that the scripts have been encrypted. If you open one of these scripts directly rather than through the journal file mentioned earlier, you see what appears to be gibberish. Nevertheless, you can right-click within the script window and select Run Script.

      1

      Introducing Partial Least Squares

       Modeling in General

       Partial Least Squares in Today’s World

       Transforming, and Centering and Scaling Data

       An Example of a PLS Analysis

       The Data and the Goal

       The Analysis

       Testing the Model

      Applied statistics can be thought of as a body of knowledge, or even a technology, that supports learning about the real world in the face of uncertainty. Learning is a theme in almost every imaginable context, and with it comes the idea of a (statistical) model that tries to codify or encapsulate our current understanding.

      Many statistical models can be thought of as relating one or more inputs (which we collectively call X) to one or more outputs (collectively Y). These quantities are measured on the items or units of interest, and models are constructed from these observations. Such observations yield quantitative data that can be expressed numerically or coded in numerical form.
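      To make the X and Y framing concrete, the following Python sketch lays out simulated, purely hypothetical data in the usual way: each row holds the measurements on one unit, the columns of X are the inputs, and the columns of Y are the outputs. An ordinary least squares fit stands in for the empirical model here; it illustrates the data layout only and is not a PLS analysis. The coefficient values in B_true are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical data: n = 50 units (rows), v = 3 inputs in X, 2 outputs in Y.
rng = np.random.default_rng(1)
n, v = 50, 3
X = rng.normal(size=(n, v))                      # one row per unit, one column per input
B_true = np.array([[1.0, 0.5],
                   [-2.0, 0.0],
                   [0.0, 3.0]])                  # v x 2 effects, assumed for illustration
Y = X @ B_true + 0.1 * rng.normal(size=(n, 2))   # one column per output

# Empirical linear model Y ~ b0 + X B, estimated from the observations by least squares.
X1 = np.column_stack([np.ones(n), X])            # prepend an intercept column
B_hat, *_ = np.linalg.lstsq(X1, Y, rcond=None)   # (v + 1) x 2 coefficient matrix
print(B_hat.round(2))
```

      Because there are many more rows than columns here (n = 50, v = 3), the least squares problem has a unique solution and the estimates land close to B_true.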

      By the standards of fundamental physics, chemistry, and biology, at least, statistical models are generally useful when current knowledge is moderately low and the underlying mechanisms that link the values in X and Y are obscure. So although one of the perennial challenges of any modeling activity is to take proper account of whatever is already known, the fact remains that statistical models are generally empirical in nature. This is not in any sense a failing, since there are many situations in research, engineering, the natural, physical, life, and behavioral sciences, and other areas in which such empirical knowledge has practical utility or opens new, useful lines of inquiry.

      However, along with this diversity of contexts comes a diversity of data. Whatever its intrinsic beauty, a useful model must be flexible enough to adequately support the more specific objectives of prediction from, or explanation of, the data presented to it. As we shall see, one of the appealing aspects of partial least squares as a modeling approach is that, unlike some more traditional approaches that might be familiar to you, it is able to encompass much of this diversity within a single framework.

      A final comment on modeling in general—all data is contextual. Only you can determine the plausibility and relevance of the data that you have, and you overlook this simple fact at your peril. Although statistical modeling can be invaluable, just looking at the data in the right way can and should illuminate and guide the specifics of building empirical statistical models of any kind (Chatfield 1995).

      Increasingly, we are finding data everywhere. This data explosion, supported by innovative and convergent technologies, has arguably made data exploration (e-Science) a fourth learning paradigm, joining theory, experimentation, and simulation as a way to drive new understanding (Microsoft Research 2009).

      In simple retail businesses, sellers and buyers are wrestling for more leverage over the selling/buying process, and are attempting to make better use of data in this struggle. Laboratories, production lines, and even cars are increasingly equipped with relatively low-cost instrumentation routinely producing data of a volume and complexity that was difficult to foresee even thirty years ago. This book shows you how partial least squares, with its appealing flexibility, fits into this exciting picture.

      This abundance of data, supported by the widespread use of automated test equipment, results in data sets with a large number of columns, or variables (v), a large number of observations, or rows (n), or both. Often, but not always, it is cheap to increase v and expensive to increase n.
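      A large v relative to n has a concrete consequence for ordinary least squares: after centering, the v x v matrix X'X has rank at most n - 1, so when v > n it cannot be inverted and the usual regression coefficients are not uniquely defined. The Python sketch below illustrates this on simulated wide data and then fits a minimal single-response PLS model using NIPALS-style deflation, one standard way of computing PLS. It is an illustration under stated assumptions, not JMP's implementation; the functions pls1_nipals and pls1_predict, the simulated data, and the choice of two latent factors are all hypothetical.

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    """Hypothetical minimal PLS1 (one response) fitted with NIPALS-style deflation."""
    x_mean, x_std = X.mean(axis=0), X.std(axis=0, ddof=1)
    y_mean = y.mean()
    Xk = (X - x_mean) / x_std          # center and scale each predictor column
    yk = y - y_mean                    # center the response
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)         # weight: unit direction of max covariance with y
        t = Xk @ w                     # score: one latent-factor value per row
        tt = t @ t
        p = Xk.T @ t / tt              # X loading for this factor
        c = (yk @ t) / tt              # y loading for this factor
        Xk = Xk - np.outer(t, p)       # deflate: remove what this factor explains
        yk = yk - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)   # coefficients for the scaled predictors
    return B, x_mean, x_std, y_mean

def pls1_predict(model, X):
    B, x_mean, x_std, y_mean = model
    return ((X - x_mean) / x_std) @ B + y_mean

# Simulated wide data (assumed for illustration): n = 20 rows, v = 100 columns,
# all driven by two hidden factors plus a little noise.
rng = np.random.default_rng(0)
n, v = 20, 100
latent = rng.normal(size=(n, 2))
X = latent @ rng.normal(size=(2, v)) + 0.1 * rng.normal(size=(n, v))
y = latent @ np.array([1.5, -2.0]) + 0.1 * rng.normal(size=n)

# Ordinary least squares needs the inverse of Xc'Xc, but its rank is at most n - 1 < v.
Xc = X - X.mean(axis=0)
print("rank of Xc'Xc:", np.linalg.matrix_rank(Xc.T @ Xc), "of", v)

# PLS with two latent factors still yields a usable prediction model.
model = pls1_nipals(X, y, n_components=2)
print("fit correlation:", np.corrcoef(pls1_predict(model, X), y)[0, 1])
```

      The design choice to note is that PLS never inverts the v x v matrix X'X; it only solves a small system whose size equals the number of extracted factors, which is why it remains usable when columns far outnumber rows. The correlation printed here is a fit to the training rows, so it says nothing about predictive performance on new data.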

      When the interpretation of the data permits a natural separation of variables into predictors and responses, partial least squares, or PLS for short, is a flexible approach to building statistical models for prediction. PLS can deal effectively with the following:

      • Wide data (when v >> n, and v is large or very large)

      • Tall
