SAS Statistics by Example. Ron Cody, EdD


      Amazingly enough, this is a complete SAS program. Notice that each statement in this two-line SAS program ends in a semicolon. When you write SAS programs, you can use as many lines as you want to write a statement; you can even put more than one statement on a line (though this is not recommended for stylistic reasons). The semicolon is the logical end of a SAS statement. You are free to add extra spaces on a line or place extra blank lines in your program to make it more readable.
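
      To illustrate the free-form layout described above, here is a minimal sketch that writes the same two-statement step three ways. It assumes that Program 1.1 is the PROC PRINT step that lists the SampleData data set; SAS treats all three versions identically because only the semicolons mark where a statement ends.

```sas
/* One statement per line (the recommended style) */
proc print data=SampleData;
run;

/* One statement spread over several lines */
proc print
   data=SampleData;
run;

/* Two statements on one line (legal, but harder to read) */
proc print data=SampleData; run;
```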

      To run this program from Display Manager, click the Submit icon:

      [Image: the Submit icon]

      Here is the output you get from running Program 1.1:

      [Image: output from Program 1.1]

      At the top of the three right-most columns, you see the SAS variable names—the same names that were stored in the first row of your workbook. The first column, labeled Obs (short for Observations), was generated by SAS and shows the observation number.

      Each row of the listing represents a row from the workbook.

      Next, let’s see how to display the data descriptor portion of this data set. Program 1.2 is one way to do this:

title "Displaying the Descriptor Portion of a SAS Data Set";
proc contents data=SampleData;
run;

      Notice that I have added a TITLE statement to this program. With a TITLE statement, you can enter a title that will print across the top of every page of output. TITLE statements are in a class of SAS statements known as GLOBAL statements. The title that you enter stays in effect for the remainder of your SAS session, unless you replace it with another TITLE statement. To remove all titles from your output, submit a null title statement like this:

      title;
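
      The following sketch shows how a global TITLE statement carries over from one step to the next until it is replaced or cleared. The title text is made up for illustration, and the code assumes that the SampleData data set from the earlier import is still in the Work library.

```sas
title "Listing of SampleData";
proc print data=SampleData;
run;

/* No new TITLE statement, so this output carries the same title */
proc contents data=SampleData;
run;

/* A null TITLE statement removes all titles from later output */
title;
proc print data=SampleData;
run;
```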

      When you submit Program 1.2, you will see the following output:

      [Image: PROC CONTENTS output for SampleData]

      The first two lines of output show that the data set name is SAMPLEDATA. (The full name is WORK.SAMPLEDATA. The prefix WORK. tells SAS that this is a temporary SAS data set.) Also shown in these lines are the number of observations (5) and the number of variables (3). Let’s skip down to the portion of the output labeled Alphabetic List of Variables and Attributes. Here you see that the variables Age and ID are stored as numeric types and Gender is stored as a character type.

      SAS has only two variable types: numeric and character. By default, all numeric values are stored in 8 bytes, allowing for approximately 15 significant figures, depending on your operating system. Character values are stored 1 byte per character and can be from 1 to 32,767 bytes in length.
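
      As a small illustration of the two variable types, the sketch below creates a one-observation data set and then displays its descriptor portion. The data set name TypesDemo and the variables Name and Age are hypothetical. PROC CONTENTS should report Name as a character variable of length 10 and Age as a numeric variable of length 8.

```sas
data TypesDemo;
   length Name $ 10;   /* character variable stored in 10 bytes */
   Name = 'Fred';
   Age  = 23;          /* numeric variable, 8 bytes by default */
run;

title "Variable Types and Lengths in TypesDemo";
proc contents data=TypesDemo;
run;
```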

      SAS data sets can be either temporary or permanent. A temporary SAS data set is one that exists for the duration of your SAS session but is not saved when you exit SAS. Permanent SAS data sets, as the name implies, remain when you exit SAS and can be accessed in future SAS sessions. The Import Wizard example discussed previously used the Work library. Choosing the Work library caused the SAS data set SAMPLEDATA to be a temporary data set.

      SAS data set identifiers are divided into two parts, separated by a period. The part before the period is called a library reference (libref for short) and identifies the folder where SAS has stored the data set. The part following the period is the data set name. Both parts of this identifier must satisfy the naming conventions mentioned earlier.

      For example, if your data set is called SURVEY and is stored in a library called MYDATA, SAS uses the following notation to identify the file:

      mydata.survey

      If you wanted to put this file on your disk drive in the C:\MYSASFILES folder, you would write a LIBNAME statement that associates the c:\mysasfiles folder with the MYDATA library reference, like this:

      libname mydata 'c:\mysasfiles';
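
      Here is a minimal sketch of how the two-level name decides whether a data set is permanent or temporary. It assumes that the c:\mysasfiles folder already exists and that WORK.SAMPLEDATA from the earlier import is available; the name survey_temp is hypothetical.

```sas
libname mydata 'c:\mysasfiles';

/* Two-level name: the data set is written to the folder and remains after the session */
data mydata.survey;
   set work.SampleData;
run;

/* One-level name: by default SAS uses the Work library, so this copy is temporary */
data survey_temp;
   set work.SampleData;
run;
```

      In a later SAS session, you would submit the same LIBNAME statement again and could then refer to mydata.survey directly.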

      If you have your data in a text file, SAS can read the text file and create a SAS data set. The text file can contain either data values separated by delimiters or data values in fixed columns.

      SAS can read data values from a text file in which each value is separated from the next value by a delimiter. By default, SAS expects one or more spaces between data values. However, it is easy to specify other delimiters, such as commas. Let’s start by reading a small text file in which spaces are used as delimiters. Here’s a listing of this file:

      Raw Data with Blanks as Delimiters: File c:\books\Statistics by Example\delim.txt

1 23 M
2 33 F
3 18 F
4 45 M
5 41 M
6 . F

      In this file, the three data values on each line represent an ID number, Age, and Gender, respectively. Before you write a SAS program to read this text file, notice that the subject with ID = 6 has a missing value for Age. Because you have delimited data, you need a way to specify that the Age value is missing for that subject. When you have blanks as delimiters, you can use a period to specify that you have a missing value. In the next example, which uses a CSV file, you do not need to use periods for missing values.
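
      To show why periods are not needed with comma-delimited data, here is a hedged sketch (not the book's own CSV program) that reads a few in-stream lines with the DSD option on the INFILE statement. DSD makes the comma the default delimiter and treats two commas in a row as a missing value. The data set name CommaDemo and the DATALINES values are made up for illustration.

```sas
data CommaDemo;
   infile datalines dsd;   /* DSD: comma delimiter, two commas in a row mean missing */
   length Gender $ 1;
   input ID Age Gender $;
datalines;
1,23,M
2,33,F
6,,F
;

title "Listing of Data Set CommaDemo";
proc print data=CommaDemo;
run;
```

      In the listing, Age for ID 6 should appear as a period, which is how SAS displays a missing numeric value.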

      Program 1.3 will read this text file and create a SAS data set called Sample2:

data Sample2;
   infile 'c:\books\statistics by example\delim.txt';
   length Gender $ 1;
   input ID Age Gender $;
run;

      The INFILE statement tells SAS where to look for the text file. Following the keyword INFILE, you place the filename in single or double quotes. The LENGTH statement tells SAS that the variable Gender is character (the dollar sign indicates this) and that you want to store Gender in 1 byte (the 1 indicates this). The INPUT statement lists the variable names in the same order as the values in the text file. Because you already told SAS that Gender is a character variable, the dollar sign following the name Gender on the INPUT statement is not necessary. If you had not included a LENGTH statement, the dollar sign following Gender on the INPUT statement would have been necessary. SAS assumes variables are numeric unless you tell it otherwise.
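
      For comparison, this sketch reads the same file without a LENGTH statement, so the dollar sign on the INPUT statement is required. The data set name Sample2_NoLength is hypothetical. With this style of input, SAS gives Gender a default length of 8 bytes, which PROC CONTENTS should confirm.

```sas
data Sample2_NoLength;
   infile 'c:\books\statistics by example\delim.txt';
   input ID Age Gender $;   /* $ is required here because there is no LENGTH statement */
run;

proc contents data=Sample2_NoLength;
run;
```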

      The RUN statement ends the program. Because this program starts with the keyword DATA, it is called a DATA step. The previous two programs demonstrated PROC steps. SAS programs are typically made up of DATA and PROC steps. Each step ends with a RUN statement.

      As you did earlier, you can use PROC PRINT to list the observations in the Sample2 data set (as shown in Program 1.4):

title "Listing of Data Set Sample2";
proc print data=Sample2;
run;

      Here is the listing:
