Using Stata for Quantitative Analysis. Kyle C. Longest

about the cases, which means you would merge on the ids variable. It is the ids variable that links the original data to the new data. The second situation, however, requires that you merge on the religoth variable because it is the link between the two data sets. You may have realized that doing the latter means that several cases in your combined data will have exactly the same values for the new denomination-based variables. That is, every respondent who identifies as Baptist will receive the same values for the totalmembers and evangelical variables. This commonality is exactly what you are looking for when you incorporate this type of information.

      Once you have identified the variable on which you will merge the two data sets (i.e., the variable that links the two data sets), the –merge- command is relatively straightforward. Following along with the Stata Help Files section of Chapter 8 will help you understand exactly how to complete this combination for your particular needs. It may also be helpful to see what the final product looks like, to get a better sense of what the –merge- command does and whether it is what you need. Figures 1.12 and 1.13 display the final data after two separate merges: the first with the post-test data shown in Figure 1.10 and the second with the denomination data from Figure 1.11.

      FIGURE 1.12 • NEW MERGED DATA WITH NEW OBSERVATIONS

      FIGURE 1.13 • NEW MERGED DATA WITH DENOMINATION INFORMATION

      In the first merge (Figure 1.12), you can see that the final data set still contains the original 10 cases, but now the information from their follow-up survey is connected to their original responses. Again, some information (e.g., gender) has remained constant, whereas other data have changed as the respondents' lives have presumably changed.
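      A minimal sketch of this kind of one-to-one merge follows. It assumes the merge syntax of Stata 11 or later; the three cases, their values, and the follow-up variable name employst_w2 are made up for illustration and are not taken from the NSYR files.

```stata
* Hypothetical follow-up (post-test) data: one row per respondent
clear
input long ids byte employst_w2
1 1
2 0
3 1
end
tempfile posttest
save `posttest'

* Hypothetical original data with the same ids
clear
input long ids byte gender byte employst
1 1 0
2 2 0
3 1 1
end

* Each ids value appears once in both data sets, so the merge is 1:1
merge 1:1 ids using `posttest'

* _merge records where each case came from (3 = found in both data sets)
tabulate _merge
list
```

      After the merge, each case keeps its original variables and gains employst_w2 from the follow-up data. The _merge variable that Stata creates is a quick check that every case was found in both data sets.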

      In the second merge (Figure 1.13), the same original cases are present, but information that pertains to their response in the religoth variable is now included. Because the data set with information about each denomination did not include some of the denominations that the respondents reported, several cases now have missing values on these new variables. The information that is provided, however, may be helpful in analyzing why belonging to specific denominations may be related to particular behaviors or trajectories.
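      The denomination merge can be sketched in the same way. Again the syntax assumes Stata 11 or later, and the denominations, membership counts, and cases below are invented purely for illustration. Because many respondents can share one denomination, this is a many-to-one merge.

```stata
* Hypothetical denomination-level data: one row per denomination
clear
input str31 religoth long totalmembers byte evangelical
"Baptist"   1000 1
"Methodist"  800 0
end
tempfile denom
save `denom'

* Hypothetical respondent-level data: several cases share a denomination
clear
input long ids str31 religoth
1 "Baptist"
2 "Baptist"
3 "Methodist"
4 "Quaker"
end

* Many respondents (m) link to one denomination row (1) through religoth
merge m:1 religoth using `denom'

* Both Baptist cases receive identical values; the Quaker case has no
* match in the denomination data, so its new variables are missing
list ids religoth totalmembers evangelical _merge
```

      The two Baptist respondents end up with the same totalmembers and evangelical values, and the respondent whose denomination is absent from the denomination data is kept with missing values on the new variables.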

      Types of Variables in Data Files

      At this point, you should feel comfortable with the basic structure of data files. Each row holds the information for one case and each column is a different variable. With this knowledge, you are almost ready to start analyzing your data. There is, however, one distinction in the types of variables included in data that is important to understand.

      To help illustrate this difference, consider the NSYR variable gender in the Chapter 1 Data.dta file. This variable came from the following question asked of all respondents:

      Are you

      1 Male?

      2 Female?

      If you were entering the responses to this question into a Stata data set, you could record them in one of two ways. First, the actual answer “Male” or “Female” could be recorded for each case. Second, you could use a number to represent each answer. For example, you could choose to enter 0 for all respondents reporting “Male” and 1 for all respondents reporting “Female.”

      If you record the responses in the first way, the result is what Stata refers to as a string variable. A string variable is a variable whose contents are stored as text (actual words) rather than as numbers. String variables can be very useful for many purposes. For example, you can enter verbatim answers to questions directly into Stata, as was done for the variable religoth in the Chapter 1 Data.dta file.

      The drawback of storing a variable such as gender as a string variable is that some statistical operations require numbers. For example, to calculate the mean (i.e., mathematical average) of a variable, each category must be assigned a numeric value. For this reason, it is generally advisable, when possible, to use the second method and enter variables as numeric variables. These are variables that have actual numbers attached to each response.
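      The difference can be seen in a small sketch. The five cases and the variable names gender_str and gender_num are hypothetical; the numeric version uses the 0 = Male, 1 = Female coding suggested above.

```stata
* Hypothetical data: the same answers stored as text and as numbers
clear
input str6 gender_str byte gender_num
"Male"   0
"Female" 1
"Female" 1
"Male"   0
"Female" 1
end

* A string variable has no numeric values to average, so summarize
* reports zero observations for it
summarize gender_str

* The numeric version can be averaged; with 0/1 coding the mean is
* the proportion of cases coded 1 (here 3 of 5, or .6)
summarize gender_num
display r(mean)
```

      With this coding, the mean of the numeric variable has a direct interpretation: it is the share of respondents who reported "Female."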

      Fortunately, many of the Stata commands that will be discussed in this book operate similarly with numeric or string variables. The commands that work only with numeric variables are those that perform statistical operations that require numbers to calculate, for example, the mean or a linear regression. Because numeric variables, typically, are more applicable to the vast majority of data analyses, the commands discussed in this book focus on their use with numeric variables (keeping in mind that many operate identically for string variables). The primary commands that are used (and are different) for string variables, including methods for changing a string variable to a numeric variable, are addressed in the Data Management: Using String Variables section in Chapter 3.

      As has been discussed, you will often be using data that you did not enter, so you may not have a choice, or even be certain, about the way in which variables were entered. There are several ways to determine whether a variable is a numeric or string variable. The most straightforward way is to open the Data Browser window. In Stata 10 or later, string variables are shown in a red font, whereas numeric variables are shown in either a black or blue font. In the Chapter 1 Data.dta file, you will see that only the variable religoth is a string variable.

      Another way to see which variables are string variables is to click on a particular variable in the Variables window. In the Properties window, you will see an entry for Type. When the variable type starts with the letters “str,” the variable is stored as a string variable.
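      The same check can also be made with commands rather than through the windows. The sketch below builds a small hypothetical data set that borrows the variable names ids, gender, and religoth; with the actual Chapter 1 Data.dta file open, only the last four commands would be needed.

```stata
* Hypothetical three-case data set using the chapter's variable names
clear
input long ids byte gender str31 religoth
1 1 "Baptist"
2 2 "Methodist"
3 1 "Baptist"
end

* describe lists each variable with its storage type (byte, long, str31, ...)
describe

* ds can list only the variables of a given kind
ds, has(type numeric)
ds, has(type string)

* confirm is silent when the variable is a string variable
* and returns an error when it is not
confirm string variable religoth
```

      In this sketch, the string listing contains only religoth, which matches what the Data Browser colors and the Properties window indicate.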

      A CLOSER LOOK: VARIABLE TYPES

      You may have noticed that more information about the variable type is listed in the Properties window. For example, gender is shown to be a byte variable, ids is a long variable, and religoth is a str31 variable.

      These distinctions further demarcate variables within the general categories of numeric and string. They are also related to how much file space is allotted to store the variable.

      All string variables have the “str” prefix, and the number that follows indicates the maximum number of characters that can be stored in that string variable. So the longest denomination that could be entered in the variable religoth is 31 characters. As you will see, this constraint can be altered, but it is advisable to use only the minimum number of characters needed. Otherwise you are using memory to store empty spaces.

      Similarly, the various subtypes of numeric variables indicate the range and precision of the values that each variable can hold. In order from the smallest to the largest, the numeric variable types are byte, int, long, float, and double. The first three hold only whole numbers, whereas float and double can also hold decimals.

      Generally, Stata will store variables in the most efficient and effective way when you create them. Moreover, most Stata users will conduct countless analyses without ever having to worry about or manipulate these specific distinctions.
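      As a brief illustration of these storage types, the sketch below deliberately creates two variables with more space than they need and then lets the compress command shrink them. The variable names score and denom and their values are hypothetical.

```stata
* Hypothetical data stored with more space than the values require
clear
set obs 5
generate double score = _n
generate str31 denom = "Baptist"
describe

* compress changes each variable to the smallest type that holds its
* values without losing any information
compress
describe

* The extended macro function "type" returns a variable's storage type
local t1 : type score
local t2 : type denom
display "score is now `t1'; denom is now `t2'"
```

      Because score holds only the whole numbers 1 through 5, it can be stored as a byte, and because the longest value of denom has seven characters, it can be stored as a str7.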

      When you have the Data Browser open, you will probably notice, however, that the variables gender and employst look different from the variables ids and agecats. This difference is due to the fact that gender and employst have what are called value labels attached to them. Value labels will be covered in much more detail later, but they are labels that can be applied to the numeric
