Medical Statistics. David Machin
Categorical or Qualitative Data
Nominal Categorical Data
Nominal or categorical data are data that one can name and put into categories. They are not measured but simply counted. They often consist of unordered ‘either‐or’ type observations that have two categories and are often known as binary. For example: dead or alive; male or female; cured or not cured; pregnant or not pregnant. In Table 2.1 gender is a binary variable. However, categorical data can often have more than two categories, for example: blood group (O, A, B or AB), country of origin, ethnic group or social class. The methods of presentation of nominal data are limited in scope. Thus Table 2.1 gives the number and percentage of people treated at each of the seven centres in each of the two randomised groups. Categorical data are sometimes referred to as ‘qualitative’, to distinguish them from ‘quantitative’ data, which we will discuss later. However, there is a whole area of methodology called ‘qualitative research’, and so to avoid confusion we will not use this term.
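Because nominal data can only be counted, the usual summary is the one Table 2.1 uses: a count and a percentage for each category. The following is a minimal Python sketch, assuming the observations are held in a plain list; the function name `frequency_table` and the blood‐group values are hypothetical illustrations, not data from the trial.

```python
from collections import Counter


def frequency_table(values):
    """Return (category, count, percentage) rows for nominal data.

    Nominal categories have no natural order, so rows are listed by
    descending count, with ties broken alphabetically for a stable layout.
    """
    counts = Counter(values)
    total = sum(counts.values())
    rows = []
    for category, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        rows.append((category, n, 100.0 * n / total))
    return rows


# Hypothetical blood-group observations (not from Table 2.1).
blood_groups = ["O", "A", "O", "B", "A", "O", "AB", "O"]
for category, n, pct in frequency_table(blood_groups):
    print(f"{category:>2}: {n} ({pct:.1f}%)")
```

The sort order is an arbitrary presentation choice; since the categories are unordered, any fixed order (for example, the order of the centres in Table 2.1) would be equally valid.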
Ordinal Data
If there are more than two categories of classification it may be possible to order them in some way. For example, after treatment a patient may be either improved, the same or worse. In Table 2.1 smoking history is given in three categories: non‐smoker, previous smoker and current smoker. These categories are ordered: a current smoker has more recent exposure to tobacco than a previous smoker, who in turn has more than someone who has never smoked. However, without further knowledge (of the current and past levels of tobacco consumption) it would be wrong to ascribe a numerical quantity to each category, for example, non‐smoker = 0, previous smoker = 1, current smoker = 2, as one cannot say that a current smoker has twice the level of tobacco consumption of a previous smoker. This type of data is also known as ordered categorical or ordinal data.
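One way to respect this distinction in software is to store only the order of the categories and to use summaries that depend on order alone, such as comparisons and the median. The Python sketch below is an illustration under that assumption; the names `SMOKING_LEVELS`, `is_more_exposed` and `median_category` are hypothetical. For an even number of observations it returns the lower of the two middle categories (a "low median", the same convention as Python's `statistics.median_low`), because averaging two categories has no meaning.

```python
# Ordered levels, from lowest to highest recent tobacco exposure.
# Only the positions matter; no numerical distance between levels is implied.
SMOKING_LEVELS = ("non-smoker", "previous smoker", "current smoker")


def is_more_exposed(a, b, levels=SMOKING_LEVELS):
    """True if category a lies above category b in the ordering."""
    return levels.index(a) > levels.index(b)


def median_category(values, levels=SMOKING_LEVELS):
    """Median of ordinal data, returned as a category label.

    Observations are sorted by their position in `levels` and the middle
    one is taken; with an even count the lower middle value is returned.
    """
    ordered = sorted(values, key=levels.index)
    return ordered[(len(ordered) - 1) // 2]


# Hypothetical smoking histories for five participants.
sample = ["current smoker", "non-smoker", "previous smoker",
          "non-smoker", "current smoker"]
print(median_category(sample))
```

A mean of the codes 0, 1 and 2 is deliberately not offered, for exactly the reason given in the text.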
Ranks
In some studies it may be appropriate to assign ranks. For example, patients with corns may be asked to order their preference among four treatments: hard skin (corn) removal by scalpel; special rehydration creams for thickened skin; customised soft padding or foam insoles; and corn plaster containing salicylic acid. Here, although the numbers 1 to 4 may be assigned to the treatments, we cannot treat them as numerical values. They are in fact only codes for best, second best, third choice and worst.
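Since the ranks are only labels for positions, a natural summary is a table of how often each treatment received each rank, rather than an average rank. The Python sketch below assumes each patient's preferences are recorded as an ordering of the four treatments, best first; the short labels in `TREATMENTS`, the function `rank_distribution` and the three patients' preferences are all hypothetical.

```python
from collections import Counter

# Hypothetical short labels for the four corn treatments named in the text.
TREATMENTS = ("scalpel", "cream", "padding", "plaster")


def rank_distribution(preferences):
    """Count how often each treatment received each rank (1 = best).

    Each element of `preferences` is one patient's ordering of all four
    treatments, best first. The ranks are used only as position labels.
    """
    table = {t: Counter() for t in TREATMENTS}
    for ordering in preferences:
        if sorted(ordering) != sorted(TREATMENTS):
            raise ValueError("each patient must rank every treatment once")
        for rank, treatment in enumerate(ordering, start=1):
            table[treatment][rank] += 1
    return table


# Hypothetical preferences from three patients.
patients = [
    ("plaster", "padding", "cream", "scalpel"),
    ("padding", "plaster", "scalpel", "cream"),
    ("plaster", "cream", "padding", "scalpel"),
]
dist = rank_distribution(patients)
print(dist["plaster"][1])  # number of patients ranking the plaster first
```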
Numerical or Quantitative Data
Count Data
Table 2.1 gives the number of corns each participant had at the start of the trial. Since this can only be a whole number (an integer), for example 0, 1, 2 or 3 in this trial, it is termed count data. Other examples are often counts per unit of time, such as the number of deaths in a hospital per year or the number of asthma attacks a person has per month. In dentistry, a common measure is the number of decayed, filled or missing teeth (DFM).
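When counts are expressed per unit of time, the count is divided by the length of observation. A minimal Python sketch, assuming a single person observed for a known period; the function `rate_per_unit_time` and the figures in the example are hypothetical. It also checks that the count is a non‐negative whole number, as count data must be.

```python
def rate_per_unit_time(count, exposure_time):
    """Events per unit of time, e.g. asthma attacks per month.

    `count` must be a non-negative whole number (count data);
    `exposure_time` is the length of observation in the chosen unit.
    """
    if not isinstance(count, int) or count < 0:
        raise ValueError("count data must be a non-negative integer")
    if exposure_time <= 0:
        raise ValueError("observation time must be positive")
    return count / exposure_time


# Hypothetical: 9 asthma attacks recorded over 6 months of follow-up.
print(rate_per_unit_time(9, 6))  # 1.5 attacks per month
```

Note that the rate itself is no longer a whole number, even though the underlying observation is a count.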
Measured or Numerical Continuous
Such data are measurements that can, in theory at least, take any value within a given range. These data contain the most information and are the ones most commonly used in statistics. Examples of continuous data in Table 2.1 are age, size of index corn, visual analogue scale (VAS) pain score and EQ‐5D tariff.
However, for simplicity, continuous data in medicine are often dichotomised to make binary data. Thus diastolic blood pressure, which is continuous, is converted into two categories: hypertension (>90 mmHg) and normotension (≤90 mmHg). This clearly leads to a loss of information. There are two main reasons for dichotomising data. First, it is easier to describe a population by the proportion of people affected, for example, the proportion of people in the population with hypertension is 10%. Second, one often has to make a decision: if a person has hypertension, then they will get treatment, and this too is easier if high blood pressure has been categorised.
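The dichotomisation rule above, and the proportion it makes easy to report, can be sketched in a few lines of Python. This is an illustration only: the function names and the readings are hypothetical, and the 90 mmHg cut‐off is the one given in the text. The example also shows the loss of information, since very different readings end up in the same category.

```python
def is_hypertensive(diastolic_mmhg, cutoff=90.0):
    """Dichotomise diastolic blood pressure using the rule in the text:
    hypertension if > cutoff mmHg, normotension if <= cutoff mmHg."""
    return diastolic_mmhg > cutoff


def proportion_hypertensive(readings, cutoff=90.0):
    """Proportion of readings classified as hypertensive."""
    flags = [is_hypertensive(r, cutoff) for r in readings]
    return sum(flags) / len(flags)


# Hypothetical diastolic readings in mmHg.
readings = [78, 85, 90, 91, 130]
print(proportion_hypertensive(readings))  # 0.4
# 91 and 130 mmHg are both classed as hypertensive: the difference
# between them is the information lost by dichotomising.
```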
One can also divide a continuous variable into more than two groups. For example, we could divide age into bands of equal length, say 10 years: 0–9, 10–19, 20–29, and so on. When categorising continuous data, authors should give an indication as to why they chose these cut‐off points, and a reader has to be very wary to guard against the fact