STANDARDIZATION OF INDICES.
DIRECT METHOD OF STANDARDIZATION.
Standardization
Indices of statistical totalities (populations) that differ in structure can be compared only after standardization, that is, after they are recalculated on the condition that the structure of each totality is brought to a single common standard.
The following quantities are used in medical statistics:
· absolute – represent the absolute size of the phenomenon or of the environment
· average – represent the variational type of distribution of a characteristic
· relative – represent the alternative type of distribution of a characteristic
Intensive index – shows the level (spread) of a phenomenon in its environment; it is used to compare two or more statistical totalities that differ in size.
II = (absolute quantity of the phenomenon / absolute quantity of the environment) × 1000
Example: environment – 11 students (statistical totality)
phenomenon (morbidity): caries – 5 students
goitre – 3 students
gastritis – 4 students
ІІ = _{}‰
environment – 40 schoolboys
morbidity – 15 schoolboys
II = 15 / 40 × 1000 = 375‰
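The intensive index is a simple ratio scaled to a base of 1000. The following minimal Python sketch reproduces the schoolboys example (15 cases among 40 schoolboys); the function name `intensive_index` and the `base` parameter are illustrative choices, not part of the original notes.

```python
def intensive_index(phenomenon_count, environment_count, base=1000):
    """Return the number of cases of the phenomenon per `base` units of the environment."""
    if environment_count <= 0:
        raise ValueError("environment_count must be positive")
    return phenomenon_count / environment_count * base


# Morbidity among schoolboys: 15 cases in an environment of 40 schoolboys.
schoolboys_index = intensive_index(15, 40)
print(f"II = {schoolboys_index:.0f} per 1000")
```

Because the index is expressed per 1000 members of the environment, totalities of different size can be placed side by side.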
The method of standardization is used when the environments being compared are heterogeneous (by age, sex, and so on).
Standardization is a method of calculating conditional (standardized) indices.
The essence of the method is to calculate conditional (standardized) indices that replace the intensive or other quantities in cases where these cannot be compared directly because the structures of the groups are not comparable.
The standardized indices are conditional because, by removing the influence of a particular factor on the true (real) indices, they show what those indices would be if the factor that interferes with their comparison were absent. The standardized indices can be used only for comparison, because they give no idea of the real size of the phenomenon.
There are different methods of calculation of the standardized indices. The most widespread method is the direct one.
The direct method of standardization is used when there are:
a) considerable differences in the levels of group indices (for example, different levels of lethality in hospitals or departments, different levels of morbidity in men and women, and others);
b) considerable heterogeneity of the totalities being compared.
The standardized indices show what the true indices would be if the influence of a certain factor were absent. They make it possible to level out that influence on the indices.
Name the stages of direct method of standardization.
Stage I is the calculation of general intensive indices (or averages) for the pair of totalities being compared;
Stage II is the choice and calculation of the standard. The half-sum of the two groups (totalities) being compared is most often used as the standard;
Stage III is the calculation of the "expected quantities" in every group of the standard;
Stage IV is the determination of the standardized indices;
Stage V is the comparison of the groups according to the intensive and standardized indices.
Conclusions.
In the conclusions it must be noted that the standardized index is a conditional index, which answers only one question: what would the level of the phenomenon under study be if the conditions of its origin were standard?
The ordinary intensive indices characterize the true level (frequency) of the phenomenon, whereas the standardized indices are conditional and may change depending on the standard chosen.
Table. Standardization of indices
Methods of standardization: direct, indirect, reverse.
The stages of direct method of standardization:
1. Calculation of general intensive (or average) indices in compared groups.
2. Choice and calculation of the standard.
3. Calculation of “expected” figures in every group of the standard.
4. Determination of standardized indices.
5. Comparison of the simple intensive and standardized indices. Conclusions.
Usage of standardized indices:
1. Comparative evaluation of demographic indices in different age and social groups.
2. Comparative analysis of morbidity in different age and social groups.
3. Comparative evaluation of treatment quality in hospitals with a different composition of patients across departments.
The types of values which exist in science:
· absolute – represent the absolute size of the phenomenon or of the environment
· average – represent the variational type of distribution of a characteristic
· relative – represent the alternative type of distribution of a characteristic
Table 2.14. Example: average duration of treatment in the hospitals

Department     Hospital №1                                       Hospital №2
               Number of   Bed days   Period of treatment        Number of   Bed days   Period of treatment
               patients               (days)                     patients               (days)
Therapeutic    2100        33180      15,8                       970         16296      16,8
Surgical       560         5320       9,5                        990         9702       9,8
Gynecologic    580         4060       7,0                        1020        7650       7,5
Total          3240        42560      13,1                       2980        33648      11,3
As we see, the average duration of treatment in hospital №2 appears much lower than in hospital №1. However, analysis of these figures by department shows that this conclusion is inaccurate: in every department the duration of treatment is longer in hospital №2.
Therapeutic patients prevail in hospital №1, whereas in hospital №2 gynecologic patients form the largest group, and the durations of treatment in these departments differ substantially.
Standard definition

Department     Hospital №1              Hospital №2              The standard
               Number of      %         Number of      %         Number of      %
               patients                 patients                 patients
Therapeutic    2100           64,8      970            32,6      3070           49,4
Surgical       560            17,3      990            33,2      1550           24,9
Gynecologic    580            17,9      1020           34,2      1600           25,7
Total          3240           100,0     2980           100,0     6220           100,0
Let us determine the average duration of treatment in both hospitals on the condition that the structure of hospitalized patients is identical.

Department     Standard distribution   Hospital №1: standard share ×       Hospital №2: standard share ×
               of patients (%)         period of treatment : 100           period of treatment : 100
Therapeutic    49,4                    49,4 × 15,8 : 100 = 7,8             49,4 × 16,8 : 100 = 8,3
Surgical       24,9                    24,9 × 9,5 : 100 = 2,4              24,9 × 9,8 : 100 = 2,4
Gynecologic    25,7                    25,7 × 7,0 : 100 = 1,8              25,7 × 7,5 : 100 = 1,9
Total          100,0                   standardized index 12,0            standardized index 12,6
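The hospital example can be retraced in a few lines of Python. The sketch below is only an illustration: it takes the patient and bed-day counts from Table 2.14, uses the pooled patients of both hospitals as the standard, and weights each department's mean duration of treatment by its share in that standard. All function and variable names are illustrative. Without intermediate rounding the standardized durations come out at about 12.0 and 12.7 days; the 12,6 shown for hospital №2 results from rounding the intermediate products to one decimal.

```python
# (number of patients, bed days) by department, from Table 2.14.
hospital_1 = {"Therapeutic": (2100, 33180), "Surgical": (560, 5320), "Gynecologic": (580, 4060)}
hospital_2 = {"Therapeutic": (970, 16296), "Surgical": (990, 9702), "Gynecologic": (1020, 7650)}


def crude_mean_stay(hospital):
    """Overall mean duration of treatment: all bed days divided by all patients."""
    patients = sum(n for n, _ in hospital.values())
    bed_days = sum(days for _, days in hospital.values())
    return bed_days / patients


def standard_shares(*hospitals):
    """Standard structure: pooled patients of all hospitals as department shares."""
    pooled = {dept: sum(h[dept][0] for h in hospitals) for dept in hospitals[0]}
    total = sum(pooled.values())
    return {dept: n / total for dept, n in pooled.items()}


def standardized_mean_stay(hospital, shares):
    """Department mean durations weighted by the standard structure."""
    return sum(shares[dept] * (days / n) for dept, (n, days) in hospital.items())


shares = standard_shares(hospital_1, hospital_2)
crude_1, crude_2 = crude_mean_stay(hospital_1), crude_mean_stay(hospital_2)
std_1, std_2 = standardized_mean_stay(hospital_1, shares), standardized_mean_stay(hospital_2, shares)

print(f"crude:        {crude_1:.1f} vs {crude_2:.1f} days")
print(f"standardized: {std_1:.1f} vs {std_2:.1f} days")
```

The crude figures favour hospital №2, while the standardized figures reverse the ordering, which is the point of the example: the difference in crude averages is produced by the different patient structure.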
Introduction
One of the fundamentals of health situation analysis (HSA) is the comparison of basic health indicators. Among other objectives of HSA, this makes it possible to identify risk areas, define needs, and document inequalities in health, in two or more populations, in subgroups of a population, or else in a single population at different points in time. Crude rates, whether they represent mortality, morbidity or other health events, are summary measures of the experience of populations that facilitate this comparative analysis. However, the comparison of crude rates can sometimes be inadequate, particularly when the population structures are not comparable for factors such as age, sex or socioeconomic level. Indeed, these and other factors influence the magnitude of crude rates and may distort their interpretation in an effect called confounding (box 1).(1, 2, 3)

A confounding effect appears when the measurement of the effect of an exposure on a risk is distorted by the relation between the exposure and other factor(s) that also influence(s) the outcome under study.(1) A confounding factor (or confounder) must meet three criteria: 1) it is a known risk factor for the result of interest,(2) 2) it is associated with the exposure but is not a result of the exposure,(2) and 3) it is not an intermediate variable between them. An example is that of smoking as a confounder in the study of coffee consumption as a risk factor for ischemic heart disease. The association between coffee consumption and ischemic heart disease may be confounded by smoking. Indeed, smoking is a known risk factor for ischemic heart disease. It is associated with coffee consumption, as smokers are usually consumers of coffee, but it is not a result of drinking coffee. Smoking is not an intermediate variable between coffee consumption and ischemic heart disease. Schematically:
Smoking is a confounder of the association between coffee consumption and ischemic heart disease.
Sources: (1) Last J. A Dictionary of Epidemiology. Fourth Edition. (2) Gordis L. Epidemiology. Second Edition.
The calculation of specific rates in well-defined subgroups of a population is a way of avoiding certain confounding factors. For example, specific rates calculated by age groups are often used to examine how diseases affect people differently depending on their age. However, although this uncovers the patterns of health events in the population and allows for more rigorous comparison of rates, it can sometimes be impractical to work with a large number of subgroups.(4) Furthermore, if the subgroups consist of small populations, the specific rates can be very imprecise. The process of standardization (or adjustment) of rates is a classic epidemiological method that removes the confounding effect of variables that we know, or think, differ in the populations we wish to compare. It provides an easy-to-use summary measure that can be useful for information users, such as decision-makers, who prefer to use synthetic health indices in their activities.
In practice, age is the factor that is most frequently adjusted for. Age-standardization is particularly used in comparative mortality studies, since the age structure has an important impact on a population’s overall mortality. For example, in situations with levels of moderate mortality, as in the majority of the countries of the
There are two main standardization methods, characterized by whether the standard used is a population distribution (direct method) or a set of specific rates (indirect method). The two methods are presented below.
Direct method
In the direct standardization method, we calculate the rate that we would expect to find in the populations under study if they all had the same composition according to the variable whose effect we wish to adjust or control (such as age, socioeconomic group, or other characteristics). We use the structure of a population called “standard”, stratified according to the control variable, and to which we apply the specific rates of the corresponding strata in the population under study. We thus obtain the number of cases “expected” in each stratum if the populations had the same composition. The adjusted or “standardized” rate is obtained by dividing the total of expected cases by the standard population. An example is presented in
An important step in the direct standardization method is the selection of a standard population. The value of the adjusted rate depends on the standard population used, but to a certain extent this population can be chosen arbitrarily, because there is no significance in the calculated value itself. Indeed adjusted rates are products of a hypothetical calculation and do not represent the exact values of the rates. They serve only for comparisons between groups, not as a measure of absolute magnitude. However, some aspects should be taken into account in the selection of the standard population. The standard population may come from the study population (sum or average for example). In this case however, it is important to ensure that the populations do not differ in size, since a larger population may unduly influence the adjusted rates. The standard population may also be a population without any relation to the data under study, but in general, its distribution with regard to the adjustment factor should not be radically different from the populations we wish to compare.
The comparative study of adjusted rates may be carried out in different ways: we can calculate the absolute difference between the rates, their ratio, or the percentage difference between them. Obviously, this comparison is valid only when the same standard was used to calculate the adjusted rates. When the national standards change (as in the
The direct method is most often used. However, it requires rates specific to population strata corresponding to the variable of interest in all the populations we wish to compare, which are sometimes not available. Even when these specific rates are available for all the subgroups, they are sometimes calculated from very small numbers and can be very imprecise. In this case, the indirect standardization method is recommended.
using the direct method, 1995-1997

In this example, the standard population that was used is the so-called “old” world standard population defined by Waterhouse (see
In this example, to use the direct method we need:
· The specific mortality rates by stratum of the characteristic we want to control, in this case age, in each population (i.e.
· A standard population, stratified in the same way
First we calculate the expected number of deaths in both countries, applying the rate of each country to the standard population (columns (4) and (5)). The sum of all the groups gives us the total of expected deaths. To calculate the adjusted rate, we divide this number by the total standard population.



Age group   Standard      Age-specific mortality rate          Expected number of deaths
(years)     population    per 100,000 population, 1995-1997
                          Country 1        Country 2           Country 1 (4)   Country 2 (5)
<1          2,400         1,693.2          737.8               41              18
1-4         9,600         112.5            38.5                11              4
5-14        19,000        36.2             21.7                7               4
15-24       17,000        102.9            90.3                17              15
25-44       26,000        209.6            176.4               55              46
45-64       19,000        841.1            702.3               160             133
65+         7,000         4,967.4          5,062.6             348             354
Total       100,000                                            639             574
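The calculation in the table can be sketched in Python as follows. This is an illustration only: the dictionaries copy the standard population and the age-specific rates tabulated above, and the names `rates_first`, `rates_second` and `adjusted_rate` are placeholders, since the country labels are not available here. Without rounding the stratum counts, the adjusted rates come out at about 637.8 and 574.6 per 100,000, close to the 639 and 574 obtained from the rounded expected deaths.

```python
# Standard population by age group (years).
standard_population = {
    "<1": 2400, "1-4": 9600, "5-14": 19000, "15-24": 17000,
    "25-44": 26000, "45-64": 19000, "65+": 7000,
}
# Age-specific mortality rates per 100,000 population, 1995-1997.
rates_first = {
    "<1": 1693.2, "1-4": 112.5, "5-14": 36.2, "15-24": 102.9,
    "25-44": 209.6, "45-64": 841.1, "65+": 4967.4,
}
rates_second = {
    "<1": 737.8, "1-4": 38.5, "5-14": 21.7, "15-24": 90.3,
    "25-44": 176.4, "45-64": 702.3, "65+": 5062.6,
}


def expected_deaths(rates, standard, per=100_000):
    """Deaths expected in each stratum of the standard population."""
    return {age: standard[age] * rates[age] / per for age in standard}


def adjusted_rate(rates, standard, per=100_000):
    """Directly standardized rate: total expected deaths over the standard population."""
    total_expected = sum(expected_deaths(rates, standard, per).values())
    return total_expected / sum(standard.values()) * per


adjusted_first = adjusted_rate(rates_first, standard_population)
adjusted_second = adjusted_rate(rates_second, standard_population)
print(f"{adjusted_first:.1f} vs {adjusted_second:.1f} per 100,000")
```

Both rates are computed against the same standard, which is what makes their difference or ratio meaningful.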
Age-adjusted mortality rate ( When eliminating the effect of the difference in the age structure in both countries, we obtain a rate that is higher in


Source of the data: Pan American Health Organization. Perfiles de mortalidad de las comunidades hermanas fronterizas México - Estados Unidos. Edición 2000 / Mortality profiles of the Sister Communities on the United States-Mexico border. 2000 Edition. Washington, D.C.: OPS. 2000


Age groups (years)    World      European
0                     2,400      1,600
1-4                   9,600      6,400
5-9                   10,000     7,000
10-14                 9,000      7,000
15-19                 9,000      7,000
20-24                 8,000      7,000
25-29                 8,000      7,000
30-34                 6,000      7,000
35-39                 6,000      7,000
40-44                 6,000      7,000
45-49                 6,000      7,000
50-54                 5,000      7,000
55-59                 4,000      6,000
60-64                 4,000      5,000
65-69                 3,000      4,000
70-74                 2,000      3,000
75-79                 1,000      2,000
80-84                 500        1,000
85+                   500        1,000
Total                 100,000    100,000
Source:
Waterhouse J. y 
Indirect method
Indirect standardization is different in both method and interpretation. An example of adjustment using the indirect method is presented in
Standardized Mortality Ratios (SMRs) are frequently used in epidemiology to compare different study groups, because they are easy to calculate and also because they provide an estimate of the relative risk between the standard population and the population under study. However, it is important to know that there are instances when this comparison is not adequate, for example when the ratios of the rates in the groups under study and in the population of reference are not homogeneous across the different strata. The comparison between each group and the population of reference, on the other hand, is always relevant. The SMRs of different causes in a population may also be calculated using a single standard.
and mortality in 

The crude mortality rate in Colombia in 1999 was 4.4 per 1,000 population, with variations between 1.8 per 1,000 population in the department of Vichada and 6.9 per
In this case, in order to use the indirect method we need:
· The age-specific mortality rates by age group in
· The population of the department of Vichada stratified by age
· The total number of deaths observed in the department of Vichada
The first step is to calculate the expected number of deaths in Vichada by applying the standard rates to the population of the department (column (3) = (1) x (2)). Then the calculated deaths are summed up and the SMR is calculated by dividing the total number of observed deaths by the expected deaths.


Age group   Age-specific mortality rates     Population    Observed deaths      Expected deaths
(years)     per 100,000, Colombia,           (2)           in Vichada,          in Vichada
            1999 (i) (1)                                   1999 (i)             (3)
0-4         339                              11,392        61                   39
5-14        34                               21,930        5                    7
15-44       219                              38,244        27                   84
45-64       752                              7,083         22                   53
65+         4,573                            1,839         27                   84
Total                                        80,488        142                  267
The SMR of 53% indicates that in the population of Vichada the risk of dying is 47% less than expected according to the mortality standards of all of

Source of the data: (i) Situación de Salud en
NOTE: Confidence interval for SMRs
The confidence interval provides the range of values within which we expect to find the real value of the indicator under study, with a given probability. That way, it gives an estimate of the potential difference between what is observed and what is really happening in the population, which helps in interpreting the value of the observed indicator. The 95% confidence interval is the most used. As mentioned previously, it indicates the range of values within which we expect to find the real value of the indicator, with a probability of 95%. In the case of the SMR, the calculation of the confidence interval can be carried out in the following way:
1) First, the Standard Error (SE) for the SMR is calculated using the following formula:
\[ SE = \frac{SMR}{\sqrt{\text{observed deaths}}} \]
2) The 95% Confidence Interval (CI) is calculated as follows:
\[ CI_{95\%} = SMR \pm 1.96 \times SE \]
where 1.96 is the value of the Z distribution with a level of confidence of 95%. It is assumed that the values follow a normal distribution.
In this example: SE(Vichada) = 53 / √142 = 4.4 and CI(Vichada) (95%) = [44.4 ; 61.6]. The confidence interval indicates that we can state with 95% confidence that the SMR’s value lies between 44.4 and 61.6.
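The Vichada example can be checked with a short Python sketch. It rests on two assumptions that should be kept in mind: the standard error is taken as SMR / √(observed deaths), which is the form that reproduces the quoted SE of 4.4, and the rate for the 65+ group is read as 4,573 per 100,000. The variable names are illustrative. Because the sketch does not round the expected deaths or the SMR, its interval (about 44.4 to 61.9) differs slightly from the rounded [44.4 ; 61.6].

```python
from math import sqrt

# Age groups: 0-4, 5-14, 15-44, 45-64, 65+
standard_rates_per_100k = [339, 34, 219, 752, 4573]   # Colombia, 1999
vichada_population = [11392, 21930, 38244, 7083, 1839]
observed_deaths = 142                                  # total observed in Vichada

# Expected deaths: standard rates applied to the Vichada population.
expected_by_age = [
    rate * pop / 100_000
    for rate, pop in zip(standard_rates_per_100k, vichada_population)
]
expected_deaths = sum(expected_by_age)

smr = observed_deaths / expected_deaths * 100          # in percent
standard_error = smr / sqrt(observed_deaths)           # assumed formula, see lead-in
ci_low = smr - 1.96 * standard_error
ci_high = smr + 1.96 * standard_error

print(f"expected deaths = {expected_deaths:.0f}")
print(f"SMR = {smr:.1f}%, SE = {standard_error:.1f}, 95% CI = [{ci_low:.1f} ; {ci_high:.1f}]")
```

An interval that lies entirely below 100 supports the reading that mortality in Vichada is lower than the national rates would predict for its age structure.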
Conclusion
As with any summary measure, adjusted rates may hide large differences between groups, which can be important in explaining changes in the rates due to or associated with the variable that we wish to adjust for. Therefore, whenever possible it is important to analyze the specific rates along with the adjusted rates. The two methods applied to a single population should lead to the same conclusions. If this is not the case, the situation in the different population strata requires more in-depth research.
One of the reasons for the sometimes limited use of these methods is the lack of tools or instruments that simplify them. To respond to this need, the General Direction of Public Health of the Xunta de Galicia and PAHO’s Special Program for Health Analysis have developed the “EpiDat” computer package for analysis of tabulated data. EpiDat is distributed free of charge via the Internet at: http://www.paho.org/Spanish/SHA/epidat.htm. A newer version of this package will be issued soon. The software SIGEpi (see http://www.paho.org/English/sha/be_v22n3SIGEpi.htm), which combines the capacity of a geographic information system with epidemiological tools, can also generate adjusted rates.
In short, adjusted rates allow for more exact comparisons between populations. This is important because it can be used in setting priorities between groups. Nevertheless, the crude rates are the only indicators of the real dimension or magnitude of a problem and hence remain valuable public health tools.
Standardization Methods
The following table lists standardization methods and their corresponding location and scale measures available with the METHOD= option.
Table 59.2: Available Standardization Methods
Method        Location                      Scale
MEAN          mean                          1
MEDIAN        median                        1
SUM           0                             sum
EUCLEN        0                             Euclidean length
USTD          0                             standard deviation about origin
STD           mean                          standard deviation
RANGE         minimum                       range
MIDRANGE      midrange                      range/2
MAXABS        0                             maximum absolute value
IQR           median                        interquartile range
MAD           median                        median absolute deviation from median
ABW(c)        biweight 1-step M-estimate    biweight A-estimate
AHUBER(c)     Huber 1-step M-estimate       Huber A-estimate
AWAVE(c)      Wave 1-step M-estimate        Wave A-estimate
AGK(p)        mean                          AGK estimate (ACECLUS)
SPACING(p)    mid-minimum spacing           minimum spacing
L(p)          L(p)                          L(p)
IN(ds)        read from data set            read from data set
For METHOD=ABW(c), METHOD=AHUBER(c), or METHOD=AWAVE(c), c is a positive numeric tuning constant.
For METHOD=AGK(p), p is a numeric constant giving the proportion of pairs to be included in the estimation of the within-cluster variances.
For METHOD=SPACING(p), p is a numeric constant giving the proportion of data to be contained in the spacing.
For METHOD=L(p), p is a numeric constant greater than or equal to 1 specifying the power to which differences are to be raised in computing an L(p) or Minkowski metric.
For METHOD=IN(ds), ds is the name of a SAS data set that meets either one of the following two conditions:
· contains a _TYPE_ variable. The observation that contains the location measure corresponds to the value _TYPE_= 'LOCATION' and the observation that contains the scale measure corresponds to the value _TYPE_= 'SCALE'. You can also use a data set created by the OUTSTAT= option from another PROC STDIZE statement as the ds data set. See the section "Output Data Sets" for the contents of the OUTSTAT= data set.
· contains the location and scale variables specified by the LOCATION and SCALE statements.
PROC STDIZE reads in the location and scale variables in the ds data set by first looking for the _TYPE_ variable in the ds data set. If it finds this variable, PROC STDIZE continues to search for all variables specified in the VAR statement. If it does not find the _TYPE_ variable, PROC STDIZE searches for the location variables specified in the LOCATION statement and the scale variables specified in the SCALE statement.
For robust estimators, refer to Goodall (1983) and Iglewicz (1983). The MAD method has the highest breakdown point (50%), but it is somewhat inefficient. The ABW, AHUBER, and AWAVE methods provide a good compromise between breakdown and efficiency. The L(p) location estimates are increasingly robust as p drops from 2 (corresponding to least squares, or mean estimation) to 1 (corresponding to least absolute value, or median estimation). However, the L(p) scale estimates are not robust.
The SPACING method is robust to both outliers and clustering (Jannsen et al. 1995) and is, therefore, a good choice for cluster analysis or nonparametric density estimation. The mid-minimum spacing method estimates the mode for small p. The AGK method is also robust to clustering and more efficient than the SPACING method, but it is not as robust to outliers and takes longer to compute. If you expect g clusters, the argument to METHOD=SPACING or METHOD=AGK should be [1/g] or less. The AGK method is less biased than the SPACING method for small samples. As a general guide, it is reasonable to use AGK for samples of size 100 or less and SPACING for samples of size 1000 or more, with the treatment of intermediate sample sizes depending on the available computer resources.
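To make the location and scale pairs in Table 59.2 concrete, the following Python sketch applies a few of the simpler methods to a small list of numbers. It is not SAS code and does not reproduce PROC STDIZE; it assumes the usual convention that a standardized value is (original value − location) / scale, and it assumes the sample standard deviation (divisor n − 1) for the STD method. The function names and the sample data are illustrative.

```python
import statistics


def _median_abs_deviation(values):
    """Median absolute deviation from the median."""
    center = statistics.median(values)
    return statistics.median([abs(v - center) for v in values])


# Each entry returns the (location, scale) pair listed for that method.
LOCATION_SCALE = {
    "MEAN": lambda v: (statistics.mean(v), 1.0),
    "MEDIAN": lambda v: (statistics.median(v), 1.0),
    "STD": lambda v: (statistics.mean(v), statistics.stdev(v)),  # n - 1 divisor assumed
    "RANGE": lambda v: (min(v), max(v) - min(v)),
    "MIDRANGE": lambda v: ((max(v) + min(v)) / 2, (max(v) - min(v)) / 2),
    "MAXABS": lambda v: (0.0, max(abs(x) for x in v)),
    "MAD": lambda v: (statistics.median(v), _median_abs_deviation(v)),
}


def standardize(values, method="STD"):
    """Return (x - location) / scale for every x, using the chosen method."""
    location, scale = LOCATION_SCALE[method](values)
    if scale == 0:
        raise ValueError(f"scale is zero for method {method}")
    return [(x - location) / scale for x in values]


data = [2.0, 4.0, 6.0, 10.0]
range_scaled = standardize(data, "RANGE")   # values mapped onto [0, 1]
mad_scaled = standardize(data, "MAD")       # robust centering and scaling
std_scaled = standardize(data, "STD")       # classical z-scores
print(range_scaled, mad_scaled, std_scaled, sep="\n")
```

The robust choices (MEDIAN, MAD) change little when a single extreme value is added to the data, whereas MEAN, STD and RANGE can shift markedly.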
Since epidemiology is concerned with the distribution of disease in populations, summary measures are required to describe the amount of disease in a population. There are two basic measures, incidence and prevalence.
Incidence is a measure of the rate at which new cases of disease occur in a population previously without disease. Thus, the incidence, denoted by I, is defined as
\[ I = \frac{\text{number of new cases in a particular period}}{\text{number of individuals in the population}} \]
The period of time is specified in the units in which the rate is expressed. Often the rate is multiplied by a base such as 1000 or 1 000 000 to avoid small decimal fractions. For example, there were 280 new cases of cancer of the pancreas in men in New South Wales in 1997 out of a population of 3.115 million males. The incidence was 280/3.115 = 90 per million per year.
Prevalence, denoted by P, is a measure of the frequency of existing disease at a given time, and is defined as
\[ P = \frac{\text{number of existing cases at a given time}}{\text{number of individuals in the population at that time}} \]
Both incidence and prevalence usually depend on age, and possibly sex, and sex- and age-specific figures would be calculated.
The prevalence and incidence rates are related, since an incident case is, immediately on occurrence, a prevalent case and remains as such until recovery or death (disregarding emigration and immigration). Provided the situation is stable, the link between the two measures is given by
P = It, (19.4)
where t is the average duration of disease. For a chronic disease from which there is no recovery, t would be the average survival after occurrence of the disease.
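As a numerical illustration, the sketch below recomputes the pancreatic cancer incidence and applies the relation P = It. It reads the male population as 3.115 million, which is what the quoted result of 90 per million implies. The average duration of disease is not given in the text, so the value of 0.5 years is purely hypothetical and serves only to show how the relation is used; the function names are illustrative as well.

```python
def incidence_rate(new_cases, population, base=1_000_000):
    """New cases per `base` individuals over the period considered."""
    return new_cases / population * base


def stable_prevalence(incidence, mean_duration):
    """Prevalence from P = I * t; valid only when the situation is stable."""
    return incidence * mean_duration


# 280 new cases among 3.115 million males in one year.
pancreas_incidence = incidence_rate(280, 3_115_000)

# Hypothetical average duration of disease in years (not from the text).
hypothetical_duration_years = 0.5
pancreas_prevalence = stable_prevalence(pancreas_incidence, hypothetical_duration_years)

print(f"incidence  = {pancreas_incidence:.0f} per million per year")
print(f"prevalence = {pancreas_prevalence:.0f} per million (for the assumed duration)")
```

The incidence and the duration must be expressed in the same time unit (here years) for the product to be a prevalence.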
Problems due to confounding arise frequently in vital statistics and have given rise to a group of methods called standardization. We shall describe briefly one or two of the most well-known methods.
Mortality in a population is usually measured by an annual death rate — for example, the number of individuals dying during a certain calendar year divided by the estimated population size midway through the year. Frequently this ratio is multiplied by a convenient base, such as 1000, to avoid small decimal fractions; it is then called the annual death rate per 1000 population. If the death rate is calculated for a population covering a wide age range, it is called a crude death rate.
In a comparison of the mortality of two populations, say, those of two different countries, the crude rates may be misleading. Mortality depends strongly on age. If the two countries have different age structures, this contrast alone may explain a difference in crude rates (just as, in Table 15.6, the contrast between the ‘crude’ proportions with factor A was strongly affected by the different sex distributions in the disease and control groups). An example is given in Table 19.1 which shows the numbers of individuals and numbers of deaths separately in different age groups, for two countries: A, typical of highly industrialized countries, with a rather high proportion of individuals at the older ages; and B, a developing country with a small proportion of old people. The death rates at each age (which are called age-specific death rates) are substantially higher for B than for A, and yet the crude death rate is higher for A than for B.
Sometimes, however, mortality has to be compared for a large number of different populations, and some form of adjustment for age differences is required. For example, the mortality in one country may have to be compared over several different years; different regions of the same country may be under study; or one may wish to compare the mortality for a large number of different occupations. Two obvious generalizations are: (i) in standardizing for factors other than, or in addition to, age—for example, sex, as in Table 15.6; and (ii) in morbidity studies where the criterion studied is the occurrence of a certain illness rather than of death. We shall discuss the usual situation—the standardization of mortality rates for age.
The basic idea in standardization is that we introduce a standard population with a fixed age structure. The mortality for any special population is then adjusted to allow for discrepancies in age structure between the standard and special populations. There are two main approaches: direct and indirect methods of standardization. The following brief account may be supplemented by reference to Liddell (1960), Kalton (1968) or Hill and Hill (1991).
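The following notation will be used. For the ith age group, the standard population has \(N_i\) individuals and death rate \(P_i\); the special population has \(n_i\) individuals and \(r_i\) deaths, so that its death rate is \(p_i = r_i/n_i\).
In the direct method the death rate is standardized to the age structure of the standard population. The directly standardized death rate for the special population is, therefore,
\[ p' = \frac{\sum N_i p_i}{\sum N_i}. \qquad (19.5) \]
It is obtained by applying the special death rates, \(p_i\), to the standard population sizes, \(N_i\). Alternatively, \(p'\) can be regarded as a weighted mean of the \(p_i\), using the \(N_i\) as weights. The variance of \(p'\) may be estimated as
\[ \operatorname{var}(p') = \frac{\sum (N_i^2 p_i q_i / n_i)}{(\sum N_i)^2}, \qquad (19.6) \]
where \(q_i = 1 - p_i\); if, as is often the case, the \(p_i\) are all small, the binomial variance of \(p_i\), \(p_i q_i/n_i\), may be replaced by the Poisson term \(p_i/n_i\) (\(= r_i/n_i^2\)), giving
\[ \operatorname{var}(p') \simeq \frac{\sum (N_i^2 r_i / n_i^2)}{(\sum N_i)^2}. \qquad (19.7) \]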
To compare two special populations, A and B, we could calculate a standardized rate for each (\(p'_A\) and \(p'_B\)), and consider
\[ p'_A - p'_B. \]
From (19.5),
\[ p'_A - p'_B = \frac{\sum N_i (p_{Ai} - p_{Bi})}{\sum N_i}, \]
which has exactly the same form as (15.15), with \(W_i = N_i\), and \(d_i = p_{Ai} - p_{Bi}\) as in (15.14). The method differs from that of Cochran’s test only in using a different system of weights. The variance is given by
\[ \operatorname{var}(p'_A - p'_B) = \frac{\sum N_i^2 \operatorname{var}(d_i)}{(\sum N_i)^2}, \]
with \(\operatorname{var}(d_i)\) given by (15.17). Again, when the \(p_{0i}\) are small, \(q_{0i}\) can be put approximately equal to 1 in (15.17).
If it is required to compare two special populations using the ratio of the standardized rates, \(p'_A/p'_B\), then the variance of the ratio may be obtained using (19.6) and (5.12).
The variance given by (19.7) may be unsatisfactory for the construction of confidence limits if the numbers of deaths in the separate age groups are small, since the normal approximation is then unsatisfactory and the Poisson limits are asymmetric. The standardized rate (19.5) is a weighted sum of the Poisson counts, ri. Dobson et al. (1991) gave a method of calculating an approximate confidence interval based on the confidence interval of the total number of deaths.
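A small Python sketch may help to fix the direct method and its Poisson variance. It assumes the notation of this section (N_i for the standard population sizes, n_i and r_i for the sizes and deaths of the special population) and uses the Poisson form of the variance, in which each stratum contributes N_i² r_i / n_i². The three age groups and all figures are invented for illustration only, and the function name is a placeholder.

```python
from math import sqrt


def direct_standardized_rate(standard_sizes, special_sizes, special_deaths):
    """Return (p', var(p')) using N_i as weights and the Poisson variance term."""
    total_standard = sum(standard_sizes)
    strata = list(zip(standard_sizes, special_sizes, special_deaths))
    # p' = sum(N_i * p_i) / sum(N_i), with p_i = r_i / n_i
    rate = sum(N * r / n for N, n, r in strata) / total_standard
    # var(p') ~ sum(N_i^2 * r_i / n_i^2) / (sum N_i)^2
    variance = sum(N ** 2 * r / n ** 2 for N, n, r in strata) / total_standard ** 2
    return rate, variance


# Hypothetical data for three age groups (not taken from the text).
standard_sizes = [50_000, 30_000, 20_000]   # N_i
special_sizes = [10_000, 8_000, 2_000]      # n_i
special_deaths = [10, 40, 50]               # r_i

rate, variance = direct_standardized_rate(standard_sizes, special_sizes, special_deaths)
standard_error = sqrt(variance)
print(f"p' = {rate * 1000:.2f} per 1000, SE = {standard_error * 1000:.2f} per 1000")
```

In this invented example the oldest stratum dominates the variance because a small n_i is scaled up to a much larger N_i, which illustrates why directly standardized rates become imprecise when stratum-specific rates rest on few deaths.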
Indirect method
This method is more conveniently thought of as a comparison of observed and expected deaths than in terms of standardized rates. In the special population the total number of deaths observed is \(\sum r_i\). The number of deaths expected if the age-specific death rates were the same as in the standard population is \(\sum n_i P_i\). The overall mortality experience of the special population may be expressed in terms of that of the standard population by the ratio of observed to expected deaths:
\[ M = \frac{\sum r_i}{\sum n_i P_i}. \qquad (19.9) \]
When multiplied by 100 and expressed as a percentage, (19.9) is known as the standardized mortality ratio (SMR).
To obtain the variance of M we can use the result \(\operatorname{var}(r_i) = n_i p_i q_i\), and regard the \(P_i\) as constants without any sampling fluctuation (since we shall often want to compare one SMR with another using the same standard population; in any case the standard population will often be much larger than the special population, and \(\operatorname{var}(P_i)\) will be much smaller than \(\operatorname{var}(p_i)\)). This gives
\[ \operatorname{var}(M) = \frac{\sum n_i p_i q_i}{(\sum n_i P_i)^2}. \]
The smallness of the standard error (SE) of the SMR in Example 19.3 is typical of much vital statistical data, and is the reason why sampling errors are often ignored in this type of work. Indeed, there are problems in the interpretation of occupational mortality statistics which often overshadow sampling errors. For example, occupations may be less reliably stated in censuses than in the registration of deaths, and this may lead to biases in the estimated death rates for certain occupations. Even if the data are wholly reliable, it is not clear whether a particularly high or low SMR for a certain occupation reflects a health risk in that occupation or a tendency for selective groups of people to enter it. In Example 19.3, for example, the SMR for farmers may be low because farming is healthy, or because unhealthy people are unlikely to enter farming or are more likely to leave it. Note also that in the lowest age group there is an excess of deaths among farmers (87 observed, 55 expected). Any method of standardization carries the risk of oversimplification, and the investigator should always compare age-specific rates to see whether the contrasts between populations vary greatly with age.
The method of indirect standardization is very similar to that described as the comparison of observed and expected frequencies on p. 520. Indeed if, in the comparison of two groups, A and B, the standard population were defined as the pooled population A + B, the method would be precisely the same as that used in the Cochran-Mantel-Haenszel method (p. 520). We have seen (p. 662) that Cochran’s test is equivalent to a comparison of two direct standardized rates. There is thus a very close relationship between the direct and indirect methods when the standard population is chosen to be the sum of the two special populations.
The SMR is a weighted mean, over the separate age groups, of the ratios of the observed death rates in the special population to those in the standard population, with weights (\(n_i P_i\)) that depend on the age distribution of the special population. This means that SMRs calculated for several special populations are not strictly comparable (Yule, 1934), since they have been calculated with different weights. The SMRs will be comparable under the hypothesis that the ratio of the death rates in the special and standard populations is independent of age, that is, in a proportional-hazards situation.
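The weighted-mean reading of the SMR can be verified numerically. The Python sketch below computes the ratio of observed to expected deaths in two ways: directly, and as a mean of the stratum ratios p_i / P_i weighted by n_i P_i. All figures are hypothetical and chosen only for illustration; the function names are placeholders.

```python
def smr_observed_over_expected(sizes, deaths, standard_rates):
    """M = sum(r_i) / sum(n_i * P_i)."""
    expected = sum(n * P for n, P in zip(sizes, standard_rates))
    return sum(deaths) / expected


def smr_as_weighted_mean(sizes, deaths, standard_rates):
    """Mean of the stratum ratios p_i / P_i, weighted by n_i * P_i."""
    weights = [n * P for n, P in zip(sizes, standard_rates)]
    ratios = [(r / n) / P for n, r, P in zip(sizes, deaths, standard_rates)]
    return sum(w * x for w, x in zip(weights, ratios)) / sum(weights)


# Hypothetical special population and standard rates for three age groups.
sizes = [10_000, 8_000, 2_000]          # n_i
deaths = [10, 40, 50]                   # r_i
standard_rates = [0.002, 0.004, 0.030]  # P_i

direct_ratio = smr_observed_over_expected(sizes, deaths, standard_rates)
weighted_ratio = smr_as_weighted_mean(sizes, deaths, standard_rates)
print(f"SMR = {direct_ratio * 100:.1f}% (weighted-mean form: {weighted_ratio * 100:.1f}%)")
```

Here the stratum ratios are 0.5, 1.25 and about 0.83, so the single summary figure conceals an excess in the middle age group; and since the weights n_i P_i come from the special population's own age distribution, a second special population would be summarized with different weights.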
The relationship between standardization and generalized linear models is discussed by Breslow and Day (1975), Little and Pullum (1979) and Freeman and Holford (1980).
Surveys to investigate associations
A question commonly asked in epidemiological investigations into the aetiology of disease is whether some manifestation of ill health is associated with certain personal characteristics or habits, with particular aspects of the environment in which a person has lived or worked, or with certain experiences which a person has undergone. Examples of such questions are the following.
1. Is the risk of death from lung cancer related to the degree of cigarette smoking, whether current or in previous years?
2. Is the risk that a child dies from acute leukaemia related to whether or not the mother experienced irradiation during pregnancy?
3. Is the risk of incurring a certain illness increased for individuals who were treated with a particular drug during a previous illness?
Sometimes questions like these can be answered by controlled experimentation in which the presumptive personal factor can be administered or withheld at the investigator’s discretion; in example 3, for instance, it might be possible for the investigator to give the drug in question to some patients and not to others and to compare the outcomes. In such cases the questions are concerned with causative effects: ‘Is this drug a partial cause of this illness?’ Most often, however, the experimental approach is out of the question. The investigator must then be satisfied to observe whether there is an association between factor and disease, and to take the risk which was emphasized in §7.1 if he or she wishes to infer a causative link.
These questions, then, will usually be studied by surveys rather than by experiments. The precise population to be surveyed is not usually of primary interest here. One reason is that in epidemiological surveys it is usually administratively impossible to study a national or regional population, even on a sample basis. The investigator may, however, have facilities to study a particular occupational group or a population geographically related to a particular medical centre. Secondly, although the mean values or relative frequencies of the different variables may vary somewhat from one population to another, the magnitude and direction of the associations between variables are unlikely to vary greatly between, say, different occupational groups or different geographical populations.
There are two main designs for aetiological surveys: the case-control study, sometimes known as a case-referent study, and the cohort study. In a case-control study a group of individuals affected by the disease in question is compared with a control group of unaffected individuals. Information is obtained, usually in a retrospective way, about the frequency in each group of the various environmental or personal factors which might be associated with the disease. This type of survey is convenient in the study of rare conditions which would appear too seldom in a random population sample. By starting with a group of affected individuals one is effectively taking a much higher sampling fraction of the cases than of the controls. The method is appropriate also when the classification by disease is simple (particularly for a dichotomous classification into the presence or absence of a specific condition), but many possible aetiological factors have to be studied. A further advantage of the method is that, by means of the retrospective enquiry, the relevant information can be obtained comparatively quickly.
In a cohort study a population of individuals, selected usually by geographical or occupational criteria rather than on medical grounds, is studied either by complete enumeration or by a representative sample. The population is classified by the factor or factors of interest and followed prospectively in time so that the rates of occurrence of various manifestations of disease can be observed and related to the classifications by aetiological factors. The prospective nature of the cohort study means that it will normally extend longer in time than the case-control study and is likely to be administratively more complex. The corresponding advantages are that many medical conditions can be studied simultaneously and that direct information is obtained about the health of each subject through an interval of time.
Case-control and cohort studies are often called, respectively, retrospective and prospective studies. These latter terms are usually appropriate, but the nomenclature may occasionally be misleading since a cohort study may be based entirely on retrospective records. For example, if medical records are available of workers in a certain factory for the past 30 years, a cohort study may relate to workers employed 30 years ago and be based on records of their health in the succeeding 30 years. Such a study is sometimes called a historical prospective study.
A central problem in a case-control study is the method by which the controls are chosen. Ideally, they should be on average similar to the cases in all respects except in the medical condition under study and in associated aetiological factors. Cases will often be selected from one or more hospitals and will then share the characteristics of the population using those hospitals, such as social and environmental conditions or ethnic features. It will usually be desirable to select the control group from the same area or areas, perhaps even from the same hospitals, but suffering from quite different illnesses unlikely to share the same aetiological factors. Further, the frequencies with which various factors are found will usually vary with age and sex. Comparisons between the case and control groups must, therefore, take account of any differences there may be in the age and sex distributions of the two groups. Such adjustments are commonly avoided by arranging that each affected individual is paired with a control individual who is deliberately chosen to be of the same age and sex and to share any other demographic features which may be thought to be similarly relevant.
The remarks made in §19.2 about non-sampling errors, particularly those about non-response, are also relevant in aetiological surveys. Non-responses are always a potential danger and every attempt should be made to reduce them to as low a proportion as possible.
This paper by Doll and Hill is an excellent illustration of the care which should be taken to avoid bias due to unsuspected differences between case and control groups or to different standards of data recording. This study, and many others like it, strongly suggest an association between smoking and the risk of incurring lung cancer. In such retrospective studies, however, there is room for argument about the propriety of a particular choice of control group, little information is obtained about the time relationships involved, and nothing is known about the association between smoking and diseases other than those selected for study. Doll and Hill (1954, 1956, 1964) carried out a cohort study prospectively by sending questionnaires to all the 59,600 doctors in the UK in October 1951. Adequate replies were received from 68.2% of the population (34,439 men and 6,194 women). The male doctors were followed for 40 years and notifications of deaths from various causes were obtained, only 148 being untraced (Doll et al., 1994). Some results are shown in Table 19.4. The groups defined by different smoking categories have different age distributions, and the death rates shown in the table have again been standardized for age (§19.3). Cigarette smoking is again shown to be associated with a sharp increase in the death rate from lung cancer, there is almost as strong an association for chronic obstructive lung disease, and a relatively weak association with the death rates from ischaemic heart disease.
This prospective study provides strong evidence that the association between smoking and lung cancer is causative. In addition to the data in Table 19.4, many doctors who smoked at the outset of the study stopped smoking during the follow-up period, and by 1971 doctors were smoking less than half as much as people of the same ages in the general population (Doll & Peto, 1976). This reduction in smoking was matched by a steady decline in the death rate from lung cancer for the whole group of male doctors (age-standardized, and expressed as a fraction of the national mortality rate) over the first 20 years of follow-up.
In a cohort study in which the incidence of a specific disease is of particular interest, the case-control approach may be adopted by analysing the data of all the cases and a control group of randomly selected non-cases. This approach was termed a synthetic retrospective study by Mantel (1973). Often the controls are chosen matched for each case by random sampling from the members of the cohort who are non-cases at the time that the case developed the disease (Liddell et al., 1977), and this is usually referred to as a nested case-control study. Care may be needed to avoid the repeated selection of the same individuals as controls for more than one case (Robins et al., 1989). A related design is the case-cohort study, which consists of a random sample of the whole cohort; some members of this sample will become cases and they are supplemented by the cases that occur in the remainder of the cohort, with the non-cases in the random subcohort serving as controls for the total set of cases (Kupper et al., 1975; Prentice, 1986). These designs are useful in situations where it is expensive to extract the whole data, or when expensive tests are required; if material, such as blood samples, can be stored and then analysed for only a fraction of the cohort, then there may be a large saving in resources with very little loss of efficiency.
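The selection of controls in a nested case-control study can be sketched as follows. This is a minimal illustration under stated assumptions, not a method taken from the cited papers: each cohort member is represented by a hypothetical tuple (subject_id, exit_time, is_case), where exit_time is the time of diagnosis for a case and the end of follow-up otherwise, and the function name is invented for the example. The sketch does not prevent the same individual from being chosen as a control for more than one case.

```python
import random

def sample_risk_set_controls(cohort, n_controls, seed=0):
    """For each case, sample controls from the members who are still under
    follow-up and disease-free at the time the case was diagnosed.

    cohort: list of (subject_id, exit_time, is_case) tuples (hypothetical layout).
    Returns {case_id: [control_ids]}.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    matched = {}
    for sid, t_case, is_case in cohort:
        if not is_case:
            continue
        # Risk set: everyone whose follow-up extends beyond the case's
        # diagnosis time; members who become cases later remain eligible.
        risk_set = [s for s, t, _ in cohort if t > t_case and s != sid]
        matched[sid] = rng.sample(risk_set, min(n_controls, len(risk_set)))
    return matched

# Made-up cohort of six members, two of whom become cases.
cohort = [
    (1, 5.0, True), (2, 9.0, False), (3, 7.5, True),
    (4, 10.0, False), (5, 3.0, False), (6, 10.0, False),
]
controls = sample_risk_set_controls(cohort, n_controls=2)
print(controls)
```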
The measurement of the degree of association between the risk of disease and the presence of an aetiological factor is discussed in detail in the next section.
Subject-years method
A commonly used research method is the cohort study, in which a group is classified by exposure to some substance, followed over time and the vital status of each member determined up to the time at which the analysis is being conducted. A review of methods of cohort study design and application was given by Liddell (1988). It may be possible to use existing records to determine exposure in the past, and this gives the historical prospective cohort study, used particularly in occupational health research. Such studies often cover periods of over 20 years. The aim is to compare the mortality experience of subgroups, such as high exposure with low exposure, in order to establish whether exposure to the agent might be contributing to mortality. As such studies cover a long period of time, individuals will be ageing and their mortality risk will be changing. In addition, there may be period effects on mortality rate. Both the age and period effects will need to be taken into account. One approach is the subject-years or person-years method, sometimes referred to as the modified life-table approach; an early use of this method was by Doll (1952). In this approach the number of deaths in the group, or in the subgroups, is expressed in terms of the number of deaths expected if the individuals had experienced the same death rates as the population of which the group is a part.
The expected mortality is calculated using published national or regional death rates. The age of each subject, both at entry to the study and as it changes through the period of follow-up, has to be taken into account. Also, since age-specific death rates depend on the period of time at which the risk occurs, the cohort of each subject must be considered. Official death rates are usually published in 5-year intervals of age and period and may be arranged as a rectangular array consisting of cells, such as the age group 45-49 during the period 1976-80. Each subject passes through several of these cells during the period of follow-up and experiences a risk of dying according to the years of risk in each cell and the death rate. This risk is accumulated so long as a subject is at risk of dying in the study, that is, until the date of death or until the end of the follow-up period for the survivors. This accumulated risk is the same as the cumulative hazard. The expected number of deaths is obtained by adding over all subjects in the group, and it is computationally convenient to add the years at risk in each cell over subjects before multiplying by the death rates. This is the origin of the name of the method, since subject-years at risk are calculated.
The method can be applied for total deaths and also for deaths from specific causes. The same table of subject-years at risk is used for each cause with different tables of death rates and, when a particular cause of death is being considered, deaths from any other cause are effectively treated as censored survivals. For any cause of death, the observed number is treated as a Poisson variable with expectation equal to the expected number. The method is similar to indirect standardization of death rates and the ratio of observed to expected deaths is often referred to as the standardized mortality ratio (SMR).
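The allocation of one subject's follow-up to age-period cells can be sketched as below. This is an illustrative sketch only: the function names, the choice of whole months as the time unit (to keep the arithmetic exact), the period origin of 1971 and the death rates are all assumptions made for the example, not values from any published table.

```python
from collections import defaultdict

WIDTH = 60                 # cell width in months (5 years), for age and period
PERIOD_ORIGIN = 1971 * 12  # assumed: periods run 1971-75, 1976-80, ...

def months(year, month):
    """Months since year 0 for the first day of the given month (1-12)."""
    return year * 12 + (month - 1)

def subject_years(birth, entry, exit_):
    """Split follow-up [entry, exit_) into 5-year age x 5-year period cells.
    Arguments are month counts; returns {(age_lo, period_lo): years at risk}."""
    cells = defaultdict(int)
    t = entry
    while t < exit_:
        age_lo = (t - birth) // WIDTH * WIDTH
        per_lo = PERIOD_ORIGIN + (t - PERIOD_ORIGIN) // WIDTH * WIDTH
        # Next moment at which an age boundary, a period boundary,
        # or the end of follow-up is reached.
        nxt = min(birth + age_lo + WIDTH, per_lo + WIDTH, exit_)
        cells[(age_lo // 12, per_lo // 12)] += nxt - t
        t = nxt
    return {cell: m / 12 for cell, m in cells.items()}

# One hypothetical subject: born July 1930, followed January 1974 to January 1982.
cells = subject_years(months(1930, 7), months(1974, 1), months(1982, 1))

# Hypothetical annual death rates per cell (not real published rates).
rates = {(40, 1971): 0.003, (45, 1971): 0.005, (45, 1976): 0.0045,
         (50, 1976): 0.008, (50, 1981): 0.0075}
expected_deaths = sum(years * rates[cell] for cell, years in cells.items())
print(cells, round(expected_deaths, 5))
```

In practice the years at risk would be summed over all subjects in each cell first and then multiplied by the cell's death rate, as described above.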
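Treating the observed number of deaths as a Poisson variable leads to simple approximate summaries of an SMR. The sketch below uses the common large-sample approximations (standard error of the SMR taken as the square root of the observed deaths divided by the expected deaths, and a normal test of SMR = 1 using the expected deaths as the variance); the function name and the numbers are made up for illustration, and exact Poisson limits would be preferable when deaths are few.

```python
import math

def smr_summary(observed, expected):
    """Return (SMR, approximate SE of the SMR, z statistic for SMR = 1)."""
    smr = observed / expected
    se = math.sqrt(observed) / expected               # Var(observed) estimated by observed
    z = (observed - expected) / math.sqrt(expected)   # under the null, Var(observed) = expected
    return smr, se, z

# Hypothetical counts, not from any real cohort.
smr, se, z = smr_summary(observed=36, expected=25.0)
print(round(smr, 2), round(se, 2), round(z, 2))
```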
The method has usually been applied to compare observed and expected mortality within single groups but may be extended to compare the SMR between different subgroups or, more generally, to take account of covariates recorded for each individual, by expressing the SMR as a proportional-hazards regression model (Berry, 1983), that is, by the use of a generalized linear model. For subgroup i, if $m_i$ is the cumulative hazard from the reference population and $\gamma_i$ the proportional-hazards multiplier, then $\mu_i$, the expected number of deaths, is given by
$$\mu_i = \gamma_i m_i.$$
If $\gamma_i$ is modelled in terms of a set of covariates $x_i$ by
$$\ln \gamma_i = \beta^{T} x_i,$$
then
$$\ln \mu_i = \ln m_i + \beta^{T} x_i. \qquad (19.37)$$
This model is similar to the generalized linear model of a Poisson variable but contains the additional term $\ln m_i$, sometimes referred to as the offset, which ensures that age and period are both adjusted for. An example, in which the covariates are represented by a three-factor structure, is given in Berry (1983).
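The role of the offset can be illustrated with a small Poisson fit. The sketch below is an assumption-laden illustration, not the analysis in Berry (1983): it fits a model with an intercept b0 and a single binary covariate x (exposed or not) by Newton-Raphson, with the logarithm of the expected deaths m from the reference population as the offset. With this structure, exp(b0) should reproduce the SMR of the unexposed and exp(b0 + b1) the SMR of the exposed. All data and names are hypothetical.

```python
import math

def fit_poisson_offset(deaths, m, x, iters=25):
    """Newton-Raphson for the model ln(mu_i) = ln(m_i) + b0 + b1 * x_i."""
    b0 = b1 = 0.0
    for _ in range(iters):
        mu = [mi * math.exp(b0 + b1 * xi) for mi, xi in zip(m, x)]
        # Score vector: first derivatives of the Poisson log-likelihood.
        u0 = sum(d - f for d, f in zip(deaths, mu))
        u1 = sum((d - f) * xi for d, f, xi in zip(deaths, mu, x))
        # Information matrix: negative second derivatives.
        i00 = sum(mu)
        i01 = sum(f * xi for f, xi in zip(mu, x))
        i11 = sum(f * xi * xi for f, xi in zip(mu, x))
        det = i00 * i11 - i01 * i01
        # Newton step: solve the 2x2 system information * delta = score.
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (i00 * u1 - i01 * u0) / det
    return b0, b1

# Hypothetical subgroups: observed deaths, expected deaths, exposure indicator.
deaths = [12, 30, 20, 45]
m = [10.0, 25.0, 10.0, 30.0]
x = [0, 0, 1, 1]

b0, b1 = fit_poisson_offset(deaths, m, x)
print(round(math.exp(b0), 4), round(math.exp(b0 + b1), 4), round(math.exp(b1), 4))
```

Here exp(b1) is the ratio of the two SMRs, which is the quantity of interest when subgroups are compared under the proportional-hazards assumption.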
A disadvantage of the above approach is that it involves the assumption of proportional hazards between the study population and the external reference population across all the age-period strata, that is, that the reference death rates apply to the study population at least to a constant of proportionality. The simplest alternative approach that does not depend on this assumption is to work entirely within the data set without any reference to an external population. The death rates are calculated for each age-period cell from the data for the whole cohort. These internal rates are then used to calculate the expected numbers of deaths for subgroups defined in terms of the covariates, and so SMRs are produced for each of the subgroups based on the observed death rates of the whole population in each cell, instead of on external death rates (Breslow & Day, 1987, §3.5). If there are more than two subgroups, comparison of these internal SMRs still depends on proportional hazards of the subgroups across the age-period strata, but a proportional-hazards assumption between the study population and a reference population is no longer necessary. This approach is known to be conservative and may be improved by a Mantel-Haenszel approach (Breslow, 1984b; Breslow & Day, 1987, §3.6). This method is referred to as internal standardization.
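Internal standardization can be sketched in a few lines. The example assumes two hypothetical exposure subgroups ("low" and "high") and two age-period cells, with made-up deaths and subject-years; the cell rates come from the pooled cohort and are then applied to each subgroup's subject-years.

```python
# Hypothetical deaths and subject-years per age-period cell; not real data.
deaths = {"low": [2, 6], "high": [4, 18]}
years = {"low": [1000.0, 800.0], "high": [1000.0, 1200.0]}
n_cells = 2

# Internal death rate in each cell, from the whole cohort.
cell_rate = [
    sum(deaths[g][j] for g in deaths) / sum(years[g][j] for g in years)
    for j in range(n_cells)
]

# Expected deaths and internal SMR for each subgroup.
expected = {g: sum(n * r for n, r in zip(years[g], cell_rate)) for g in years}
internal_smr = {g: sum(deaths[g]) / expected[g] for g in deaths}

print(cell_rate, expected, internal_smr)
```

By construction the expected deaths summed over the subgroups equal the total observed deaths, so the internal SMRs describe each subgroup relative to the cohort as a whole.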
A second approach is to use direct standardization with an internal subgroup as standard. An appropriate choice for this internal reference may be an unexposed group or the least exposed group in the study. This method produces standardized rate ratios (SRRs), which are ratios of the directly standardized rate for each subgroup to the rate in the standard reference subgroup. The method avoids the possible problems associated with comparing SMRs from more than two groups, but the SRRs may be less precisely estimated than SMRs if the subgroups contain cells with few deaths. For a fuller discussion and an example analysed both by the external SMR method and the internal SRR method, see Checkoway et al. (1989, Chapter 5).
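A standardized rate ratio can be computed as in the sketch below, which assumes the least exposed subgroup ("low") is the internal standard, so that its subject-years in each age-period cell serve as the weights. The data and names are hypothetical.

```python
# Hypothetical deaths and subject-years per age-period cell; not real data.
deaths = {"low": [2, 6], "high": [4, 18]}
years = {"low": [1000.0, 800.0], "high": [1000.0, 1200.0]}
reference = "low"

# Weights: the subject-years distribution of the internal reference subgroup.
weights = years[reference]

def standardized_rate(group):
    """Directly standardized rate: cell-specific rates weighted by the standard."""
    rates = [d / n for d, n in zip(deaths[group], years[group])]
    return sum(w * r for w, r in zip(weights, rates)) / sum(weights)

srr = {g: standardized_rate(g) / standardized_rate(reference) for g in deaths}
print(srr)
```

Each subgroup's own cell-specific rates enter the calculation, which is why cells with few deaths make the SRR less stable than an SMR.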
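The most comprehensive approach is that of Poisson modelling. For subgroup i and age-period stratum j, if $n_{ij}$ is the number of subject-years and $\gamma_{ij}$ the death rate, then $\mu_{ij}$, the expected number of deaths, is given by
$$\mu_{ij} = n_{ij} \gamma_{ij}.$$
If $\gamma_{ij}$ is modelled in terms of the age-period stratum and a set of covariates $x_i$ by
$$\ln \gamma_{ij} = \alpha_j + \beta^{T} x_i,$$
then
$$\ln \mu_{ij} = \ln n_{ij} + \alpha_j + \beta^{T} x_i. \qquad (19.38)$$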
This is a generalized linear model of a Poisson variable (14.12), with the additional term ln n_{ij} (see (14.13)). An application is given as Example 14.4. Breslow and Day (1987, Chapter 4) give fuller details and worked examples.
In many cases the external method (19.37) and the internal method (19.38) will give similar inferences on the effect of the covariates. The latter has the advantage that there is no assumption that the death rates in all the age-period strata follow a proportional-hazards model with respect to the reference population, but the precision of the comparisons is slightly inferior to the external method. A combination of the two approaches, using (19.37) for the overall group and main subgroups and (19.38) for regression modelling, has the advantage of estimating the effects of the covariates and also estimating how the mortality in the overall group compares with that in the population of which it is a part. An example where this was done is given by Checkoway et al. (1993).
Choice of Standard Population
Standardized measures describe a hypothetical state of affairs, which is a function of the standard population chosen. For direct age-standardization, the total U.S. population from the previous census is especially common. Since rates standardized to the same external standard are comparable, the selection of a commonly used standard has advantages when comparing rates across different studies. Sometimes investigators compute directly standardized rates based upon one of their own study populations as the standard or by combining two or more study populations to create a standard. But rates standardized to a specific study population are not as readily compared to rates from other studies.
When a study involves a comparison with a "control" population, the choice of a standard should reflect the study goals. For example, an examination of county mortality variation within a state might compare county mortality to the state as a whole. A clean industry may be a good standard for an industrial population exposed to suspected occupational health hazards. Since indirectly standardized measures require knowledge of stratum-specific rates in the standard, data availability constrains the choice.
The choice of a standard population is not always obvious, and there may not be a "best" choice. For example, in comparing syphilis rates across counties in North Carolina, Thomas et al. (1995) decided to standardize the rates by age and sex to reduce the influence of different age-sex distributions in different counties. One obvious choice for a set of weights was the age-sex distribution of North Carolina as a whole. However, another possible choice was to use the age-sex distribution for the U.S. as a whole, so that other investigators could more readily compare syphilis rates in their states to the rates presented in the article. Was there a "right" answer? In this case the choice between the two standards could be regarded as a choice between greater "relevance" and broader comparability. The net result makes little difference, however, since the age-sex distributions of North Carolina and the entire U.S. are very similar. In other situations, however, the choice of standards can indeed change the message conveyed by the results.
Just as the growth of knowledge leads to revisions of disease classification systems, thereby complicating comparisons across revisions, changes in the age distribution over decades create a dilemma: switch to a new standard population to reflect the present reality, or retain the existing standard to preserve comparability across time. For this reason mortality rates in the United States were standardized to the 1940 population distribution almost to the end of the 20th century. Other standards (1970, 1980) were also in use, however, complicating comparisons of mortality statistics. During the 1990s, the U.S. National Center for Health Statistics (NCHS/CDC) coordinated an effort among federal and state agencies to adopt the year 2000 projected U.S. population for standardization of mortality statistics. In August 1998 all U.S. Department of Health and Human Services (DHHS) agencies were directed to use the 2000 Standard Population for age-adjusting mortality rates beginning no later than data year 1999 (Schoenborn et al., 2000).
Since the age distribution in 2000 is shifted to the right (older ages) compared to the 1940 population, mortality rates standardized to the 2000 population will be higher than if they were standardized to the 1940 census because they will assign more weight to older age strata, where mortality rates are high. In the same way, comparisons (e.g., ratios) of standardized rates will reflect the situation among older age groups more than in the past. To be sure, the switch will make comparisons to past data problematic, though NCHS will recompute age-standardized mortality rates for past years based on the 2000 population standard.
The opposite result will occur if at some point it is decided that, in a global society, all countries should standardize their rates to the world population to facilitate comparison across countries. Since the large majority of the world's population lives in developing countries and is much younger than the population of the U.S. and other developed countries, standardization using a world standard will yield lower standardized rates for most causes of death. As illustrated by the fruit stand example in the beginning of this chapter, different standards can give different, but correct, results. Comparisons, the usual goal of examining rates, may be less affected than the rates themselves, as long as the patterns (e.g., rise in mortality rate with age) are the same in the populations being compared. When that is not the case, then the question of whether it is meaningful to compare summary measures at all becomes more important than the question of which weights to use.
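The effect of the standard on a directly standardized rate can be seen in a small numerical sketch. The age-specific rates and the two sets of weights below are invented for illustration (they are not the 1940, 2000 or world standard populations); population B is assumed to have rates 1.5 times those of population A in every age group.

```python
def direct_rate(rates, weights):
    """Directly standardized rate: age-specific rates weighted by the standard."""
    return sum(r * w for r, w in zip(rates, weights)) / sum(weights)

# Hypothetical age-specific death rates (young, middle, old).
rates_a = [0.001, 0.005, 0.040]
rates_b = [1.5 * r for r in rates_a]

# Hypothetical standards: proportions of the standard population per age group.
younger_standard = [0.50, 0.35, 0.15]
older_standard = [0.35, 0.40, 0.25]

results = {}
for name, std in [("younger", younger_standard), ("older", older_standard)]:
    a = direct_rate(rates_a, std)
    b = direct_rate(rates_b, std)
    results[name] = (a, b / a)
    print(name, round(a, 5), round(b / a, 3))
```

The standardized rate of population A changes with the standard, while the ratio of B to A stays at 1.5 because the two populations share the same age pattern; with differing age patterns the ratio would also depend on the standard.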
Key concepts
Populations are heterogeneous – they contain disparate subgroups. So any overall measure is a summary of values for constituent subgroups. The underlying reality is the set of rates for (ideally homogenous) subgroups.
The observed ("crude") rate is in fact a weighted average of subgroup-"specific" rates, weighted by the size of the subgroups.
Comparability of weighted averages depends on similarity of weights.
"Standardized" (and other kinds of adjusted) measures are also weighted averages, with weights chosen to improve comparability.
Crude rates are "real", standardized rates are hypothetical.
The "direct" method (weights taken from an external standard population) gives greater comparability but requires more data.
The "indirect" method (weights taken from the internal study population) requires fewer data but provides less comparability.
Choice of weights can affect rates, comparisons of rates, and comparability to other populations, so the implications of using different possible standard populations should be considered.
Any summary conceals information; if there is substantial heterogeneity, the usefulness of a summary is open to question.
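Several of these concepts can be checked with a short sketch. It assumes two hypothetical populations with identical age-specific rates but different age structures, and an arbitrary standard; none of the numbers come from real data.

```python
# Hypothetical age-specific death rates (younger, older), shared by both populations.
rates = [0.001, 0.020]
sizes_a = [8000, 2000]   # population A: mostly young
sizes_b = [4000, 6000]   # population B: mostly old
standard = [0.6, 0.4]    # assumed standard age distribution

def crude_rate(rates, sizes):
    """Crude rate: specific rates weighted by the population's own subgroup sizes."""
    return sum(r * n for r, n in zip(rates, sizes)) / sum(sizes)

def direct_rate(rates, weights):
    """Standardized rate: the same specific rates weighted by the standard."""
    return sum(r * w for r, w in zip(rates, weights)) / sum(weights)

crude_a, crude_b = crude_rate(rates, sizes_a), crude_rate(rates, sizes_b)
std_a, std_b = direct_rate(rates, standard), direct_rate(rates, standard)
print(crude_a, crude_b, std_a, std_b)
```

The crude rates differ only because the weights differ; once both populations are given the same weights, the standardized rates coincide.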
References:
1. David Machin. Medical statistics: a textbook for the health sciences / David Machin, Michael J. Campbell, Stephen J Walters. – John Wiley & Sons, Ltd., 2007. – 346 p.
2. Nathan Tintle. Introduction to statistical investigations / Nathan Tintle, Beth Chance, George Cobb, Allan Rossman, Soma Roy, Todd Swanson, Jill VanderStoep. – UCSD BIEB100, Winter 2013. – 540 p.
3. Armitage P. Statistical Methods in Medical Research / P. Armitage, G. Berry, J. Matthews. – Blackwell Science, 2002. – 826 p.
4. Larry Winner. Introduction to Biostatistics / Larry Winner. – Department of Statistics University of Florida, July 8, 2004. – 204 p.
5. Weiss N. A. (Neil A.) Elementary statistics / Neil A. Weiss; biographies by Carol A. Weiss. – 8th ed., 2012. – 774 p.