STANDARDIZATION OF INDICES.
DIRECT
METHOD OF STANDARDIZATION.
Standardization
Comparing indices between totalities (populations) that differ in structure
requires standardization: the indices are recalculated on the condition that
the structure of each totality is brought to a single common standard.
The following quantities are used in medical statistics:
· absolute – represent the absolute size of the phenomenon or of the environment
· average – represent the variant type of characteristic distribution
· relative – represent the alternative type of characteristic distribution
Intensive index (II) – shows the level or spread of the phenomenon; it is used
to compare two or more statistical totalities that differ in size.
\[ II = \frac{\text{absolute quantity of the phenomenon}}{\text{absolute quantity of the environment}} \times 1000 \]
Example:
environment – 11 students (statistical totality)
phenomenon (morbidity): caries – 5 students, goitre – 3 students, gastritis – 4 students
II = … ‰
environment – 40 schoolboys
morbidity – 15 schoolboys
II = … ‰
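The calculation of an intensive index can be sketched in a few lines of Python. This is only an illustration of the formula above; the function name `intensive_index` and the `base` parameter are assumptions introduced here, and the input is the second example (15 cases among 40 schoolboys).

```python
def intensive_index(phenomenon: int, environment: int, base: int = 1000) -> float:
    """Size of the phenomenon per `base` units of the environment."""
    if environment <= 0:
        raise ValueError("the environment must contain at least one unit")
    return phenomenon / environment * base


# Second example from the text: 15 cases of morbidity among 40 schoolboys.
schoolboys_index = intensive_index(15, 40)
print(f"{schoolboys_index:.0f} per 1000")
```

Expressing both groups per 1000 is what makes totalities of different size comparable.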
The method of standardization is used when the environments being compared
are heterogeneous (by age, sex, and so on).
Standardization is a method of calculating conditional (standardized) indices.
The essence of the standardization method is the calculation of conditional
(standardized) indices that replace the intensive or other quantities when
direct comparison of those quantities is hindered by differences in the
structure of the groups.
The standardized indices are conditional because they show what the indices
would have been if the factor that interferes with their comparison had had no
influence; in other words, they remove the influence of that factor from the
real indices. The standardized indices can be used only for comparison,
because they give no idea of the real size of the phenomenon.
There are different methods of calculation of the
standardized indices. The most widespread method is the direct one.
The direct method of standardization is used when there are:
a) considerable differences between the levels of group indices (for example,
different levels of lethality in hospitals or departments, different levels
of morbidity for men and women, and others);
b) considerable heterogeneity of the totalities being compared.
The standardized indices show what the real indices would have been if the
influence of a certain factor had been absent. They make it possible to level
out the influence of that factor on the indices.
Stages of the direct method of standardization.
Stage I is the calculation of general intensive indices (or averages) for the
pair of totalities being compared;
Stage II is the choice and calculation of the standard. The half-sum of the
two groups (totalities) being compared is most frequently used as the standard;
Stage III is the calculation of the “expected quantities” in every group of
the standard;
Stage IV is the determination of the standardized indices;
Stage V is the comparison of the groups according to the intensive and
standardized indices.
Conclusions.
In the conclusions it must be noted that the standardized index is a
conditional index, which answers only one question: what the level of the
phenomenon under study would be if the conditions of its origin were standard.
Only the ordinary intensive indices characterize the actual level and frequency
of the phenomenon, because they are the true values, whereas the standardized
indices may change depending on the standard chosen.

Methods of standardization: direct, indirect, reverse.
The stages of the direct method of standardization:
1. Calculation of general intensive (or average) indices in the compared groups.
2. Choice and calculation of the standard.
3. Calculation of “expected” figures in every group of the standard.
4. Determination of standardized indices.
5. Comparison of simple intensive and standardized indices. Conclusions.
Usage of standardized indices:
1. Comparative evaluation of demographic indices in different age and social groups.
2. Comparative analysis of morbidity in different age and social groups.
3. Comparative evaluation of treatment quality in hospitals with a different
mix of patients across departments.
The types of values which exist in science:
· absolute – represent the absolute size of the phenomenon or environment
· average – represent the variant type of sign distribution
· relative – represent the alternative type of sign distribution
Table 2.14 Example. Average duration of treatment in the hospitals

Department     Hospital No. 1                          Hospital No. 2
               Number of   Bed days   Period of        Number of   Bed days   Period of
               patients               treatment        patients               treatment
Therapeutic    2100        33180      15.8             970         16296      16.8
Surgical       560         5320       9.5              990         9702       9.8
Gynecologic    580         4060       7.0              1020        7650       7.5
Total          3240        42560      13.1             2980        33648      11.3
As we see, the average term of treatment in hospital No. 2 is much lower than
in hospital No. 1. But the analysis of these parameters by department shows
that this conclusion is inaccurate: in every department the term of treatment
is longer in hospital No. 2.
In hospital No. 1 therapeutic patients prevail, and in hospital No. 2
gynecologic patients, and the terms of treatment of these groups differ
essentially.
Branch         Hospital No. 1         Hospital No. 2         The standard
               Number of    %         Number of    %         Number of    %
               patients               patients               patients
Therapeutic    2100         64.8      970          32.6      3070         49.4
Surgical       560          17.3      990          33.2      1550         24.9
Gynecologic    580          17.9      1020         34.2      1600         25.7
Total          3240         100.0     2980         100.0     6220         100.0
Let us determine the average duration of treatment in both hospitals on the
condition that the structure of hospitalized patients is identical.
Branch        Standard distribution   Hospital No. 1: standard share   Hospital No. 2: standard share
              of patients (%)         × term of treatment : 100        × term of treatment : 100
Therapeutic   49.4                    49.4 × 15.8 : 100 = 7.8          49.4 × 16.8 : 100 = 8.3
Surgical      24.9                    24.9 × 9.5 : 100 = 2.4           24.9 × 9.8 : 100 = 2.4
Gynecologic   25.7                    25.7 × 7.0 : 100 = 1.8           25.7 × 7.5 : 100 = 1.9
Total         100.0                   standardized parameter 12.0      standardized parameter 12.6
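The hospital example can be retraced with a short Python sketch. It uses the patient and bed-day counts from Table 2.14 and takes the pooled patients of both hospitals as the standard; the function and variable names are illustrative assumptions. Because the sketch keeps unrounded shares, hospital No. 2 comes out at about 12.7 rather than the 12.6 obtained above by summing rounded terms; the conclusion is unchanged.

```python
# (patients, bed days) per department, taken from Table 2.14.
hospital_1 = {"Therapeutic": (2100, 33180), "Surgical": (560, 5320), "Gynecologic": (580, 4060)}
hospital_2 = {"Therapeutic": (970, 16296), "Surgical": (990, 9702), "Gynecologic": (1020, 7650)}


def crude_mean_stay(hospital):
    """Ordinary (crude) average term of treatment: all bed days / all patients."""
    patients = sum(p for p, _ in hospital.values())
    bed_days = sum(d for _, d in hospital.values())
    return bed_days / patients


def standardized_mean_stay(hospital, standard_share):
    """Department-specific terms of treatment weighted by the standard structure."""
    return sum(standard_share[dept] * days / patients
               for dept, (patients, days) in hospital.items())


# Standard: pooled patients of both hospitals, expressed as shares.
pooled = {dept: hospital_1[dept][0] + hospital_2[dept][0] for dept in hospital_1}
total = sum(pooled.values())
standard_share = {dept: n / total for dept, n in pooled.items()}

crude_1, crude_2 = crude_mean_stay(hospital_1), crude_mean_stay(hospital_2)
std_1 = standardized_mean_stay(hospital_1, standard_share)
std_2 = standardized_mean_stay(hospital_2, standard_share)
print(f"crude: {crude_1:.1f} vs {crude_2:.1f}; standardized: {std_1:.1f} vs {std_2:.1f}")
```

The crude averages rank hospital No. 1 higher, while the standardized values rank hospital No. 2 higher, which is the reversal the tables above describe.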
Introduction
One of the fundamentals of health situation analysis (HSA) is the comparison
of basic health indicators. Among other objectives of HSA, this makes it
possible to identify risk areas, define needs, and document inequalities in
health in two or more populations, in subgroups of a population, or in a
single population at different points in time. Crude rates, whether they
represent mortality, morbidity or other health events, are summary measures of
the experience of populations that facilitate this comparative analysis.
However, the comparison of crude rates can sometimes be inadequate,
particularly when the population structures are not comparable for factors
such as age, sex or socioeconomic level. Indeed, these and other factors
influence the magnitude of crude rates and may distort their interpretation in
an effect called confounding (box 1).(1, 2, 3)

A confounding effect appears when the measurement of the effect of an exposure
on a risk is distorted by the relation between the exposure and other
factor(s) that also influence(s) the outcome under study.(1) A confounding
factor (or confounder) must meet three criteria: 1) it is a known risk factor
for the result of interest,(2) 2) it is associated with the exposure but is
not a result of the exposure,(2) and 3) it is not an intermediate variable
between them. An example is smoking as a confounder in the study of coffee
consumption as a risk factor for ischemic heart disease. The association
between coffee consumption and ischemic heart disease may be confounded by
smoking. Indeed, smoking is a known risk factor for ischemic heart disease. It
is associated with coffee consumption, as smokers are usually consumers of
coffee, but it is not a result of drinking coffee. Smoking is not an
intermediate variable between coffee consumption and ischemic heart disease.
Schematically: smoking is a confounder of the association between coffee
consumption and ischemic heart disease.
Sources: (1) Last J. A Dictionary of Epidemiology. Fourth Edition. (2) Gordis L.
Epidemiology. Second Edition.
The calculation of specific rates in well-defined subgroups of a population is
a way of avoiding certain confounding factors. For example, specific rates
calculated by age groups are often used to examine how diseases affect people
differently depending on their age. However, although this uncovers the
patterns of health events in the population and allows for a more rigorous
comparison of rates, it can sometimes be impractical to work with a large
number of subgroups.(4) Furthermore, if the subgroups consist of small
populations, the specific rates can be very imprecise. The process of
standardization (or adjustment) of rates is a classic epidemiological method
that removes the confounding effect of variables that we know, or think,
differ in the populations we wish to compare. It provides an easy-to-use
summary measure that can be useful for information users, such as
decision-makers, who prefer to use synthetic health indices in their
activities.
In practice, age is
the factor that is most frequently adjusted for. Age-standardization is
particularly used in comparative mortality studies, since the age structure has
an important impact on a population’s overall mortality. For example, in
situations with levels of moderate mortality, as in the majority of the
countries of the
There are two main
standardization methods, characterized by whether the standard used is a
population distribution (direct method) or a set of specific rates (indirect
method). The two methods are presented below.
Direct method
In the direct standardization method, we calculate the rate that we would
expect to find in the populations under study if they all had the same
composition according to the variable whose effect we wish to adjust or
control (such as age, socioeconomic group, or other characteristics). We use
the structure of a population called “standard”, stratified according to the
control variable, and apply to it the specific rates of the corresponding
strata in the population under study. We thus obtain the number of cases
“expected” in each stratum if the populations had the same composition. The
adjusted or “standardized” rate is obtained by dividing the total of expected
cases by the standard population. An
example is presented in
An important step in
the direct standardization method is the selection of a standard population.
The value of the adjusted rate depends on the standard population used, but to
a certain extent this population can be chosen arbitrarily, because there is no
significance in the calculated value itself. Indeed adjusted rates are products
of a hypothetical calculation and do not represent the exact values of the
rates. They serve only for comparisons between groups, not as a measure of
absolute magnitude. However, some aspects should be taken into account in the
selection of the standard population. The standard population may come from the
study population (sum or average for example). In this case however, it is
important to ensure that the populations do not differ in size, since a larger
population may unduly influence the adjusted rates. The standard population may
also be a population without any relation to the data under study, but in
general, its distribution with regard to the adjustment factor should not be
radically different from the populations we wish to compare.
The comparative
study of adjusted rates may be carried out in different ways: we can calculate
the absolute difference between the rates, their ratio, or the percentage
difference between them. Obviously, this comparison is valid only when the same
standard was used to calculate the adjusted rates. When the national standards
change (as in the
The direct method is
most often used. However, it requires rates specific to population strata
corresponding to the variable of interest in all the populations we wish to compare,
which are sometimes not available. Even when these specific rates are available
for all the subgroups, they are sometimes calculated from very small numbers
and can be very imprecise. In this case, the indirect
standardization method is recommended.
using the direct method, 1995-1997

In this example, the standard population that was used is the so-called “old”
world standard population defined by Waterhouse (see
In this example, to use the direct method we need:
· The specific mortality rates by stratum of the characteristic we want to
control, in this case age, in each population (i.e.
· A standard population, stratified in the same way
First we calculate the expected number of deaths in both countries, applying
the rate of each country to the standard population (columns (4) and (5)). The
sum over all the groups gives us the total of expected deaths. To calculate
the adjusted rate, we divide this number by the total standard population.



Age group   Standard      Age-specific mortality rate per       Expected number of deaths
(years)     population    100,000 population, 1995-1997
                          First country    Second country       First country (4)   Second country (5)
<1          2,400         1,693.2          737.8                41                  18
1-4         9,600         112.5            38.5                 11                  4
5-14        19,000        36.2             21.7                 7                   4
15-24       17,000        102.9            90.3                 17                  15
25-44       26,000        209.6            176.4                55                  46
45-64       19,000        841.1            702.3                160                 133
65+         7,000         4,967.4          5,062.6              348                 354
Total       100,000                                             639                 574
Age-adjusted mortality rate (
When eliminating the effect of the difference in the age structure in both
countries, we obtain a rate that is higher in
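The direct adjustment in this example can be reproduced with the following Python sketch. The standard population and the age-specific rates are copied from the table above; the labels "A" and "B" are placeholders for the first and second country, whose names do not survive in this copy, and the function name is an assumption. The sketch also computes the three comparison measures mentioned earlier (absolute difference, ratio, percentage difference). The unrounded totals differ slightly from the tabulated 639 and 574, which add up rounded stratum values.

```python
# Standard population by age group (the "old" world standard, as tabulated above).
STANDARD = {"<1": 2400, "1-4": 9600, "5-14": 19000, "15-24": 17000,
            "25-44": 26000, "45-64": 19000, "65+": 7000}

# Age-specific mortality rates per 100,000; "A" and "B" are placeholder labels.
RATES_A = {"<1": 1693.2, "1-4": 112.5, "5-14": 36.2, "15-24": 102.9,
           "25-44": 209.6, "45-64": 841.1, "65+": 4967.4}
RATES_B = {"<1": 737.8, "1-4": 38.5, "5-14": 21.7, "15-24": 90.3,
           "25-44": 176.4, "45-64": 702.3, "65+": 5062.6}


def directly_adjusted_rate(rates, standard, base=100_000):
    """Apply stratum-specific rates to the standard population.

    Expected deaths are summed over the strata and divided by the total
    standard population; the result is expressed per `base` population.
    """
    expected = sum(standard[group] * rates[group] / base for group in standard)
    return expected / sum(standard.values()) * base


adj_a = directly_adjusted_rate(RATES_A, STANDARD)
adj_b = directly_adjusted_rate(RATES_B, STANDARD)

difference = adj_a - adj_b                          # absolute difference
ratio = adj_a / adj_b                               # rate ratio
percent_difference = (adj_a - adj_b) / adj_b * 100  # percentage difference

print(f"A: {adj_a:.1f}, B: {adj_b:.1f} per 100,000")
print(f"difference {difference:.1f}, ratio {ratio:.2f}, {percent_difference:.1f}% higher")
```

These comparisons are meaningful only because both adjusted rates use the same standard population.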

Source of the data: Pan American Health Organization. Perfiles de mortalidad
de las comunidades hermanas fronterizas México - Estados Unidos, Edición 2000 /
Mortality profiles of the Sister Communities on the United States-Mexico
border, 2000 Edition. Washington, D.C.: OPS. 2000


Age groups (years)   World      European
0                    2,400      1,600
1-4                  9,600      6,400
5-9                  10,000     7,000
10-14                9,000      7,000
15-19                9,000      7,000
20-24                8,000      7,000
25-29                8,000      7,000
30-34                6,000      7,000
35-39                6,000      7,000
40-44                6,000      7,000
45-49                6,000      7,000
50-54                5,000      7,000
55-59                4,000      6,000
60-64                4,000      5,000
65-69                3,000      4,000
70-74                2,000      3,000
75-79                1,000      2,000
80-84                500        1,000
85+                  500        1,000
Total                100,000    100,000
Source:
Waterhouse J. y 
Indirect method
Indirect standardization is different in both method and interpretation. An
example of adjustment using the indirect method is presented in
Standardized Mortality Ratios (SMRs) are frequently used in epidemiology to
compare different study groups, because they are easy to calculate and because
they provide an estimate of the relative risk between the standard population
and the population under study. However, there are instances when this
comparison is not adequate, for example when the ratios of the rates in the
groups under study and in the reference population are not homogeneous across
the strata. The comparison between each group and the reference population,
on the other hand, is always relevant. The SMRs of different causes in a
population may also be calculated using a single standard.
and mortality in 

The crude mortality rate in Colombia in 1999 was 4.4 per 1,000 population,
with variations between 1.8 per 1,000 population in the department of Vichada
and 6.9 per
In this case, in order to use the indirect method we need:
· The age-specific mortality rates by age group in
· The population of the department of Vichada stratified by age
· The total number of deaths observed in the department of Vichada
The first step is to calculate the expected number of deaths in Vichada by
applying the standard rates to the population of the department (column (3) =
(1) x (2)). Then the calculated deaths are summed and the SMR is calculated by
dividing the total number of observed deaths by the expected deaths.


Age group   Age-specific mortality      Population (2)   Observed deaths   Expected deaths in
(years)     rates, Colombia, 1999                        in Vichada        Vichada, 1999 (3)
            (i) (1)
0-4         339                         11,392           61                39
5-14        34                          21,930           5                 7
15-44       219                         38,244           27                84
45-64       752                         7,083            22                53
65+         4,573                       1,839            27                84
Total                                   80,488           142               267
The SMR of 53% indicates that in the population of Vichada the risk of dying
is 47% less than expected according to the mortality standards of all of

Source of the data: (i) Situación de Salud en
NOTE: Confidence interval for SMRs
The confidence interval provides the range of values within which we expect to
find the real value of the indicator under study, with a given probability. It
thus gives an estimate of the potential difference between what is observed
and what is really happening in the population, which helps in interpreting
the value of the observed indicator. The 95% confidence interval is the most
used.
In the case of the SMR, the calculation of the confidence interval can be
carried out in the following way:
1) First, the Standard Error (SE) for the SMR is calculated using the
following formula, where d is the number of observed deaths:
\[ SE = \frac{SMR}{\sqrt{d}} \]
2) The 95% Confidence Interval (CI) is calculated as follows:
\[ CI_{95\%} = SMR \pm 1.96 \times SE \]
where 1.96 is the value of the Z distribution with a level of confidence of
95%. It is assumed that the values follow a normal distribution.
In this example: SE(Vichada) = 4.4 and CI(Vichada) (95%) = [44.4 ; 61.6]. The
confidence interval indicates that, with a probability of 95%, the SMR’s value
lies between 44.4 and 61.6.
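The Vichada example can be retraced with a short Python sketch. Two points are assumptions rather than statements from the source: the Colombian rates are treated as rates per 100,000 population (this reproduces the tabulated expected deaths), and the standard error is taken as the SMR divided by the square root of the observed deaths, which is consistent with the quoted values of 4.4 and [44.4 ; 61.6]. Because the sketch does not round the SMR to 53 before computing the interval, its results differ slightly in the last digit.

```python
from math import sqrt

# (standard rate, population of Vichada) per age group, from the table above.
# ASSUMPTION: the rates are per 100,000 population.
STRATA = {
    "0-4": (339, 11392),
    "5-14": (34, 21930),
    "15-44": (219, 38244),
    "45-64": (752, 7083),
    "65+": (4573, 1839),
}
OBSERVED_DEATHS = 142


def expected_deaths(strata, base=100_000):
    """Apply the standard rates to the study population, stratum by stratum."""
    return sum(rate * population / base for rate, population in strata.values())


def smr_with_ci(observed, expected, z=1.96):
    """SMR in percent with an approximate normal-theory confidence interval."""
    smr = observed / expected * 100
    se = smr / sqrt(observed)  # assumed form of the standard error
    return smr, se, (smr - z * se, smr + z * se)


expected = expected_deaths(STRATA)
smr, se, (lower, upper) = smr_with_ci(OBSERVED_DEATHS, expected)
print(f"expected {expected:.0f}, SMR {smr:.0f}%, SE {se:.1f}, 95% CI [{lower:.1f}; {upper:.1f}]")
```

Since the whole interval lies below 100, the deficit of deaths in Vichada relative to the national rates is unlikely to be a chance finding under these assumptions.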
Conclusion
As with any summary measure, adjusted rates may hide great differences between
groups, and these differences can be important for explaining changes in the
rates due to, or associated with, the variable that we adjust for. Therefore,
whenever possible it is important to analyze the specific rates along with the
adjusted rates. The two methods used in a single population should lead to the
same conclusions. If this is not the case, the situation in the different
population strata requires more in-depth research.
One of the reasons for the sometimes limited use of these methods is the lack
of tools or instruments that simplify them. To respond to this need, the
General Direction of Public Health of the Xunta de Galicia and PAHO’s Special
Program for Health Analysis have developed the “EpiDat” computer package for
analysis of tabulated data. EpiDat is distributed free of charge via the
Internet at: http://www.paho.org/Spanish/SHA/epidat.htm. A newer version of
this package will be issued soon. The software SIGEpi (see
http://www.paho.org/English/sha/be_v22n3SIGEpi.htm), which combines the
capacity of a geographic information system with epidemiological tools, can
also generate adjusted rates.
In short, adjusted
rates allow for more exact comparisons between populations. This is important
because it can be used in setting priorities between groups. Nevertheless, the
crude rates are the only indicators of the real dimension or magnitude of a
problem and hence remain valuable public health tools.
The following table lists standardization methods and their
corresponding location and scale measures available with the METHOD= option.
Table 59.2: Available Standardization Methods
Method       Location                     Scale
MEAN         mean                         1
MEDIAN       median                       1
SUM          0                            sum
EUCLEN       0                            Euclidean length
USTD         0                            standard deviation about origin
STD          mean                         standard deviation
RANGE        minimum                      range
MIDRANGE     midrange                     range/2
MAXABS       0                            maximum absolute value
IQR          median                       interquartile range
MAD          median                       median absolute deviation from median
ABW(c)       biweight 1-step M-estimate   biweight A-estimate
AHUBER(c)    Huber 1-step M-estimate      Huber A-estimate
AWAVE(c)     Wave 1-step M-estimate       Wave A-estimate
AGK(p)       mean                         AGK estimate (ACECLUS)
SPACING(p)   mid-minimum spacing          minimum spacing
L(p)         L(p)                         L(p)
IN(ds)       read from data set           read from data set
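To make the location and scale columns concrete, the following Python sketch standardizes a variable as (value − location) / scale for a few of the simpler methods in the table. It is an illustration of the idea only, not a reimplementation of PROC STDIZE: the function names are assumptions, the STD case uses the sample standard deviation (divisor n − 1), and the robust estimators (ABW, AHUBER, AWAVE, AGK, SPACING) are not covered.

```python
import statistics


def location_scale(values, method):
    """Return (location, scale) for a few of the simpler methods in the table."""
    if method == "STD":
        return statistics.mean(values), statistics.stdev(values)
    if method == "RANGE":
        return min(values), max(values) - min(values)
    if method == "MAXABS":
        return 0.0, max(abs(x) for x in values)
    if method == "MAD":
        median = statistics.median(values)
        deviations = [abs(x - median) for x in values]
        return median, statistics.median(deviations)
    raise ValueError(f"method not covered by this sketch: {method}")


def standardize(values, method):
    """Subtract the location measure and divide by the scale measure."""
    location, scale = location_scale(values, method)
    if scale == 0:
        raise ValueError("scale measure is zero; the variable cannot be standardized")
    return [(x - location) / scale for x in values]


data = [2.0, 4.0, 4.0, 6.0, 14.0]  # illustrative values with one outlier
print(standardize(data, "RANGE"))
print(standardize(data, "MAD"))
```

With the outlier 14.0 in the data, the MAD-based result keeps the bulk of the values within one unit of zero, which illustrates why median-based measures are described as robust.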
For METHOD=ABW(c), METHOD=AHUBER(c), or METHOD=AWAVE(c), c is a positive
numeric tuning constant.
For METHOD=AGK(p), p is a numeric constant giving the proportion of pairs to
be included in the estimation of the within-cluster variances.
For METHOD=SPACING(p), p is a numeric constant giving the proportion of data
to be contained in the spacing.
For METHOD=L(p), p is a numeric constant greater than or equal to 1 specifying
the power to which differences are to be raised in computing an L(p) or
Minkowski metric.
For METHOD=IN(ds), ds is the name of a SAS data set that meets either one of
the following two conditions:
· contains a _TYPE_ variable. The observation that contains the location
measure corresponds to the value _TYPE_= 'LOCATION' and the observation that
contains the scale measure corresponds to the value _TYPE_= 'SCALE'. You can
also use a data set created by the OUTSTAT= option from another PROC STDIZE
statement as the ds data set. See the section "Output Data Sets" for the
contents of the OUTSTAT= data set.
· contains the location and scale variables specified by the LOCATION and
SCALE statements.
PROC STDIZE reads in the location and scale variables in the ds data set by
first looking for the _TYPE_ variable in the ds data set. If it finds this
variable, PROC STDIZE continues to search for all variables specified in the
VAR statement. If it does not find the _TYPE_ variable, PROC STDIZE searches
for the location variables specified in the LOCATION statement and the scale
variables specified in the SCALE statement.
For robust estimators, refer to Goodall (1983) and Iglewicz (1983). The MAD
method has the highest breakdown point (50%), but it is somewhat inefficient.
The ABW, AHUBER, and AWAVE methods provide a good compromise between breakdown
and efficiency. The L(p) location estimates are increasingly robust as p drops
from 2 (corresponding to least squares, or mean estimation) to 1
(corresponding to least absolute value, or median estimation). However, the
L(p) scale estimates are not robust.
The SPACING method is robust to both outliers and clustering (Janssen et al.
1995) and is, therefore, a good choice for cluster analysis or nonparametric
density estimation. The mid-minimum spacing method estimates the mode for
small p. The AGK method is also robust to clustering and more efficient than
the SPACING method, but it is not as robust to outliers and takes longer to
compute. If you expect g clusters, the argument to METHOD=SPACING or
METHOD=AGK should be 1/g or less. The AGK method is less biased than the
SPACING method for small samples. As a general guide, it is reasonable to use
AGK for samples of size 100 or less and SPACING for samples of size 1000 or
more, with the treatment of intermediate sample sizes depending on the
available computer resources.
Since epidemiology is concerned with the distribution of disease in
populations, summary measures are required to describe the amount of disease
in a population. There are two basic measures, incidence and prevalence.
Incidence is a measure of the rate at which new cases of disease occur in a
population previously without disease. Thus, the incidence, denoted by I, is
defined as
\[ I = \frac{\text{number of new cases in a period of time}}{\text{population at risk}}. \]
The period of time is specified in the units in which the rate is expressed.
Often the rate is multiplied by a base such as 1000 or 1 000 000 to avoid
small decimal fractions. For example, there were 280 new cases of cancer of
the pancreas in men in New South Wales in 1997 out of a population of 3.115
million males. The incidence was 280/3.115 = 90 per million per year.
Prevalence, denoted by P, is a measure of the frequency of existing disease at
a given time, and is defined as
\[ P = \frac{\text{total number of cases at a given time}}{\text{total population at that time}}. \]
Both incidence and prevalence usually depend on age, and possibly sex, and
sex- and age-specific figures would be calculated.
The prevalence and incidence rates are related, since an incident case is,
immediately on occurrence, a prevalent case and remains as such until recovery
or death (disregarding emigration and immigration). Provided the situation is
stable, the link between the two measures is given by
\[ P = It, \tag{19.4} \]
where t is the average duration of disease. For a chronic disease from which
there is no recovery, t would be the average survival after occurrence of the
disease.
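A minimal Python sketch of these two measures follows. The incidence figures are those quoted above for cancer of the pancreas; the values used to illustrate P = It (an incidence of 2 per 1000 per year and an average duration of 5 years) are hypothetical, as are the function names.

```python
def incidence_rate(new_cases: int, population_at_risk: float, base: int = 1_000_000) -> float:
    """New cases per `base` population at risk over the stated period."""
    return new_cases / population_at_risk * base


def stable_prevalence(incidence: float, mean_duration: float) -> float:
    """P = I * t, valid only when the situation is stable.

    `incidence` and `mean_duration` must use the same time unit.
    """
    return incidence * mean_duration


# 280 new cases in one year among 3.115 million males.
pancreas = incidence_rate(280, 3_115_000)
print(f"{pancreas:.0f} per million per year")

# Hypothetical chronic disease: 2 new cases per 1000 per year, lasting 5 years on average.
prevalence = stable_prevalence(0.002, 5)
print(f"prevalence {prevalence:.1%}")
```

The second result shows why a disease of long duration can be common in a population even when new cases are rare.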
Problems due to confounding arise frequently in vital statistics and have
given rise to a group of methods called standardization. We shall describe
briefly one or two of the most well-known methods.
Mortality in a population is
usually measured by an annual death rate — for example, the number of
individuals dying during a certain calendar year divided by the estimated
population size midway through the year. Frequently this ratio is multiplied by
a convenient base, such as 1000, to avoid small decimal fractions; it is then
called the annual death rate per 1000 population. If the death rate is
calculated for a population covering a wide age range, it is called a crude
death rate.
In a comparison of the
mortality of two populations, say, those of two different countries, the crude
rates may be misleading. Mortality depends strongly on age. If the two
countries have different age structures, this contrast alone may explain a
difference in crude rates (just as, in Table 15.6, the contrast between the
‘crude’ proportions with factor A was strongly affected by the different sex
distributions in the disease and control groups). An example is given in Table
19.1 which shows the numbers of individuals and numbers of deaths separately in
different age groups, for two countries: A, typical of highly industrialized
countries, with a rather high proportion of individuals at the older ages; and
B, a developing country with a small proportion of old people. The death rates
at each age (which are called age-specific death rates)
are substantially higher for B than for A, and yet the crude death rate is
higher for A than for B.
Sometimes,
however, mortality has to be compared for a large number of different
populations, and some form of adjustment for age differences is required. For
example, the mortality in one country may have to be compared over several
different years; different regions of the same country may be under study; or
one may wish to compare the mortality for a large number of different
occupations. Two obvious generalizations are: (i) in
standardizing for factors other than, or in addition to, age—for example, sex,
as in Table 15.6; and (ii) in morbidity studies where the criterion studied is
the occurrence of a certain illness rather than of death. We shall discuss the
usual situation—the standardization of mortality rates for age.
The
basic idea in standardization is that we introduce a standard
population with a fixed age structure. The
mortality for any special population
is then adjusted to allow for discrepancies in age structure between the
standard and special populations. There are two main approaches: direct
and indirect methods of
standardization. The following brief account may be supplemented by reference
to Liddell (1960), Kalton (1968) or Hill and Hill
(1991).
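The following notation will be used. For the $i$th age group, the standard
population contains $N_i$ individuals and has death rate $P_i$; the special
population contains $n_i$ individuals, of whom $r_i$ die, giving the death
rate $p_i = r_i/n_i$.
In the direct method the death rate is standardized to the age structure of
the standard population. The directly standardized death rate for the special
population is, therefore,
\[ p' = \frac{\sum N_i p_i}{\sum N_i}. \tag{19.5} \]
It is obtained by applying the special death rates, $p_i$, to the standard
population sizes, $N_i$. Alternatively, $p'$ can be regarded as a weighted
mean of the $p_i$, using the $N_i$ as weights. The variance of $p'$ may be
estimated as
\[ \mathrm{var}(p') = \frac{\sum (N_i^2 p_i q_i / n_i)}{(\sum N_i)^2}, \tag{19.6} \]
where $q_i = 1 - p_i$; if, as is often the case, the $p_i$ are all small, the
binomial variance of $p_i$, $p_i q_i/n_i$, may be replaced by the Poisson term
$p_i/n_i$ ($= r_i/n_i^2$), giving
\[ \mathrm{var}(p') \approx \frac{\sum (N_i^2 r_i / n_i^2)}{(\sum N_i)^2}. \tag{19.7} \]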
To compare two special populations, A and B, we could calculate a standardized
rate for each ($p'_A$ and $p'_B$), and consider
\[ d' = p'_A - p'_B. \]
From (19.5),
\[ d' = \frac{\sum N_i (p_{Ai} - p_{Bi})}{\sum N_i}, \]
which has exactly the same form as (15.15), with $w_i = N_i$ and
$d_i = p_{Ai} - p_{Bi}$ as in (15.14). The method differs from that of
Cochran’s test only in using a different system of weights. The variance is
given by
\[ \mathrm{var}(d') = \frac{\sum N_i^2 \, \mathrm{var}(d_i)}{(\sum N_i)^2}, \]
with $\mathrm{var}(d_i)$ given by (15.17). Again, when the $p_{0i}$ are small,
$q_{0i}$ can be put approximately equal to 1 in (15.17).
If it is required to compare two special populations using the ratio of the
standardized rates, $p'_A/p'_B$, then the variance of the ratio may be
obtained using (19.6) and (5.12).
The variance given by (19.7) may be unsatisfactory for the construction of
confidence limits if the numbers of deaths in the separate age groups are
small, since the normal approximation is then unsatisfactory and the Poisson
limits are asymmetric. The standardized rate (19.5) is a weighted sum of the
Poisson counts, $r_i$. Dobson et al. (1991) gave a method of calculating an
approximate confidence interval based on the confidence interval of the total
number of deaths.
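The directly standardized rate and its two variance estimates can be sketched in Python as follows. The three age groups and all counts are hypothetical values chosen only to illustrate the formulas (19.5) to (19.7); the function name is likewise an assumption.

```python
from math import sqrt


def direct_standardized_rate(N, n, r):
    """Directly standardized rate with binomial and Poisson variance estimates.

    N: standard population sizes, n: special population sizes,
    r: deaths in the special population, all listed by age group.
    """
    if not (len(N) == len(n) == len(r)):
        raise ValueError("N, n and r must have one entry per age group")
    total_N = sum(N)
    p = [ri / ni for ri, ni in zip(r, n)]  # age-specific rates of the special population
    rate = sum(Ni * pi for Ni, pi in zip(N, p)) / total_N
    var_binomial = sum(Ni**2 * pi * (1 - pi) / ni
                       for Ni, pi, ni in zip(N, p, n)) / total_N**2
    var_poisson = sum(Ni**2 * ri / ni**2
                      for Ni, ri, ni in zip(N, r, n)) / total_N**2
    return rate, var_binomial, var_poisson


# Hypothetical data for three age groups.
N = [5000, 3000, 2000]   # standard population
n = [1000, 2000, 1000]   # special population
r = [2, 10, 30]          # deaths in the special population

rate, var_b, var_p = direct_standardized_rate(N, n, r)
print(f"standardized rate {rate * 1000:.2f} per 1000, SE {sqrt(var_p) * 1000:.2f} per 1000")
```

With small age-specific rates the two variance estimates are close, the Poisson form being slightly larger because it drops the factor $q_i$.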
Indirect method
This method is more conveniently thought of as a comparison of observed and
expected deaths than in terms of standardized rates. In the special population
the total number of deaths observed is $\sum r_i$. The number of deaths
expected if the age-specific death rates were the same as in the standard
population is $\sum n_i P_i$. The overall mortality experience of the special
population may be expressed in terms of that of the standard population by the
ratio of observed to expected deaths:
\[ M = \frac{\sum r_i}{\sum n_i P_i}. \tag{19.9} \]
When multiplied by 100 and expressed as a percentage, (19.9) is known as the
standardized mortality ratio (SMR).
To obtain the variance of $M$ we can use the result
$\mathrm{var}(r_i) = n_i p_i q_i$, and regard the $P_i$ as constants without
any sampling fluctuation (since we shall often want to compare one SMR with
another using the same standard population; in any case the standard
population will often be much larger than the special population, and
$\mathrm{var}(P_i)$ will be much smaller than $\mathrm{var}(p_i)$). This gives
\[ \mathrm{var}(M) = \frac{\sum n_i p_i q_i}{(\sum n_i P_i)^2}. \]
The
smallness of the standard error (SE) of the SMR in Example 19.3 is typical of
much vital statistical data, and is the reason why sampling errors are often
ignored in this type of work. Indeed, there are problems in the interpretation
of occupational mortality statistics which often overshadow sampling errors.
For example, occupations may be less reliably stated in censuses than in the
registration of deaths, and this may lead to biases in the estimated death
rates for certain occupations. Even if the data are wholly reliable, it is not
clear whether a particularly high or low SMR for a certain occupation reflects
a health risk in that occupation or a tendency for selective groups of people
to enter it. In Example 19.3, for example, the SMR for farmers may be low
because farming is healthy, or because unhealthy
people are unlikely to enter farming or are more likely to leave it. Note also
that in the lowest age group there is an excess
of deaths among farmers (87 observed, 55 expected). Any method of
standardization carries the risk of oversimplification, and the investigator
should always compare age-specific rates to see whether the contrasts between
populations vary greatly with age.
The method of indirect standardization is very similar to that described as
the comparison of observed and expected frequencies on p. 520. Indeed if, in
the comparison of two groups, A and B, the standard population were defined as
the pooled population A + B, the method would be precisely the same as that
used in the Cochran-Mantel-Haenszel method (p. 520). We have seen (p. 662)
that Cochran’s test is equivalent to a comparison of two direct standardized
rates. There is thus a very close relationship between the direct and indirect
methods when the standard population is chosen to be the sum of the two
special populations.
The
SMR is a weighted mean, over the separate age groups, of the ratios of the
observed death rates in the special population to those in the standard
population, with weights (niPi)
that depend on the age distribution of the special population. This means that
SMRs calculated for several special populations are not strictly comparable
(Yule, 1934), since they have been calculated with different weights. The SMRs
will be comparable under the hypothesis that the ratio of the death rates in
the special and standard populations is independent of age — that is, in a
proportional-hazards situation.
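The weighted-mean form of the SMR can be checked numerically. The sketch below uses made-up figures for three age groups and assumes that n_i is the number of people in age group i of the special population, r_i the observed deaths there and P_i the death rate in the standard population; the variable names are illustrative only.

```python
# Hypothetical data for three age groups of a special population.
n = [1000, 2000, 1500]      # people (or person-years) in each age group
r = [4, 18, 45]             # observed deaths in each age group
P = [0.003, 0.010, 0.025]   # death rates in the standard population

# Expected deaths n_i * P_i; these are also the weights of the weighted mean.
expected = [ni * Pi for ni, Pi in zip(n, P)]

# SMR as the ratio of total observed to total expected deaths.
smr_direct = sum(r) / sum(expected)

# SMR as a weighted mean of the age-specific rate ratios (r_i / n_i) / P_i.
ratios = [(ri / ni) / Pi for ri, ni, Pi in zip(r, n, P)]
smr_weighted = sum(w * q for w, q in zip(expected, ratios)) / sum(expected)

print(round(smr_direct, 3), round(smr_weighted, 3))  # the two values agree
```

Because the weights depend on n_i, a second special population with a different age distribution would average the same rate ratios differently, which is why SMRs are not strictly comparable.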
The relationship between standardization and generalized
linear models is discussed by Breslow and Day (1975),
Little and Pullum (1979) and
Freeman and Holford (1980).
Surveys to investigate associations
A question commonly asked
in epidemiological investigations into the aetiology
of disease is whether some manifestation of ill health is associated with
certain personal characteristics or habits, with particular aspects of the
environment in which a person has lived or worked, or with certain experiences
which a person has undergone. Examples of such questions are the following.
1. Is
the risk of death from lung cancer related to the degree of cigarette smoking,
whether current or in previous years?
2. Is
the risk that a child dies from acute leukaemia related
to whether or not the mother experienced irradiation during pregnancy?
3. Is
the risk of incurring a certain illness increased for individuals who were
treated with a particular drug during a previous illness?
Sometimes questions like
these can be answered by controlled experimentation in which the presumptive
personal factor can be administered or withheld at the investigator’s
discretion; in example 3, for instance, it might be possible for the
investigator to give the drug in question to some patients and not to others
and to compare the outcomes. In such cases the questions are concerned with
causative effects: ‘Is this drug a partial cause
of this illness?’ Most often, however, the experimental approach is out of the
question. The investigator must then be satisfied to observe whether there is
an association between factor and
disease, and to take the risk which was emphasized in §7.1 if he or she wishes
to infer a causative link.
These questions, then,
will usually be studied by surveys rather than by experiments. The precise
population to be surveyed is not usually of primary interest here. One reason
is that in epidemiological surveys it is usually administratively impossible
to study a national or regional population, even on a sample basis. The
investigator may, however, have facilities to study a particular occupational
group or a population geographically related to a particular medical centre. Secondly, although the mean values or relative
frequencies of the different variables may vary somewhat from one population to
another, the magnitude and direction of the associations between variables are
unlikely to vary greatly between, say, different occupational groups or
different geographical populations.
There are two main designs for aetiological surveys: the case-control
study, sometimes known as a case-referent study, and the cohort
study. In a case-control study a group of individuals affected by the disease
in question is compared with a control group of unaffected individuals.
Information is obtained, usually in a retrospective way, about the frequency in
each group of the various environmental or personal factors which might be
associated with the disease. This type of survey is convenient in the study of
rare conditions which would appear too seldom in a random population sample. By
starting with a group of affected individuals one is effectively taking a much
higher sampling fraction of the cases than of the controls. The method is
appropriate also when the classification by disease is simple (particularly for
a dichotomous classification into the presence or absence of a specific
condition), but in which many possible aetiological
factors have to be studied. A further advantage of the method is that, by means
of the retrospective enquiry, the relevant information can be obtained
comparatively quickly.
In a cohort study a
population of individuals, selected usually by geographical or occupational
criteria rather than on medical grounds, is studied either by complete
enumeration or by a representative sample. The population is classified by the
factor or factors of interest and followed prospectively in time so that the
rates of occurrence of various manifestations of disease can be observed and
related to the classifications by aetiological
factors. The prospective nature of the cohort study means that it will normally
extend longer in time than the case-control study and is likely to be
administratively more complex. The corresponding advantages are that many
medical conditions can be studied simultaneously and that direct information is
obtained about the health of each subject through an interval of time.
Case-control and cohort studies are often called, respectively, retrospective
and prospective
studies. These latter terms are usually appropriate, but the nomenclature may
occasionally be misleading since a cohort study may be based entirely on
retrospective records. For example, if medical records are available of workers
in a certain factory for the past 30 years, a cohort study may relate to
workers employed 30 years ago and be based on records of their health in the
succeeding 30 years. Such a study is sometimes called a historical
prospective study.
A central problem in a case-control study is the method by which the controls are
chosen. Ideally, they should be on average similar to the cases in all respects
except in the medical condition under study and in associated aetiological factors. Cases will often be selected from one
or more hospitals and will then share the characteristics of the population
using those hospitals, such as social and environmental conditions or ethnic
features. It will usually be desirable to select the control group from the
same area or areas, perhaps even from the same hospitals, but suffering from quite
different illnesses unlikely to share the same aetiological
factors. Further, the frequencies with which various factors are found will
usually vary with age and sex. Comparisons between the case and control groups
must, therefore, take account of any differences there may be in the age and
sex distributions of the two groups. Such adjustments are commonly avoided by
arranging that each affected individual is paired with a control individual who
is deliberately chosen to be of the same age and sex and to share any other
demographic features which may be thought to be similarly relevant.
The
remarks made in §19.2 about non-sampling errors, particularly those about
non-response, are also relevant in aetiological
surveys. Non-responses are always a potential danger and every attempt should
be made to reduce them to as low a proportion as possible.
This
paper by Doll and Hill is an excellent illustration of the care which should be
taken to avoid bias due to unsuspected differences between case and control
groups or to different standards of data recording. This
study, and many others like it, strongly suggest an association between
smoking and the risk of incurring lung cancer. In such retrospective studies,
however, there is room for argument about the propriety of a particular choice
of control group, little information is obtained about the time relationships
involved, and nothing is known about the association between smoking and
diseases other than those selected for study. Doll and Hill (1954, 1956, 1964)
carried out a cohort study prospectively by sending questionnaires to all the
59 600 doctors in the UK in October 1951. Adequate replies were received from
68.2% of the population (34 439 men and 6194 women). The male doctors were
followed for 40 years and notifications of deaths from various causes were
obtained, only 148 being untraced (Doll et
al., 1994). Some results are shown in
Table 19.4. The groups defined by different smoking categories have different
age distributions, and the death rates shown in the table have again been
standardized for age (§19.3). Cigarette smoking is again shown to be associated
with a sharp increase in the death rate from lung cancer; there is almost as
strong an association for chronic obstructive lung disease, and a relatively
weak association with the death rates from ischaemic heart disease.
This prospective
study provides strong evidence that the association between smoking and lung
cancer is causative. In addition to the data in Table 19.4, many doctors who
smoked at the outset of the study stopped smoking during the follow-up period,
and by 1971 doctors were smoking less than half as much as people of the same
ages in the general population (Doll & Peto,
1976). This reduction in smoking was matched by a steady decline in the death
rate from lung cancer for the whole group of male doctors (age standardized,
and expressed as a fraction of the national mortality rate) over the first 20
years of follow-up.
In a cohort study in which
the incidence of a specific disease is of particular interest, the case-control
approach may be adopted by analysing the data of all
the cases and a control group of randomly selected non-cases. This approach was
termed a synthetic retrospective study
by Mantel (1973). Often the controls are chosen matched for each case by random
sampling from the members of the cohort who are non-cases at the time that the
case developed the disease (Liddell et al., 1977), and this is usually
referred to as a nested case-control study. Care may be needed to avoid the
repeated selection of the same individuals as controls for more than one case
(Robins et al., 1989). A related design is the case-cohort study,
which consists of a random sample of the whole cohort; some members of this
sample will become cases and they are supplemented by the cases that occur in
the remainder of the cohort, with the non-cases in the random subcohort serving
as controls for the total set of cases (Kupper et al., 1975; Prentice, 1986).
These designs are useful in situations where it is
expensive to extract the whole data, or when expensive tests are required; if
material, such as blood samples, can be stored and then analysed
for only a fraction of the cohort, then there may be a large saving in resources
with very little loss of efficiency.
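The selection of controls in a nested case-control study can be sketched as sampling from the risk set at each case's time of diagnosis. The code below is a minimal illustration with an invented six-person cohort; the function name, the data layout and the assumption that everyone enters follow-up at time 0 are all hypothetical, and no attempt is made to prevent the same person being chosen as a control for more than one case.

```python
import random

def sample_risk_set_controls(cohort, n_controls, rng):
    """Pick controls for each case from subjects still at risk at the case's time.

    cohort maps subject id -> (exit_time, is_case), where exit_time is the time
    of diagnosis for cases and the end of follow-up for everyone else.
    """
    matched = {}
    for case_id, (case_time, is_case) in cohort.items():
        if not is_case:
            continue
        # Risk set: other subjects still followed up (and disease-free) at case_time.
        risk_set = [sid for sid, (t, _) in cohort.items()
                    if sid != case_id and t > case_time]
        matched[case_id] = rng.sample(risk_set, min(n_controls, len(risk_set)))
    return matched

# Hypothetical cohort: three cases (a, c, f) and three non-cases.
cohort = {
    "a": (2.0, True), "b": (5.0, False), "c": (3.5, True),
    "d": (6.0, False), "e": (1.0, False), "f": (4.0, True),
}
controls = sample_risk_set_controls(cohort, n_controls=2, rng=random.Random(0))
```

Note that a subject who later becomes a case (here "c" or "f") may legitimately serve as a control for an earlier case, since he or she was a non-case at that time.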
The measurement of the degree of association between the risk
of disease and the presence of an aetiological factor
is discussed in detail in the next section.
Subject-years method
A commonly used research
method is the cohort study,
in which a group is classified by exposure to some substance, followed over
time and the vital status of each member determined up to the time at which the
analysis is being conducted. A review of methods of cohort study design and
application was given by Liddell (1988). It may be possible to use existing
records to determine exposure in the past, and this gives the historical
prospective cohort study, used particularly in
occupational health research. Such studies often cover periods of over 20
years. The aim is to compare the mortality experience of subgroups, such as
high exposure with low exposure, in order to establish whether exposure to the
agent might be contributing to mortality. As such studies cover a long period
of time, individuals will be ageing and their mortality risk will be changing.
In addition, there may be period effects on mortality rate. Both the age and
period effects will need to be taken into account. One approach is the subject-years
or person-years method,
sometimes referred to as the modified life-table
approach; an early use of this method was by
Doll (1952). In this approach the number of deaths in the group, or in the
subgroups, is expressed in terms of the number of deaths expected if the individuals
had experienced the same death rates as the population of which the group is a
part.
The expected mortality is
calculated using published national or regional death rates. The age of each
subject, both at entry to the study and as it changes through the period of
follow-up, has to be taken into account. Also, since age-specific death rates
depend on the period of time at which the risk occurs, the cohort of each
subject must be considered. Official death rates are usually published in
5-year intervals of age and period and may be arranged as a rectangular array
consisting of cells, such as the age group 45-49 during the period 1976-80.
Each subject passes through several of these cells during the period of
follow-up and experiences a risk of dying according to the years of risk in
each cell and the death rate. This risk is accumulated so long as a subject is
at risk of dying in the study, that is, until the date of death or until the end
of the follow-up period for the survivors. This accumulated risk is the same as
the cumulative hazard. The expected number of deaths is obtained by adding over
all subjects in the group, and it is computationally convenient to add the
years at risk in each cell over subjects before multiplying by the death rates.
This is the origin of the name of the method, since subject-years at risk are
calculated.
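The allocation of follow-up time to age-period cells can be sketched as follows. This is only an illustration under stated assumptions: dates are decimal calendar years, age bands start at multiples of 5, periods are 5-year blocks aligned so that one block is 1976-80, and the cohort and the reference death rates are invented.

```python
import math
from collections import defaultdict

WIDTH = 5             # assumed width (years) of age bands and calendar periods
PERIOD_ORIGIN = 1976  # assumed start of one period, giving 1971-75, 1976-80, ...

def subject_years(birth, entry, exit_):
    """Split one subject's follow-up into {(age_band_start, period_start): years}."""
    cuts = {entry, exit_}
    # Add every period boundary and age-band boundary that falls inside follow-up.
    for origin in (PERIOD_ORIGIN, birth):
        k = math.floor((entry - origin) / WIDTH) + 1
        while origin + k * WIDTH < exit_:
            cuts.add(origin + k * WIDTH)
            k += 1
    cuts = sorted(cuts)
    cells = defaultdict(float)
    for start, stop in zip(cuts, cuts[1:]):
        mid = (start + stop) / 2  # the midpoint lies strictly inside one cell
        age_band = WIDTH * math.floor((mid - birth) / WIDTH)
        period = PERIOD_ORIGIN + WIDTH * math.floor((mid - PERIOD_ORIGIN) / WIDTH)
        cells[(age_band, period)] += stop - start
    return dict(cells)

# Hypothetical cohort of two subjects: (birth, entry, exit) in decimal years.
cohort = [(1930.5, 1972.0, 1983.0), (1926.0, 1970.0, 1978.5)]

# Add the years at risk in each cell over subjects first ...
totals = defaultdict(float)
for birth, entry, exit_ in cohort:
    for cell, years in subject_years(birth, entry, exit_).items():
        totals[cell] += years

# ... then multiply by (invented) reference death rates per subject-year.
rate_by_age = {40: 0.003, 45: 0.005, 50: 0.008}
rates = {cell: rate_by_age[cell[0]] * (0.95 if cell[1] >= 1976 else 1.0)
         for cell in totals}
expected_deaths = sum(totals[cell] * rates[cell] for cell in totals)
```

In this made-up cohort the first subject contributes 4.5 years to the cell for ages 45-49 in 1976-80, and the two subjects together contribute 19.5 subject-years.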
The method can be applied for total deaths and also for
deaths from specific causes. The same table of subject-years at risk is used
for each cause with different tables of death rates and, when a particular
cause of death is being considered, deaths from any other cause are effectively
treated as censored survivals. For any cause of death, the observed number is
treated as a Poisson variable with expectation equal to the expected number.
The method is similar to indirect standardization of death rates and the ratio
of observed to expected deaths is often referred to as the standardized
mortality ratio (SMR).
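Treating the observed number as a Poisson variable gives a simple significance test for an excess of deaths. The following sketch uses invented counts and computes the SMR together with the one-sided probability of observing at least that many deaths when the expectation is the expected number; the function name is illustrative.

```python
import math

def poisson_upper_tail(observed, expected):
    """Return P(X >= observed) for X distributed as Poisson(expected)."""
    below = sum(math.exp(-expected) * expected ** k / math.factorial(k)
                for k in range(observed))
    return 1.0 - below

observed, expected = 18, 10.5   # hypothetical observed and expected deaths
smr = observed / expected
p_upper = poisson_upper_tail(observed, expected)
```

A small value of `p_upper` would suggest that the excess of observed over expected deaths is unlikely to be due to chance alone; it says nothing about the biases discussed earlier.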
The method has usually been applied to compare observed and
expected mortality within single groups but may be extended to compare the SMR
between different subgroups or, more generally, to take account of covariates
recorded for each individual, by expressing the SMR as a proportional-hazards
regression model (Berry, 1983), that is, by the use of a generalized linear
model. For subgroup $i$, if $m_i$ is the cumulative hazard from the reference
population and $\gamma_i$ the proportional-hazards multiplier, then $\mu_i$, the
expected number of deaths, is given by

$$\mu_i = \gamma_i m_i.$$

If $\gamma_i$ is modelled in terms of a set of covariates $x_{1i}, \ldots, x_{pi}$ by

$$\ln \gamma_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi},$$

then

$$\ln \mu_i = \ln m_i + \beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}. \qquad (19.37)$$

This model is similar to the generalized linear model of a Poisson variable but
contains the additional term $\ln m_i$, sometimes referred to as the offset,
which ensures that age and period are both adjusted for. An example, in which
the covariates are represented by a three-factor structure, is given in Berry
(1983).
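The role of the offset is easiest to see in the smallest possible case. The sketch below assumes just two subgroups and one binary covariate x (0 for low exposure, 1 for high), so that the model ln(mu) = ln(m) + b0 + b1*x has as many parameters as observed counts and the fitted expected deaths equal the observed deaths; the data are invented, and a real analysis with several covariates would use generalized linear model software.

```python
import math

# Hypothetical observed deaths d and reference-based expected deaths m
# (the cumulative hazard summed over subjects) for two subgroups.
d = {"low": 12, "high": 30}
m = {"low": 15.0, "high": 20.0}
x = {"low": 0, "high": 1}

# Saturated model: the fitted mu equals d, so the estimates have closed forms.
b0 = math.log(d["low"] / m["low"])          # log SMR of the low-exposure group
b1 = math.log(d["high"] / m["high"]) - b0   # log of the ratio of the two SMRs

# Fitted expected deaths: the offset ln(m) enters with a fixed coefficient of 1.
mu = {g: math.exp(math.log(m[g]) + b0 + b1 * x[g]) for g in d}
smr_ratio = math.exp(b1)
```

Here exp(b0) is the SMR of the low-exposure subgroup and exp(b1) is the SMR of the high-exposure subgroup relative to it.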
A disadvantage of the
above approach is that it involves the assumption of proportional hazards
between the study population and the external reference population across all
the age-period strata, that is, that the reference death rates apply to the
study population at least to a constant of proportionality. The simplest
alternative approach that does not depend on this assumption is to work
entirely within the data set without any reference to an external population.
The death rates are calculated for each age-period cell from the data for the
whole cohort. These internal rates are then used to calculate the expected
numbers of deaths for subgroups defined in terms of the covariates, and so SMRs
are produced for each of the subgroups based on the observed death rates of the
whole population in each cell, instead of on external death rates (Breslow & Day, 1987, §3.5). If there are more than two
subgroups, comparison of these internal SMRs still depends on proportional
hazards of the subgroups across the age-period strata, but a
proportional-hazards assumption between the study population and a reference
population is no longer necessary. This approach is known to be conservative
and may be improved by a Mantel-Haenszel approach (Breslow, 1984b; Breslow & Day, 1987, §3.6). This method is referred to
as internal standardization.
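Internal standardization can be sketched in a few lines. The example assumes two exposure subgroups and three age-period cells with invented deaths and subject-years; the internal rate in each cell is computed from the whole cohort and then applied to each subgroup's subject-years.

```python
# Hypothetical deaths and subject-years in three age-period cells.
deaths = {"low": [2, 5, 9], "high": [3, 9, 14]}
years = {"low": [900.0, 700.0, 400.0], "high": [600.0, 650.0, 450.0]}
n_cells = 3

# Internal rates: whole-cohort deaths over whole-cohort subject-years, per cell.
internal_rate = [
    sum(deaths[g][j] for g in deaths) / sum(years[g][j] for g in years)
    for j in range(n_cells)
]

# Expected deaths in each subgroup if it had experienced the internal rates.
expected = {g: sum(years[g][j] * internal_rate[j] for j in range(n_cells))
            for g in years}

# Internal SMRs: observed over expected deaths within each subgroup.
internal_smr = {g: sum(deaths[g]) / expected[g] for g in deaths}
```

By construction the expected deaths add up over subgroups to the total observed deaths, so the internal SMRs are necessarily spread around 1.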
A
second approach is to use direct standardization with an internal subgroup as
standard. An appropriate choice for this internal reference may be an unexposed
group or the least exposed group in the study. This method produces standardized
rate ratios (SRRs), which are ratios of the
directly standardized rate for each subgroup to the rate in the standard
reference subgroup. The method avoids the possible problems associated with
comparing SMRs from more than two groups, but the SRRs may be less precisely
estimated than SMRs if the subgroups contain cells with few deaths. For a
fuller discussion and an example analysed both by the
external SMR method and the internal SRR method, see Checkoway
et al. (1989, Chapter 5).
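A standardized rate ratio with an internal reference can be sketched as below. The data are invented, the unexposed subgroup is assumed to be the standard, and its distribution of subject-years over the age-period cells supplies the weights.

```python
# Hypothetical deaths and subject-years in three age-period cells.
deaths = {"unexposed": [2, 5, 9], "exposed": [3, 9, 14]}
years = {"unexposed": [900.0, 700.0, 400.0], "exposed": [600.0, 650.0, 450.0]}

ref = "unexposed"  # internal reference subgroup used as the standard
weights = [y / sum(years[ref]) for y in years[ref]]

def standardized_rate(group):
    """Directly standardized rate: cell-specific rates averaged with the standard weights."""
    return sum(w * d / y for w, d, y in zip(weights, deaths[group], years[group]))

# SRR: standardized rate of each subgroup over the rate in the reference subgroup.
srr = {g: standardized_rate(g) / standardized_rate(ref) for g in deaths}
```

The standardized rate of the reference subgroup equals its crude rate, because it is weighted by its own distribution. A cell with very few deaths in the exposed subgroup would make its cell-specific rate, and hence the SRR, unstable, which is the loss of precision mentioned above.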
The most comprehensive approach is that of Poisson modelling. For subgroup $i$
and age-period stratum $j$, if $n_{ij}$ is the number of subject-years and
$\gamma_{ij}$ the death rate, then $\mu_{ij}$, the expected number of deaths,
is given by

$$\mu_{ij} = n_{ij} \gamma_{ij}.$$

If $\gamma_{ij}$ is modelled in terms of the age-period stratum and a set of
covariates by

$$\ln \gamma_{ij} = \alpha_j + \beta_1 x_{1i} + \cdots + \beta_p x_{pi},$$

then

$$\ln \mu_{ij} = \ln n_{ij} + \alpha_j + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}. \qquad (19.38)$$

This is a generalized linear model of a Poisson variable (14.12), with the
additional term $\ln n_{ij}$ (see (14.13)). An application is given as
Example 14.4. Breslow and Day (1987, Chapter 4) give fuller details and
worked examples.
In many cases the external method (19.37) and the internal
method (19.38) will give similar inferences on the effect of the covariates.
The latter has the advantage that there is no assumption that the death rates
in all the age-period strata follow a proportional-hazards model with respect
to the reference population, but the precision of the comparisons is slightly
inferior to the external method. A combination of the two approaches, using
(19.37) for the overall group and main subgroups and (19.38) for regression modelling, has the advantage of estimating the effects of
the covariates and also estimating how the mortality in the overall group
compares with that in the population of which it is a part. An example where
this was done is given by Checkoway et al.
(1993).
Choice of
Standard Population
Standardized
measures describe a hypothetical state of affairs, which is a function of the
standard population chosen. For direct age-standardization, the total U.S. population
from the previous census is especially common. Since rates standardized to the
same external standard are comparable, the selection of a commonly used
standard has advantages when comparing rates across different studies.
Sometimes investigators compute directly standardized rates based upon one of
their own study populations as the standard or by combining two or more study
populations to create a standard. But rates standardized to a specific study
population are not as readily compared to rates from other studies.
When a study
involves a comparison with a "control" population, the choice of a
standard should reflect the study goals. For example, an examination of county
mortality variation within a state might compare county mortality to the state
as a whole. A clean industry may be a good standard for an industrial
population exposed to suspected occupational health hazards. Since indirectly
standardized measures require knowledge of stratum-specific rates in the
standard, data availability constrains the choice.
The choice of
a standard population is not always obvious, and there may not be a
"best" choice. For example, in
comparing syphilis rates across counties in North Carolina, Thomas et al.
(1995) decided to standardize the rates by age and sex to reduce the influence
of different age-sex distributions in different counties. One obvious choice
for a set of weights was the age-sex distribution of North Carolina as a whole.
However, another possible choice was to use the age-sex distribution for the
U.S. as a whole, so that other investigators could more readily compare
syphilis rates in their states to the rates presented in the article. Was there
a "right" answer? In this case the choice between the two standards
could be regarded as a choice between greater "relevance" and broader
comparability. The net result makes little difference, however, since the
age-sex distributions of North Carolina and the entire U.S. are very similar. In
other situations, however, the choice of standards can indeed change the
message conveyed by the results.
Just as the
growth of knowledge leads to revisions to disease classification systems,
thereby complicating comparisons across revisions, changes in the age
distribution over decades create the dilemma of switching to a new standard
population to reflect the present reality versus retaining the existing
standard to preserve comparability across time. For this reason mortality rates
in the United States were standardized to the 1940 population distribution
until almost the end of the 20th century. Other standards (1970, 1980) were also
in use, however, complicating comparisons of mortality statistics. During the 1990s, the U.S. National Center for Health Statistics
(NCHS/CDC) coordinated an effort among federal and state agencies to adopt the
year 2000 projected U.S. population for standardization of mortality statistics. In
August 1998 all U.S. Department of Health
and Human Services (DHHS) agencies were directed to use the 2000
Standard Population for age adjusting
mortality rates beginning no later than data year 1999 (Schoenborn
et al., 2000).
Since the age
distribution in 2000 is shifted to the right (older ages) compared to the 1940
population, mortality rates standardized to the 2000 population will be higher
than if they were standardized to the 1940 census because they will assign more
weight to older age strata, where mortality rates are high. In the same way,
comparisons (e.g., ratios) of standardized rates will reflect the situation
among older age groups more than in the past. To be sure, the switch will make
comparisons to past data problematic, though NCHS will recompute
age-standardized mortality rates for past years based on the 2000 population
standard.
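The effect of moving to an older standard population can be illustrated with a small calculation. The age-specific rates and both sets of weights below are invented for illustration and are not the actual 1940 or 2000 U.S. distributions.

```python
# Hypothetical age-specific death rates (per person-year), rising with age.
rates = [0.001, 0.004, 0.020, 0.080]

# Hypothetical standard populations expressed as proportions by age group.
younger_standard = [0.40, 0.30, 0.20, 0.10]
older_standard = [0.30, 0.28, 0.25, 0.17]

def directly_standardized(rates, weights):
    """Weighted average of age-specific rates using the standard's proportions."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights should sum to 1"
    return sum(r * w for r, w in zip(rates, weights))

rate_younger = directly_standardized(rates, younger_standard)
rate_older = directly_standardized(rates, older_standard)
```

The same age-specific rates give a higher standardized rate under the older standard, simply because more weight falls on the age groups where mortality is high.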
The opposite
result will occur when at some point it is decided that in a global society all
countries should standardize their rates to the World population, to
facilitate comparison across countries.
Since the large majority of the world's population lives in developing
countries and is much younger than the population of the U.S. and other
developed countries, standardization using a world standard will yield lower
standardized rates for most causes of death. As illustrated by the fruit stand
example in the beginning of this chapter, different standards can give
different, but correct, results. Comparisons, the usual goal of examining
rates, may be less affected than the rates themselves, as long as the patterns
(e.g., rise in mortality rate with age) are the same in the populations being
compared. When that is not the case, then the question of whether it is
meaningful to compare summary measures at all becomes more important than the
question of which weights to use.
Key concepts
Populations are
heterogeneous – they contain disparate subgroups. So any overall measure is a
summary of values for constituent subgroups. The underlying reality is the set
of rates for (ideally homogeneous) subgroups.
The observed
("crude") rate is in fact a weighted average of
subgroup-"specific" rates, weighted by the size of the
subgroups.
Comparability
of weighted averages depends on similarity of weights.
"Standardized"
(and other kinds of adjusted) measures are also weighted averages, with weights
chosen to improve comparability.
Crude rates are
"real", standardized rates are hypothetical.
The
"direct" method (weights taken from an external standard population)
gives greater comparability but requires more data.
The
"indirect" method (weights taken from the internal study population)
requires fewer data but provides less comparability.
Choice of
weights can affect rates, comparisons of rates, and comparability to other
populations, so the implications of using different possible standard
populations should be
considered.
Any summary
conceals information; if there is substantial heterogeneity, the usefulness of
a summary is open to question.
References:
1. David Machin. Medical statistics: a textbook for the health sciences / David Machin, Michael J. Campbell, Stephen J. Walters. – John Wiley & Sons, Ltd., 2007. – 346 p.
2. Nathan Tintle. Introduction to statistical investigations / Nathan Tintle, Beth Chance, George Cobb, Allan Rossman, Soma Roy, Todd Swanson, Jill VanderStoep. – UCSD BIEB100, Winter 2013. – 540 p.
3. Armitage P. Statistical Methods in Medical Research / P. Armitage, G. Berry, J. Matthews. – Blackwell Science, 2002. – 826 p.
4. Larry Winner. Introduction to Biostatistics / Larry Winner. – Department of Statistics, University of Florida, July 8, 2004. – 204 p.
5. Weiss N. A. Elementary statistics / Neil A. Weiss; biographies by Carol A. Weiss. – 8th ed., 2012. – 774 p.