   Medicine

# Relative values


Table 0.1 gives an example data file from a survey conducted by the WMHO of residents of the United States. The survey was conducted on a representative sample of 1,860 individuals living in the United States in 2001 to 2002. Notice the file is organized so each observational unit (in this case a person) occurs on a single row of the data file. For example, the first row is an 18-year-old Hispanic male. The names or identifiers of the observational units are provided on the left-hand side of the table; in this case, they are ID numbers. The number of observational units is 1,860 because that is the sample size. So, if Table 0.1 were complete, it would have 1,860 rows, one for each person.

As the name suggests, a variable varies; that is, it takes on different values for different cases. Depending on its values, a variable is either quantitative or categorical. For a quantitative variable, it makes sense to do arithmetic (add, subtract, etc.) with the values. Examples are height, weight, distance and time. For a categorical variable, the values are labels for which arithmetic does not make sense. Examples are sex, ethnicity, and eye color.

The two kinds of variables lead to different kinds of summaries. For example, you can compute an average value or median for a quantitative variable like height, but not for a categorical variable like ethnicity. Much of the rest of this section illustrates some useful summaries, but first, you need the key idea of a distribution. Statistics relies on looking at a lot of cases all at once, rather than one case at a time. The key idea is the distribution of a variable: for large data sets like the WMHO survey, it is hard to detect patterns among the thousands of cases just by looking at a list of values. By thinking instead of the distribution as a whole, we are led to various ways to describe, summarize and compare distributions, much as a naturalist would describe and compare different plants or animals.

Summaries for distributions

The most common summaries for distributions are either numerical or graphical. You don't need a definition, because the names mean what you would expect, and you can get the idea from examples. Here are several based on the WMHO survey:

Numerical summaries, categorical variables: The proportion of females in the survey is 0.553. The proportion of Hispanics in the survey is 0.097.

Graphical summaries, categorical variables:

Numerical summaries, quantitative variables:

Average age for married individuals is 52.

Average age for those who have never married is 42.

Graphical summaries, quantitative variables:
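The numerical summaries above are straightforward to reproduce in code. Below is a minimal Python sketch on a tiny invented sample; the records are illustrative, not actual survey data:

```python
# Minimal sketch of the numerical summaries above, on a tiny invented
# sample (these records are illustrative, not actual survey data).
records = [
    {"sex": "F", "marital": "Married", "age": 55},
    {"sex": "M", "marital": "Never married", "age": 40},
    {"sex": "F", "marital": "Never married", "age": 44},
    {"sex": "M", "marital": "Married", "age": 49},
]

# Categorical variable: summarise with a proportion.
prop_female = sum(r["sex"] == "F" for r in records) / len(records)

# Quantitative variable: summarise with an average within a subgroup.
married_ages = [r["age"] for r in records if r["marital"] == "Married"]
avg_married_age = sum(married_ages) / len(married_ages)

print(prop_female)      # 0.5 in this toy sample
print(avg_married_age)  # 52.0 in this toy sample
```

The same pattern (filter, count or average) produces every numerical summary quoted above once the real data file is loaded.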

# Types of data

Just as a farmer gathers and processes a crop, a statistician gathers and processes data. For this reason the logo for the UK Royal Statistical Society is a sheaf of wheat. Like any farmer who knows instinctively the difference between oats, barley and wheat, a statistician becomes an expert at discerning different types of data. Some sections of this book refer to different data types and so we start by considering these distinctions. Figure 1.2 shows a basic summary of data types, although some data do not fit neatly into these categories.

# STATISTICAL DISTRIBUTIONS

Every statistics book provides a listing of statistical distributions, with their properties, but browsing through these choices can be frustrating to anyone without a statistical background, for two reasons. First, the choices seem endless, with dozens of distributions competing for your attention, with little or no intuitive basis for differentiating between them. Second, the descriptions tend to be abstract and emphasize statistical properties such as the moments, characteristic functions and cumulative distributions. In this appendix, we will focus on the aspects of distributions that are most useful when analyzing raw data and trying to fit the right distribution to that data.

## Fitting the Distribution

When confronted with data that needs to be characterized by a distribution, it is best to start with the raw data and answer four basic questions that can help in the characterization. The first relates to whether the data can take on only discrete values or whether it is continuous; whether a new pharmaceutical drug gets FDA approval or not is a discrete value, but the revenues from the drug represent a continuous variable. The second looks at the symmetry of the data and, if there is asymmetry, which direction it lies in; in other words, are positive and negative outliers equally likely, or is one more likely than the other? The third question is whether there are upper or lower limits on the data; some data items, like revenues, cannot be lower than zero, whereas others, like operating margins, cannot exceed a value (100%). The final and related question concerns the likelihood of observing extreme values in the distribution; in some data the extreme values occur very infrequently, whereas in others they occur more often.

### Is the data discrete or continuous?

The first and most obvious categorization of data should be on whether the data is restricted to taking on only discrete values or if it is continuous. Consider the inputs into a typical project analysis at a firm. Most estimates that go into the analysis come from distributions that are continuous; market size, market share and profit margins, for instance, are all continuous variables. There are some important risk factors, though, that can take on only discrete forms, including regulatory actions and the threat of a terrorist attack; in the first case, the regulatory authority may dispense one of two or more decisions which are specified up front and in the latter, you are subjected to a terrorist attack or you are not.

With discrete data, the entire distribution can either be developed from scratch or the data can be fitted to a pre-specified discrete distribution. With the former, there are two steps to building the distribution. The first is identifying the possible outcomes, and the second is assigning probabilities to each outcome. As we noted in the text, we can draw on historical data or experience, as well as specific knowledge about the investment being analyzed, to arrive at the final distribution. This process is relatively simple when there are a few outcomes with a well-established basis for estimating probabilities, but becomes more tedious as the number of outcomes increases. If it is difficult or impossible to build up a customized distribution, it may still be possible to fit the data to one of the following discrete distributions:

a. Binomial distribution: The binomial distribution measures the probabilities of the number of successes over a given number of trials with a specified probability of success in each try. In the simplest scenario of a coin toss (with a fair coin), where the probability of getting a head with each toss is 0.50 and there are a hundred trials, the binomial distribution will measure the likelihood of getting anywhere from no heads in a hundred tosses (very unlikely) to 50 heads (the most likely) to 100 heads (also very unlikely). The binomial distribution in this case will be symmetric, reflecting the even odds; as the probabilities shift from even odds, the distribution will get more skewed. Figure 6A.1 presents binomial distributions for three scenarios: two with a 50% probability of success and one with a 70% probability of success, and different trial sizes.

Figure 6A.1: Binomial Distribution

As the probability of success is varied (from 50%) the distribution will also shift its shape, becoming positively skewed for probabilities less than 50% and negatively skewed for probabilities greater than 50%.
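The binomial probabilities described above can be computed directly with the standard library; the helper below is a sketch, not code from the text:

```python
# Sketch of the binomial distribution: probability of exactly k successes
# in n independent trials, each with success probability p.
from math import comb

def binomial_pmf(k, n, p):
    return comb(n, k) * (p ** k) * ((1 - p) ** (n - k))

# The fair-coin example from the text: 100 tosses, p = 0.5.
print(binomial_pmf(50, 100, 0.5))  # most likely count, roughly 0.08
print(binomial_pmf(0, 100, 0.5))   # no heads: vanishingly small
```

With p = 0.5 the probabilities are symmetric about 50 successes; moving p away from 0.5 skews the distribution, as the text notes.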

b. Poisson distribution: The Poisson distribution measures the likelihood of a number of events occurring within a given time interval, where the key parameter required is the average number of events in the given interval (λ). The resulting distribution looks similar to the binomial, with the skewness being positive but decreasing with λ. Figure 6A.2 presents three Poisson distributions, with λ ranging from 1 to 10.

Figure 6A.2: Poisson Distribution
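A minimal sketch of the Poisson probabilities, again using only the standard library:

```python
# Sketch of the Poisson distribution: probability of k events in an
# interval when the average number of events per interval is lam.
from math import exp, factorial

def poisson_pmf(k, lam):
    return (lam ** k) * exp(-lam) / factorial(k)

print(poisson_pmf(0, 1.0))    # with lam = 1, P(no events) = e^-1, about 0.37
print(poisson_pmf(10, 10.0))  # with lam = 10, the mode sits near lam
```

Plotting these values for several λ reproduces the shape change the text describes: positive skew that fades as λ grows.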

c. Negative Binomial distribution: Returning again to the coin toss example, assume that you hold the number of successes fixed at a given number and estimate the number of tries you will have before you reach the specified number of successes. The resulting distribution is called the negative binomial and it very closely resembles the Poisson. In fact, the negative binomial distribution converges on the Poisson distribution, but will be more skewed to the right (positive values) than the Poisson distribution with similar parameters.

d. Geometric distribution: Consider again the coin toss example used to illustrate the binomial. Rather than focus on the number of successes in n trials, assume that you were measuring the likelihood of when the first success will occur. For instance, with a fair coin toss, there is a 50% chance that the first success will occur on the first try, a 25% chance that it will occur on the second try and a 12.5% chance that it will occur on the third try. The resulting distribution is positively skewed and is shown for three different probability scenarios in Figure 6A.3.

Figure 6A.3: Geometric Distribution

Note that the distribution is steepest with high probabilities of success and flattens out as the probability decreases. However, the distribution is always positively skewed.
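The fair-coin values quoted above follow from a one-line probability mass function; this is a sketch:

```python
# Sketch of the geometric distribution: probability that the first
# success arrives on trial k (k = 1, 2, 3, ...), given success
# probability p on each independent trial.
def geometric_pmf(k, p):
    return ((1 - p) ** (k - 1)) * p

# The fair-coin values quoted in the text:
print(geometric_pmf(1, 0.5))  # 0.5
print(geometric_pmf(2, 0.5))  # 0.25
print(geometric_pmf(3, 0.5))  # 0.125
```

Evaluating the same function with smaller p shows the flattening (but still positively skewed) shape the text describes.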

e. Hypergeometric distribution: The hypergeometric distribution measures the probability of a specified number of successes in n trials, without replacement, from a finite population. Since the sampling is without replacement, the probabilities change as a function of previous draws. Consider, for instance, the possibility of getting four face cards in a hand of ten, over repeated draws from a pack. Since there are 16 face cards and the total pack contains 52 cards, the probability of getting four face cards in a hand of ten can be estimated. Figure 6A.4 provides a graph of the hypergeometric distribution.

Figure 6A.4: Hypergeometric Distribution

Note that the hypergeometric distribution converges on the binomial distribution as the population size increases.
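The card example can be computed exactly with binomial coefficients; a sketch:

```python
# Sketch of the hypergeometric distribution: probability of k successes
# in a draw of n, without replacement, from a population of N items of
# which K are successes.
from math import comb

def hypergeom_pmf(k, K, n, N):
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# The card example from the text: 4 face cards in a hand of 10,
# drawn from a 52-card pack containing 16 face cards.
p = hypergeom_pmf(4, 16, 10, 52)
print(p)  # roughly 0.22
```

Because the draws are made without replacement, the counting is over combinations rather than independent trials, which is exactly the distinction from the binomial made above.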

f. Discrete uniform distribution: This is the simplest of the discrete distributions and applies when all of the outcomes have an equal probability of occurring. Figure 6A.5 presents a uniform discrete distribution with five possible outcomes, each occurring 20% of the time.

Figure 6A.5: Discrete Uniform Distribution

The discrete uniform distribution is best reserved for circumstances where there are multiple possible outcomes, but no information that would allow us to expect that one outcome is more likely than the others.
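Both routes described in this section, a discrete distribution built from scratch and the uniform special case, can be sketched in a few lines; the outcome labels and probabilities are hypothetical:

```python
import random

random.seed(42)

# A custom discrete distribution built from scratch: enumerate the
# outcomes and assign a probability to each (hypothetical values).
outcomes = ["approve", "approve with conditions", "reject"]
probs = [0.5, 0.3, 0.2]
draw = random.choices(outcomes, weights=probs, k=1)[0]

# The discrete uniform special case: five outcomes, 20% each,
# as in Figure 6A.5.
uniform_probs = [1 / 5] * 5

print(draw)
print(uniform_probs[0])  # 0.2
```

Sampling with `random.choices` is the simulation step that follows once the outcomes and probabilities have been specified.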

With continuous data, we cannot specify all possible outcomes, since they are too numerous to list, but we have two choices. The first is to convert the continuous data into a discrete form and then go through the same process that we went through for discrete distributions of estimating probabilities. For instance, we could take a variable such as market share and break it down into discrete blocks – market share between 3% and 3.5%, between 3.5% and 4% and so on – and consider the likelihood that we will fall into each block. The second is to find a continuous distribution that best fits the data and to specify the parameters of the distribution. The rest of the appendix will focus on how to make these choices.
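The block-by-block conversion just described can be sketched as follows; the market-share observations are invented for illustration:

```python
# Discretize a continuous variable (market share) into blocks and
# estimate each block's probability from observed frequencies.
observed_shares = [0.031, 0.033, 0.034, 0.036, 0.038, 0.039, 0.041, 0.044]
edges = [0.030, 0.035, 0.040, 0.045]  # blocks: 3-3.5%, 3.5-4%, 4-4.5%

counts = [0] * (len(edges) - 1)
for x in observed_shares:
    for i in range(len(edges) - 1):
        if edges[i] <= x < edges[i + 1]:
            counts[i] += 1

probs = [c / len(observed_shares) for c in counts]
print(probs)
```

The resulting block probabilities play the same role as the outcome probabilities of a discrete distribution built from scratch.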

### How symmetric is the data?

There are some datasets that exhibit symmetry, i.e., the upside is mirrored by the downside. The symmetric distribution that most practitioners are familiar with is the normal distribution, shown in Figure 6A.6 for a range of parameters.

Figure 6A.6: Normal Distribution

The normal distribution has several features that make it popular. First, it can be fully characterized by just two parameters, the mean and the standard deviation, and thus reduces estimation pain. Second, the probability of any value occurring can be obtained simply by knowing how many standard deviations separate the value from the mean; the probability that a value will fall within 2 standard deviations of the mean is roughly 95%. The normal distribution is best suited for data that, at the minimum, meets the following conditions:

a. There is a strong tendency for the data to take on a central value.

b. Positive and negative deviations from this central value are equally likely.

c. The frequency of the deviations falls off rapidly as we move further away from the central value.

The last two conditions show up when we compute the parameters of the normal distribution: the symmetry of deviations leads to zero skewness, and the low probabilities of large deviations from the central value reveal themselves in zero excess kurtosis.
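The two-standard-deviation figure quoted above is easy to verify with the standard library's error function; this is a sketch, not a substitute for statistical tables:

```python
# Normal CDF via the error function: P(X <= x) for mean mu, sd sigma.
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

within_2sd = normal_cdf(2.0) - normal_cdf(-2.0)
print(within_2sd)  # roughly 0.954
```

The same function gives the probability of any interval, which is the second convenience of the normal distribution mentioned above.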

There is a cost we pay, though, when we use a normal distribution to characterize data that is non-normal, since the probability estimates that we obtain will be misleading and can do more harm than good. One obvious problem arises when the data is asymmetric, but another potential problem is when the probabilities of large deviations from the central value do not drop off as precipitously as the normal distribution requires. In statistical language, the actual distribution of the data has fatter tails than the normal. While all symmetric distributions are like the normal in that the upside mirrors the downside, they vary in shape, with some having fatter tails than the normal and others more accentuated peaks. Such distributions are characterized as leptokurtic, and we can consider two examples. One is the logistic distribution, which has longer tails and a higher kurtosis (1.2, as compared to 0 for the normal distribution); the other is the Cauchy distribution, which also exhibits symmetry and higher kurtosis and is characterized by a scale variable that determines how fat the tails are. Figure 6A.7 presents a series of Cauchy distributions that exhibit the bias towards fatter tails, or more outliers, than the normal distribution.

Figure 6A.7: Cauchy Distribution

Either the logistic or the Cauchy distributions can be used if the data is symmetric but with extreme values that occur more frequently than you would expect with a normal distribution.

As the probabilities of extreme values increase relative to the central value, the distribution will flatten out. At its limit, assuming that the data stays symmetric and we put limits on the extreme values on both sides, we end up with the uniform distribution, shown in Figure 6A.8.

Figure 6A.8: Uniform Distribution

When is it appropriate to assume a uniform distribution for a variable? One possible scenario is when you have a measure of the highest and lowest values that a data item can take but no real information about where within this range the value may fall. In other words, any value within that range is just as likely as any other value.

Most data does not exhibit symmetry and instead skews towards either very large positive or very large negative values. If the data is positively skewed, one common choice is the lognormal distribution, which is typically characterized by three parameters: a shape (σ or sigma), a scale (m or median) and a shift parameter (θ). When m = 0 and σ = 1, you have the standard lognormal distribution, and when θ = 0, the distribution requires only the scale and sigma parameters. As sigma rises, the peak of the distribution shifts to the left and the skewness in the distribution increases. Figure 6A.9 graphs lognormal distributions for a range of parameters.

Figure 6A.9: Lognormal Distribution
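A quick way to see the lognormal's behaviour is to exponentiate normal draws; the parameter values below are arbitrary:

```python
import random
from math import exp

random.seed(0)

# A lognormal variable is one whose logarithm is normally distributed;
# sigma is the shape parameter, and larger sigma means more skewness.
def lognormal_draw(mu=0.0, sigma=1.0):
    return exp(random.gauss(mu, sigma))

draws = sorted(lognormal_draw() for _ in range(10_000))
median = draws[len(draws) // 2]
mean = sum(draws) / len(draws)
print(mean > median)  # True: the long right tail pulls the mean up
```

Every draw is positive and the mean sits above the median, the two signatures of positive skewness discussed in this section.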

The Gamma and Weibull distributions are two distributions closely related to the lognormal; as with the lognormal, changing the parameter levels (shape, shift and scale) can cause them to change shape and become more or less skewed. In all of these functions, increasing the shape parameter will push the distribution towards the left. In fact, at high values of sigma, the left tail disappears entirely and the outliers are all positive. In this form, these distributions all resemble the exponential, characterized by a location (m) and scale parameter (b), as is clear from Figure 6A.10.

Figure 6A.10: Weibull Distribution

The question of which of these distributions will best fit the data depends in large part on how severe the asymmetry in the data is. For moderate positive skewness, where there are both positive and negative outliers but the former are larger and more common, the standard lognormal distribution will usually suffice. As the skewness becomes more severe, you may need to shift to a three-parameter lognormal distribution or a Weibull distribution, and modify the shape parameter until it fits the data. At the extreme, if there are no negative outliers and only positive outliers in the data, you should consider the exponential function, shown in Figure 6A.11.

Figure 6A.11: Exponential Distribution

If the data exhibits negative skewness, the choice of distributions is more limited. One possibility is the Beta distribution, which has two shape parameters (p and q) and upper and lower bounds on the data (a and b). Altering these parameters can yield distributions that exhibit either positive or negative skewness, as shown in Figure 6A.12.

Figure 6A.12: Beta Distribution

### Are there upper or lower limits on data values?

There are often natural limits on the values that data can take on. As we noted earlier, the revenues and the market value of a firm cannot be negative and the profit margin cannot exceed 100%. Using a distribution that does not constrain the values to these limits can create problems. For instance, using a normal distribution to describe profit margins can sometimes result in profit margins that exceed 100%, since the distribution has no limits on either the downside or the upside.

When data is constrained, the questions that need to be answered are whether the constraints apply on one side of the distribution or both, and what the limits on values are. Once these questions have been answered, there are two choices. One is to find a continuous distribution that conforms to these constraints. For instance, the lognormal distribution can be used to model data, such as revenues and stock prices, that is constrained to never be less than zero. For data that have both upper and lower limits, you could use the uniform distribution if the probabilities are even across outcomes, or a triangular distribution if the data is clustered around a central value. Figure 6A.14 presents a triangular distribution.

Figure 6A.14: Triangular Distribution

An alternative approach is to use a continuous distribution that normally allows data to take on any value, and to put upper and lower limits on the values that the data can assume. Note that the cost of imposing these constraints is small in distributions like the normal, where the probabilities of extreme values are very small, but increases as the distribution exhibits fatter tails.
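Imposing such limits can be done by rejection sampling, redrawing any value that falls outside the bounds; a sketch with hypothetical profit-margin parameters:

```python
import random

random.seed(1)

def truncated_gauss(mu, sigma, lo, hi):
    # Redraw until the value respects the stated limits; cheap when the
    # tails beyond the limits carry little probability, costly otherwise.
    while True:
        x = random.gauss(mu, sigma)
        if lo <= x <= hi:
            return x

# Profit margins constrained to [0, 1] (0% to 100%).
margins = [truncated_gauss(0.15, 0.20, 0.0, 1.0) for _ in range(1_000)]
print(min(margins) >= 0.0 and max(margins) <= 1.0)
```

The rejection rate is the concrete form of the "cost" mentioned above: the fatter the tails beyond the limits, the more draws are thrown away.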

### How likely are you to see extreme values of data, relative to the middle values?

As we noted in the earlier section, a key consideration in choosing a distribution to describe the data is the likelihood of extreme values, relative to the middle value. In the case of the normal distribution, this likelihood is small, and it increases as you move to the logistic and Cauchy distributions. While it may often be more realistic to use the latter to describe real-world data, the benefits of a better distribution fit have to be weighed against the ease with which parameters can be estimated from the normal distribution. Consequently, it may make sense to stay with the normal distribution for symmetric data, unless the likelihood of extreme values rises above a threshold.

The same considerations apply for skewed distributions, though the concern will generally be more acute for the skewed side of the distribution. In other words, with a positively skewed distribution, the question of which distribution to use will depend upon how much more likely large positive values are than large negative values, with the fit ranging from the lognormal to the exponential.

# Relative values

Statistical processing of data on disease, mortality, lethality, etc. yields absolute numbers, which specify the size of the phenomena. Although absolute numbers have a certain cognitive value, their use is limited. To determine the level of a phenomenon, or to compare a parameter over time or with the parameter of another territory, it is necessary to calculate relative values (parameters, factors), which express the ratio of statistical numbers to each other. The basic arithmetic operation in the calculation of relative values is division.

In medical statistics, the following kinds of relative parameters are used:

— Extensive;

— Intensive;

— Relative intensity;

— Visualization;

— Correlation.

To determine the structure of disease (mortality, lethality, etc.), the extensive parameter is used.

The extensive parameter, or parameter of distribution, characterizes the parts of a phenomenon (its structure); that is, it shows what share of the total number of all diseases (or deaths) is made up by a particular disease that enters into the total.

Using this parameter, it is possible to determine the structure of patients according to age, social status, etc. This parameter is customarily expressed as a percentage, but it can also be calculated in parts per thousand, in cases where the share of the given disease is so small that calculating it as a percentage would give a decimal fraction instead of a whole number.

The general formula for its calculation is the following:

(part of the phenomenon × 100) / the whole phenomenon

The technique of calculating an extensive parameter will be shown with an example.

To determine the age structure of those who visited a polyclinic, given the following data:

The number of visitors, 1500, is taken as 100%; the number of patients in each age group is accordingly taken as X. From here, the percentage of visitors aged 15-19 out of the total number will be: 150 × 100 / 1500 = 10.0%.

Table 2.5. Age groups of people who visited the polyclinic

| Age group | Absolute number | % of the total |
| --- | --- | --- |
| 15-19 | 150 | 10.0 |
| 20-29 | 375 | 25.0 |
| 30-39 | 300 | 20.0 |
| 40-49 | 345 | 23.0 |
| 50-59 | 150 | 10.0 |
| 60 and over | 180 | 12.0 |
| Total | 1500 | 100.0 |

Conclusion: most of the people who visited the polyclinic were aged 20-29 and 40-49 years.
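The percentages in Table 2.5 follow from a one-line calculation; here is a sketch:

```python
# Extensive (structure) parameter: each group's share of the total,
# expressed as a percentage (data from Table 2.5).
visits = {"15-19": 150, "20-29": 375, "30-39": 300,
          "40-49": 345, "50-59": 150, "60 and over": 180}
total = sum(visits.values())  # 1500
shares = {group: 100 * n / total for group, n in visits.items()}
print(shares["15-19"])  # 10.0
```

By construction the shares sum to 100%, which is the defining property of an extensive parameter.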

The extensive parameter must be used carefully in analysis: it characterizes only the structure of a phenomenon in a given place and at a given time. Comparing structures makes it possible to speak only about a change in the rank order of the given diseases within the overall structure of diseases.

If it is necessary to determine the frequency of a phenomenon, intensive parameters are used.

The intensive parameter characterizes frequency or distribution.

It shows how frequently the given phenomenon occurs in the given environment.

For example, how frequently a given disease occurs among the population, or how frequently people die from it.

To calculate the intensive parameter, it is necessary to know the size of the population or contingent.

The general formula of the calculation is the following:

(phenomenon / environment) × 100 (or 1000; 10,000; 100,000)

Intensive parameters are usually calculated per 1000 persons; these are the parameters of birth rate, morbidity, mortality, etc. For individual diseases they are calculated per 10,000 persons, and for diseases that occur seldom, per 100,000 persons.

Let's consider the technique of its calculation with an example.

Example. The number of deaths in the area is 175; the population at the beginning of the year is 24,000 and at the end of the year 26,000. Determine the mortality rate:

General mortality rate = (number of deaths during the year / number of the population) × 1000

We determine the average population: take the population at the beginning of the year plus the population at the end of the year and divide by 2: (24,000 + 26,000) / 2 = 25,000.

We set up a proportion: 175 deaths correspond to 25,000 people; how many deaths (X) correspond to 1,000?

175 - 25,000

X - 1,000

X = (175 × 1000) / 25,000 = 7.0 deaths per 1,000 population.
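The worked example above reduces to a few lines:

```python
# Intensive parameter: deaths per 1,000 of the average annual population
# (figures from the worked example).
deaths = 175
pop_start, pop_end = 24_000, 26_000

avg_population = (pop_start + pop_end) / 2  # 25,000
mortality_per_1000 = deaths / avg_population * 1000
print(round(mortality_per_1000, 1))  # 7.0
```

Birth and morbidity rates follow the same pattern, with the appropriate phenomenon in the numerator and base (1,000; 10,000; 100,000) as multiplier.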

Parameters of birth rate, morbidity, etc. are calculated similarly.

Table 2.6. Structure of morbidity, disability and causes of death

| Disease | Structure of morbidity | Structure of disability | Structure of causes of death | Relative intensity: disability | Relative intensity: causes of death |
| --- | --- | --- | --- | --- | --- |
| Traumas | 12.0 | 8.0 | 30.0 | 0.35 | 2.0 |
| Diseases of heart and vessels | 4.0 | 27.0 | 19.0 | 6.76 | 4.75 |
| Diseases of nervous system | 6.0 | 8.0 | - | 1.33 | - |
| Poisonings | 0.3 | - | 0.4 | - | 13.3 |
| Tuberculosis | 0.5 | 5.0 | 5.5 | 10.0 | 11.0 |
| Other | 74.2 | 52.0 | 41.5 | 0.7 | 0.56 |
| Total | 100.0 | 100.0 | 100.0 | - | - |

Parameters of relative intensity represent the numerical ratio of two or more structures of the same elements of the set under study.

They make it possible to determine the degree of disproportion (excess or reduction) of similar attributes, and are used as an auxiliary technique in cases where it is not possible to obtain direct intensive parameters, or where it is necessary to measure the degree of disproportion in the structures of two or more related processes.

For example, suppose data are available only on the structures of general morbidity, disability and mortality.

Comparing these structures and calculating parameters of relative intensity makes it possible to find out the relative importance of particular diseases in the health parameters of the population.

For example, comparing the shares of cardiovascular diseases in disability and mortality with their share in morbidity shows that cardiovascular diseases occupy almost 7 times as large a share in disability, and almost 5 times as large in mortality, as they do in the structure of morbidity.

The procedure for calculating these parameters is the following. For example, the shares of cardiovascular diseases in the structures are:

— general morbidity - 4.0%;

— disability - 27.0%;

— causes of death - 19.0%.

The index of relative intensity for disability is the ratio of the share in disability to the share in morbidity: 27.0 / 4.0 = 6.75, i.e. almost 7.
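The index of relative intensity is simply the ratio of two shares; with the cardiovascular figures above:

```python
# Relative intensity: the share of a disease in one structure divided by
# its share in another (cardiovascular disease, from the example above).
morbidity_share = 4.0    # % in general morbidity
disability_share = 27.0  # % in the structure of disability
mortality_share = 19.0   # % in the structure of causes of death

rel_intensity_disability = disability_share / morbidity_share
rel_intensity_mortality = mortality_share / morbidity_share
print(rel_intensity_disability)  # 6.75, i.e. almost 7 times
print(rel_intensity_mortality)   # 4.75, i.e. almost 5 times
```

Table 2.6 prints the disability index as 6.76; the exact ratio of the rounded shares is 6.75.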

The parameter of relative intensity for mortality is obtained in a similar way: 19.0 / 4.0 = 4.75, i.e. almost 5.

Thus, parameters of relative intensity express the disproportion of the shares of the same elements in the structures of the processes under study.

The parameter of correlation characterizes the relation between dissimilar values: for example, parameters of average bed occupancy, of provision with nurses, etc.

The technique of calculating the correlation parameter is the same as for the intensive parameter; however, for the intensive parameter the phenomenon in the numerator arises from the environment in the denominator, whereas for the correlation parameter the numerator and denominator are dissimilar values.

The parameter of visualization characterizes the relation of any of the compared values to an initial level taken as 100. This parameter is used for convenience of comparison, and also when it is necessary to show the direction of a process (increase or reduction) without showing the actual level or size of the phenomenon.

It can be used to characterize the dynamics of phenomena, for comparisons across territories or different groups of the population, and for constructing graphs.

Table 2.7. Expression of parameters of visits to polyclinics

| Polyclinic | Number of visits | Parameter of visualization (polyclinic № 1 = 100%) |
| --- | --- | --- |
| № 1 | 850 | 100.0 |
| № 2 | 920 | 108.1 |
| № 3 | 990 | 116.1 |
| № 4 | 1200 | 141.1 |
| № 5 | 1290 | 151.7 |
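The visualization parameter in Table 2.7 is each polyclinic's count relative to polyclinic № 1, taken as 100%; a sketch (rounding to one decimal can differ in the last digit from the printed table):

```python
# Visualization parameter: each value relative to the initial level
# (polyclinic No. 1), which is taken as 100% (data from Table 2.7).
visits = {1: 850, 2: 920, 3: 990, 4: 1200, 5: 1290}
base = visits[1]
visualization = {k: round(100 * v / base, 1) for k, v in visits.items()}
print(visualization[1])  # 100.0
print(visualization[2])  # roughly 108
```

Because every entry is divided by the same base, the parameter shows only the direction and relative size of change, not the level of the phenomenon, exactly as described above.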

Visualization parameters can be calculated from absolute numbers, intensive parameters, parameters of correlation, or average values, but not from extensive parameters, for the reasons mentioned above.

For practical purposes it is enough to calculate parameters to within one tenth. To determine the tenths digit, carry the calculation to the second digit after the decimal point. If the second digit is five or more, the first digit after the point is increased by one; if it is less, it remains the same.

Relative value studies are lists of "relative values" of different professional services. Antitrust enforcement agencies have recently filed several complaints against the promulgation of relative value studies by professional organizations of physicians, alleging that these lists have been used to fix and increase physician fees. This article examines the status of professionally sponsored relative value studies under antitrust law and suggests several reasons why they should be held unlawful. In particular, such relative value studies threaten to eliminate desirable competition among private third-party payers in the development of effective cost-containment strategies, as well as among physicians in the setting of fees. Moreover, the alleged benefits of professionally sponsored relative value studies could be achieved by alternative means that do not similarly restrict competition in the provision of medical services.

Relative value unit (RVU): a comparable service measure used by hospitals to permit comparison of the amounts of resources required to perform various services within a single department or between departments. It is determined by assigning weights to factors such as personnel time, level of skill, and sophistication of equipment required to render patient services. RVUs are commonly used in physician bonus plans based partly on productivity.

Describing and displaying categorical data

Summary

This chapter illustrates methods of summarising and displaying binary and categorical data. It covers proportions, risk and rates, relative risk, and odds ratios. The importance of considering the absolute risk difference as well as the relative risk is emphasized.

Summarising categorical data

Binary data are the simplest type of data. Each individual has a label which takes one of two values. A simple summary would be to count the different types of label. However, a raw count is rarely useful. Furness et al (2003) reported more accidents to white cars than to any other colour car in Auckland, New Zealand over a 1-year period. As a consequence, a New Zealander may think twice about buying a white car! However, it turns out that there are simply more white cars on the Auckland roads than any other colour. It is only when this count is expressed as a proportion that it becomes useful. When Furness et al (2003) looked at the proportion of white cars that had accidents compared to the proportion of all cars that had accidents, they found the proportions very similar, and so white cars are not more dangerous than other colours.

Hence the first step in analysing categorical data is to count the number of observations in each category and express them as proportions of the total sample size. Proportions are a special example of a ratio. When time is also involved (as in counts per year), the result is known as a rate. These distinctions are given below.

Labelling binary outcomes

For binary data it is common to call the outcome ‘an event’ and ‘a non-event’. So having a car accident in Auckland, New Zealand may be an ‘event’. We often score an ‘event’ as 1 and a ‘non-event’ as 0. These may also be referred to as a ‘positive’ or ‘negative’ outcome or ‘success’ and ‘failure’. It is important to realise that these terms are merely labels and the main outcome of interest might be a success in one context and a failure in another. Thus in a study of a potentially lethal disease the outcome might be death, whereas in a disease that can be cured it might be being alive.

Comparing outcomes for binary data

Many studies involve a comparison of two groups. We may wish to combine simple summary measures to give a summary measure which in some way shows how the groups differ. Given two proportions one can either subtract one from the other, or divide one by the other.

Suppose the results of a clinical trial comparing two treatments (a new test treatment versus a control), with a binary categorical outcome (positive or negative), are summarised in a 2 × 2 contingency table as in Table 2.3. Then the results of this trial can be summarised in a number of ways.
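As a sketch of the arithmetic, take a 2 × 2 table laid out as in Table 2.3, with a and b the positive outcomes and c and d the negative outcomes in the test and control groups; the counts here are invented purely for illustration:

```python
# Hypothetical 2x2 trial results (not real data):
#                 Test   Control
#   Positive      a=15     b=30
#   Negative      c=85     d=70
a, b, c, d = 15, 30, 85, 70

p_test = a / (a + c)     # risk of a positive outcome in the test group
p_control = b / (b + d)  # risk in the control group

risk_difference = p_test - p_control  # absolute risk difference (ARD)
relative_risk = p_test / p_control    # relative risk (RR)

print(f"pTest = {p_test:.2f}, pControl = {p_control:.2f}")
print(f"ARD = {risk_difference:.2f}, RR = {relative_risk:.2f}")
```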

The ways of summarising the data presented in Table 2.3 are given below. Each of these measures summarises the study outcomes, and the one chosen may depend on how the test treatment behaves relative to the control. Commonly, one may choose an absolute risk difference for a clinical trial and a relative risk for a prospective study. In general the relative risk is independent of how common the risk factor is: smoking increases one's risk of lung cancer by a factor of 10, and this is true in countries with a high smoking prevalence and in countries with a low smoking prevalence. In a clinical trial, however, we may be interested in what reduction in the proportion of people with a poor outcome a new treatment will achieve.

Summarising binary data – odds and odds ratios

A further method of summarising the results is to use the odds of an event rather than the probability. The odds of an event are defined as the ratio of the probability of occurrence of the event to the probability of nonoccurrence, that is, p/(1 − p).

Using the notation of Table 2.3, the odds of an outcome in the test group are a/c and the odds of an outcome in the control group are b/d. The odds ratio (OR) is the ratio of the odds for the test group to the odds for the control group:

OR = (a/c)/(b/d) = ad/bc.

When the probability of an event is rare, odds and probabilities are close: a is then much smaller than c, so pTest = a/(a + c) is approximately a/c, and b is much smaller than d, so pControl = b/(b + d) is approximately b/d. Thus the OR approximates the RR when events are rare (say, with an incidence below 10% in both pTest and pControl). For this reason the odds ratio is sometimes referred to as 'the approximate relative risk'. The approximation is demonstrated in Table 2.5.
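The approximation can be checked numerically. The helper below computes both measures from the 2 × 2 counts (all counts invented); when events are rare the two agree closely, and when they are common they diverge:

```python
# a, b: events in test/control; c, d: non-events (notation of Table 2.3).
def rr_and_or(a, b, c, d):
    rr = (a / (a + c)) / (b / (b + d))  # relative risk
    odds_ratio = (a * d) / (b * c)      # odds ratio, ad/bc
    return rr, odds_ratio

# Rare events (2-4% incidence): OR is close to RR.
print(rr_and_or(a=2, b=4, c=98, d=96))    # RR = 0.50, OR ~ 0.49

# Common events (40-60% incidence): OR exaggerates the RR.
print(rr_and_or(a=40, b=60, c=60, d=40))  # RR ~ 0.67, OR ~ 0.44
```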

Why should one use the odds ratio?

The calculation of an odds ratio (OR) may seem rather perverse, given that we can calculate the relative risk directly from the 2 × 2 table and the odds ratio is only an approximation of it. However, the OR appears quite often in the literature, so it is important to be aware of it. It has certain mathematical properties that make it attractive as an alternative to the RR as a summary measure; indeed, some statisticians argue that the odds ratio is the natural parameter and the relative risk merely an approximation. The OR features in logistic regression and is the natural summary measure for case–control studies. One point about the OR that can be seen immediately from the formula is that the OR for Failure, as opposed to the OR for Success, in Table 2.3 is given by OR = bc/ad. Thus the OR for Failure is just the inverse of the OR for Success.

Thus in the cannabis and psychosis study, the odds ratio of not developing psychosis for the cannabis group is 1/1.79 = 0.56. In contrast, the relative risk of not developing psychosis is (1 − 0.26)/(1 − 0.16) = 0.88, which is not the same as the inverse of the relative risk of developing psychosis for the cannabis group, 1/1.625 = 0.62.
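This can be verified numerically. Note that the sketch below uses the rounded risks 0.26 and 0.16 quoted above, so the OR it produces (about 1.84) differs slightly from the 1.79 obtained from the raw counts:

```python
p_cannabis, p_control = 0.26, 0.16  # rounded risks of developing psychosis

def odds(p):
    return p / (1 - p)

or_psychosis = odds(p_cannabis) / odds(p_control)
or_no_psychosis = odds(1 - p_cannabis) / odds(1 - p_control)

# The OR for the complementary outcome is exactly the reciprocal:
print(abs(or_no_psychosis - 1 / or_psychosis) < 1e-12)  # True

# The RR has no such symmetry:
rr_psychosis = p_cannabis / p_control                 # 1.625
rr_no_psychosis = (1 - p_cannabis) / (1 - p_control)  # ~0.88, not 1/1.625
print(round(rr_no_psychosis, 2), round(1 / rr_psychosis, 2))  # 0.88 0.62
```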

This symmetry of interpretation of the OR is one of the reasons for its continued use.

Relative value unit

Health insurance: a comparative financial unit that may sometimes be used instead of dollar amounts in a surgical schedule. This number is multiplied by a conversion factor to arrive at the surgical benefit to be paid.
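As a toy illustration of that arithmetic (all numbers invented, not an actual fee schedule):

```python
# Hypothetical relative-value-unit calculation:
unit_value = 12.5         # relative value units assigned to a procedure
conversion_factor = 40.0  # dollars per unit, set by the payer

benefit = unit_value * conversion_factor  # surgical benefit to be paid
print(f"${benefit:.2f}")  # $500.00
```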

The dbVar database has been developed to archive information associated with large scale genomic variation, including large insertions, deletions, translocations and inversions. In addition to archiving variation discovery, dbVar also stores associations of defined variants with phenotype information.

Archives and distributes the results of studies that have investigated the interaction of genotypes and phenotypes. Such studies include those assessing genome-wide association, medical sequencing, molecular diagnostic assays, as well as association between genotype and non-clinical traits.

Provides an open, publicly accessible platform where the HLA community can submit, edit, view, and exchange data related to the human Major Histocompatibility Complex. It consists of an interactive Alignment Viewer for HLA and related genes, an MHC microsatellite database, a sequence interpretation site for Sequencing Based Typing (SBT), and a Primer/Probe database.

Includes single nucleotide polymorphisms, microsatellites, and small-scale insertions and deletions. dbSNP contains population-specific frequency and genotype data, experimental conditions, molecular context, and mapping information for both neutral polymorphisms and clinical mutations.

A database of static NCBI web pages, documentation, and online tools. These pages include such content as specialized online sequence analysis tools, back issues of newsletters, legacy resource description pages, sample code, and other miscellaneous resources. Searching this database is equivalent to a site search tool for the whole NCBI web site. The FTP site is not covered.

Open-access data generally include summaries of genotype/phenotype association studies, descriptions of the measured variables, and study documents, such as the protocol and questionnaires. Access to individual-level data, including phenotypic data tables and genotypes, requires varying levels of authorization.

A public domain quality assurance software package that facilitates the assessment of multiplex short tandem repeat (STR) DNA profiles based on laboratory-specific protocols. OSIRIS evaluates the raw electrophoresis data using an independently derived mathematically-based sizing algorithm. It offers two new peak quality measures - fit level and sizing residual. It can be customized to accommodate laboratory-specific signatures such as background noise settings, customized naming conventions and additional internal laboratory controls.

A variety of tools are available for searching the SNP database, allowing search by genotype, method, population, submitter, markers and sequence similarity.

Relative Values for Integrative Healthcare

Over the years you have come to rely upon Relative Value Studies, Inc.'s expertise in providing relative value unit publications. We are pleased to include Relative Values for Integrative Healthcare among our relative value publications.

We are very excited about this opportunity to serve the alternative and nursing health care market. Relative Values for Integrative Healthcare will combine unit value information from RVSI with the coding nomenclature.

The best way to fully understand reimbursement and practice management is to receive step-by-step training. Whether you are a beginner or use relative values every day, Relative Values for Integrative Healthcare will help take the guesswork out of the reimbursement process and make office tasks easier.

The coding nomenclature provides the only patented coding system for the accurate coding of complementary and alternative medicine services. The integrative medicine codes are intended for use by health care providers, office managers, insurance companies, clearing houses, health care specialists and consultants.

Relative Values for Integrative Healthcare is the most useful tool available for establishing, defending and negotiating fees for complementary and alternative medicine services, products and procedures.

Relative Values for Integrative Healthcare focuses exclusively on complementary and alternative medicine and nursing services, products and procedures.

Relative Values for Integrative Healthcare includes procedures and unit values for the following professions:

Acupuncture

Chiropractic

Holistic Medicine

Homeopathy

Massage Therapy

Midwifery

Naturopathy

Nursing

Osteopathy

Just consider this extensive list of features that can help you tackle the business concerns of any alternative medicine practice:

ABC codes and surveyed unit values directly related to alternative medicine, nursing and other integrative healthcare;

Establishes and analyzes alternative medicine, nursing and other integrative healthcare fees based on measurable criteria;

Step-by-step instructions for performing practice management and reimbursement tasks (such as productivity measurement and cost of practice);

Develops defensible and justifiable fee schedules;

Simplifies negotiations with third party payers;

Based on reliable, ongoing research using qualified provider surveys;

System flexibility to keep you in control of your practice; and

Available in a format best-suited to your office needs.

The code designed specifically for integrative healthcare.

Dynamic analysis

Since program comprehension is so expensive, the development of techniques and tools that support this activity can significantly increase the overall efficiency of software development. The literature offers many such techniques: examples include execution trace analysis, architecture reconstruction, and feature location (an activity that involves linking functionalities to source code). Most approaches can be broken down into static and dynamic analyses (and combinations thereof).

Static approaches typically concern (semi-)automatic analyses of source code. An important advantage of static analysis is its completeness: a system’s source code essentially represents a full description of the system. One of the major drawbacks is that static analyses often do not capture the system’s behavioral aspects: in object-oriented code, for example, occurrences of late binding and polymorphism are difficult to grasp if runtime information is missing.

The focus of this thesis, on the other hand, is dynamic analysis, which concerns a system's runtime execution. It is defined by Ball (1999) as "the analysis of the properties of a running software system". A specification of the properties at hand has been purposely omitted to allow the definition to apply to multiple problem domains. Figure 1.1 shows an overview of the main steps in dynamic analyses: they typically comprise the analysis of a system's execution through interpretation (e.g., using the Java Virtual Machine) or instrumentation (e.g., using AspectJ (Kiczales et al., 2001)). The resulting data can be used for such purposes as reverse engineering and debugging, often in the form of execution traces. Program comprehension constitutes one such purpose, and over the years, numerous dynamic analysis approaches have been proposed in this context, with a broad spectrum of different techniques and tools as a result.
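As a minimal illustration of the instrumentation idea, Python's sys.settrace can register a callback that fires on every function call, yielding a small execution trace; the traced functions here are invented examples, and real tracers (JVM agents, AspectJ weaving) apply the same idea at far larger scale:

```python
import sys

trace = []  # execution trace: function names in call order

def tracer(frame, event, arg):
    if event == "call":                   # record function entries only
        trace.append(frame.f_code.co_name)
    return None                           # no per-line tracing needed

def helper(x):
    return x * 2

def main():
    return helper(1) + helper(2)

sys.settrace(tracer)  # switch instrumentation on
result = main()
sys.settrace(None)    # ...and off again

print(trace)   # ['main', 'helper', 'helper']
print(result)  # 6
```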

Since the definition of dynamic analysis is rather abstract, we shall elaborate on the benefits and limitations of dynamic analysis for program comprehension in particular. The advantages that we consider are:

·                  The precision with regard to the actual behavior of the software system, for example, in the context of object-oriented software with its late binding mechanism (Ball, 1999).

·                  The fact that a goal-oriented strategy can be used, which entails the definition of an execution scenario such that only the parts of interest of the software system are analyzed (Koenemann and Robertson, 1991; Zaidman, 2006).

The drawbacks that we distinguish are:

·                  The inherent incompleteness of dynamic analysis, as the behavior or execution traces under analysis capture only a small fraction of the usually infinite execution domain of the program under study (Ball, 1999). Note that the same limitation applies to software testing.

·                  The difficulty of determining which scenarios to execute in order to trigger the program elements of interest. In practice, test suites can be used, or recorded executions involving user interaction with the system (Ball, 1999).

·                  The scalability of dynamic analysis due to the large amounts of data that may be produced by dynamic analysis, affecting performance, storage, and the cognitive load humans can deal with (Zaidman, 2006).

·                  The observer effect, i.e., the phenomenon in which software acts differently when under observation, might pose a problem in multithreaded or multi-process software because of timing issues (Andrews, 1997).

In order to deal with these limitations, many techniques propose abstractions or heuristics that allow the grouping of program points or execution points that share certain properties, resulting in higher-level representations of the software. In such cases, a trade-off must be made between recall (are we missing any relevant program points?) and precision (are the program points we direct the user to indeed relevant for his or her comprehension problem?).
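The trade-off can be made concrete. With "relevant" the program points that truly matter for the comprehension task and "reported" those the technique presents to the user (both sets invented here):

```python
relevant = {"parse", "validate", "render", "cache"}  # ground truth
reported = {"parse", "validate", "log"}              # what the tool shows

hits = relevant & reported  # relevant points the tool actually reported

recall = len(hits) / len(relevant)     # 0.50 -> half the relevant points missed
precision = len(hits) / len(reported)  # ~0.67 -> one reported point is noise

print(f"recall = {recall:.2f}, precision = {precision:.2f}")
```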

