# GRAPHIC REPRESENTATIONS IN STATISTICS

**Graphic representation and graphic analysis**

Graphic representations are used to present statistical quantities visually; they make it possible to analyze them more deeply.

Graphic representations can be built from both absolute and relative quantities.

When using the graphic method, remember that the type of graphic representation must strictly correspond to the content of each index.

The following quantities are used to construct graphic representations:

__Relative quantities:__

- intensive indices

- extensive indices

- index of correlation

- index of evidence

__Absolute quantities__

**Intensive quantities:** four types of diagrams are used:

· column

· linear

· mapgram

· mapdiagram

**Extensive quantities** (they characterize structure): a sector diagram or an intra-column (stacked) diagram.

__Indices of correlation:__ the same diagrams as for intensive quantities (column and linear diagrams, mapgram, mapdiagram).

**Indices of evidence:** the principles of graphic representation are the same as for intensive quantities.

**Column diagrams** are used to illustrate homogeneous but unrelated indices. They represent the statics of phenomena.

__Linear diagrams__ are used to represent the dynamics of a phenomenon (typical examples are a temperature curve and changes in the birth rate or death rate).

**Radial diagram** is built on a system of polar coordinates to represent a phenomenon over a closed time cycle (day, week, year). The structure of morbidity or of causes of mortality, where each cause of mortality occupies a sector of the circle according to its percentage, is shown instead by a sector diagram.

__Mapgram__ is the representation of statistical quantities on a geographical map (or a schematic map).

Absolute as well as other indices can be marked on it.

__Mapdiagram__ is the representation of various types of diagrams on a geographical map.

**General rules for constructing graphic representations:**

· every graphic representation must have a title stating its content, time and place;

· it must be drawn to a definite scale;

· every graphic representation must have a legend explaining the colors or shading used (conventional symbols).

When choosing the type of graphic representation, remember that it must strictly correspond to the essence of the represented index.

Principles of constructing and applying plane diagrams (linear, column, rectangular, sector, radial).

A **linear** diagram is used to illustrate the frequency of a phenomenon that changes over time, that is, to represent the dynamics of the phenomenon.

The basis of this diagram is a rectangular coordinate system. For example, segments (time intervals) are laid off to scale along the abscissa (X axis) and morbidity indices along the ordinate (Y axis); the lengths of the axes are in the ratio x : y = 4 : 3.

A **column** (rectangular) diagram is used to illustrate homogeneous intensive indices that are not related to one another. It can represent either the dynamics or the statics of phenomena.

To construct this kind of diagram, columns are drawn whose heights correspond, to scale, to the values of the represented indices. The width of the columns and the spacing between them can be chosen arbitrarily, but they must be the same for all columns. Columns can be vertical or horizontal. For example: the growth in the number of beds in inpatient institutions from 1990 to 2003.
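As a minimal sketch of the scale rule above, the Python function below converts index values into column heights in centimeters and places the columns with one common width and one common gap. The function name `column_layout`, the default width and gap, and the bed counts are hypothetical, chosen only for illustration.

```python
def column_layout(values, units_per_cm, width_cm=1.0, gap_cm=0.5):
    """Lay out a column diagram.

    values: list of (label, index value) pairs, in display order.
    units_per_cm: index units represented by 1 cm of column height (the scale).
    Returns (label, left edge in cm, width in cm, height in cm) per column.
    """
    if units_per_cm <= 0:
        raise ValueError("the scale must be positive")
    layout = []
    x = gap_cm  # leave one gap before the first column
    for label, value in values:
        # Height follows the scale; width and spacing are identical for all columns.
        layout.append((label, x, width_cm, value / units_per_cm))
        x += width_cm + gap_cm
    return layout


# Hypothetical numbers of beds, drawn at 30 beds per centimeter.
for column in column_layout([("1990", 120), ("1995", 150), ("2003", 180)], units_per_cm=30):
    print(column)
```

Keeping width and gap as fixed parameters reflects the rule that they are arbitrary but identical; only the height carries information.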

A **sector** diagram is used to illustrate extensive indices, which characterize the structure of a phenomenon; it gives an idea of the share (specific weight) of each part in the whole.

The circle is taken as 100 % (if the indices are shown in percent), so 1 % corresponds to 3.6° of the circumference (360° / 100 = 3.6°).

For example, among all infectious diseases measles accounted for 28.6 % (28.6 × 3.6° ≈ 103°) and other infections for 71.4 % (71.4 × 3.6° ≈ 257°).

Using a protractor, arcs corresponding to the size of each index are laid off on the circle. The points found on the circumference are connected to the center of the circle. The separate sectors of the circle are the parts of the phenomenon being studied.
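The percentage-to-degrees conversion can be sketched in a few lines of Python. This is an illustration, not a prescribed method: the function name `sector_marks` is hypothetical, and it returns cumulative angles (the readings one would mark with a protractor) rather than individual sector sizes. The 0.5 % tolerance on the total is an assumption to allow for rounded published percentages.

```python
DEGREES_PER_PERCENT = 360 / 100  # 1 % of the circle = 3.6 degrees


def sector_marks(shares):
    """Return (name, start angle, end angle) in degrees for each sector.

    shares: list of (name, percentage) pairs that should add up to 100 %.
    Angles are cumulative, i.e. the protractor readings to mark on the circle.
    """
    total = sum(pct for _, pct in shares)
    if abs(total - 100) > 0.5:
        raise ValueError(f"shares add up to {total} %, not 100 %")
    marks = []
    start = 0.0
    for name, pct in shares:
        end = start + pct * DEGREES_PER_PERCENT
        marks.append((name, round(start, 1), round(end, 1)))
        start = end
    return marks


# The measles example from the text.
print(sector_marks([("measles", 28.6), ("other infections", 71.4)]))
```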

Instead of a sector diagram, an intra-column (stacked) diagram can be used. In that case the whole height of the column is taken as 100 %, and the extensive indices are laid off along it in proportional scale units, which together make up the whole.

A **radial** diagram is a type of linear diagram built on polar coordinates.

In a radial diagram, the role of the abscissa (X axis) is played by a circle divided into equal parts, one for each time interval of the cycle.

The role of the ordinate (Y axis) is played by the radius of the circle or its extension.

The radius of the circle is taken to represent the mean value of the analyzed phenomenon over the time cycle. The number of radii equals the number of time intervals in the cycle being studied:

· 12 radii when a phenomenon is studied over a year (one per month);

· 7 radii when a phenomenon is studied over a week (one per day).

By convention, the radii are numbered starting from the radius pointing to 12 o'clock and continuing clockwise.
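To make the "start at 12 o'clock, go clockwise" convention concrete, the sketch below computes the direction of each radius for a cycle of `n_periods` intervals. The function name is hypothetical; the angle is measured clockwise from the vertical, which is why the x component uses sine and the y component uses cosine.

```python
import math


def radius_directions(n_periods):
    """Directions of the radii of a radial diagram.

    Radius 0 points to 12 o'clock and the rest follow clockwise, one per
    time interval (12 for the months of a year, 7 for the days of a week).
    Returns (angle in degrees clockwise from 12 o'clock, unit x, unit y).
    """
    step = 360 / n_periods
    directions = []
    for k in range(n_periods):
        angle = k * step
        rad = math.radians(angle)
        # Measuring clockwise from the vertical: x = sin(angle), y = cos(angle).
        directions.append((angle, math.sin(rad), math.cos(rad)))
    return directions


for angle, x, y in radius_directions(12)[:4]:
    print(f"{angle:5.1f} deg  x={x:+.2f}  y={y:+.2f}")
```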

After statistical processing, the results of a study can be presented as graphs, in which numerical values are shown as drawings. Graphs give a general characteristic of a phenomenon, reveal its general regularities and make it possible to analyze the research data more deeply.

They make it easier to compare parameters, give an idea of the structure of phenomena and of the nature of the relationships between them, and show their tendencies.

Therefore, graphic presentation is often combined with graphic analysis, in which a graph serves not only as a means of demonstrating the results and conclusions of a study but also as a means of analyzing the collected material and revealing internal connections and regularities.

When constructing graphs, one takes into account the nature of the data to be represented, the intended use of the graph (presentation at a conference or lecture, reproduction in a scientific paper, etc.), its aim (to show the results obtained or only to emphasize a particular regularity or fact), and the level of the audience to whom it will be shown.

All of this determines the choice of the type of graph, its colors, the number of graphs, the proportions of the print, etc. In all cases graphs should be clear, convenient and easy to read.

In medical statistical research, linear (coordinate) diagrams, plane diagrams and cartograms are used.

LINEAR DIAGRAMS are graphs in which numerical values are displayed as curves; they make it possible to trace the dynamics of a phenomenon over time or to reveal the dependence of one attribute on another (Fig. 2.1).

**Fig.** Age-specific mortality rates of the population of Ukraine (Ukrainian Center of Medical Statistics, Kyiv, 1999).

Linear diagrams with two or more curves also make it possible to compare two or more dynamic series and to establish whether fluctuations in one series depend on changes in another.

Linear diagrams are built in a rectangular coordinate system: the horizontal scale is laid off from left to right along the abscissa (X), and the vertical scale from bottom to top along the ordinate (Y). Every graph must be drawn to scale, that is, the lengths in the drawing must be reduced in a fixed proportion to the corresponding figures.

In contrast to linear diagrams, which describe the dynamics of a process, plane diagrams are used when statistical phenomena or facts that are independent of one another must be represented.

The simplest example of a plane diagram is a diagram made of rectangles or other figures. On plane diagrams, numerical values are represented by geometric figures such as rectangles and squares. These diagrams are used to demonstrate and popularize the results obtained, and also when the structure of a phenomenon at one moment of observation must be shown.

For example, the age composition of those who fell ill, or the structure of morbidity in a settlement.

**Fig.** Age structure of the population (the share of each age group in the total population).

In column diagrams, numerical values are represented by rectangular columns with identical bases and different heights.

The height of each rectangle corresponds to the relative value of the phenomenon studied. To construct a column diagram, a scale is chosen from which the height of each column is determined.

Column diagrams serve to compare several values. The rectangles representing the values can also be placed horizontally rather than vertically, which gives a strip (band) diagram (Fig. 4). In some cases strips are more convenient than columns because each strip is easier to label with a horizontal caption.

Column and strip diagrams can be used not only to compare different values but also to show the structure of these values and compare their parts. For example, a column or strip diagram showing the distribution of diseases by the main nosological forms can also show the percentage of cases among men and among women.

For this purpose each rectangle (column or strip) is divided into two parts, corresponding to the number of cases among men and among women.

Circular diagrams are used to display the ratio of homogeneous absolute values.

Instead of the area of a rectangle, they use the area of a circle.

Remember, however, that the areas of circles are in the same ratio as the squares of their radii. Therefore, to construct a circular diagram, take the square root of each value to be displayed and use it, to scale, as the radius; once the radius is known, the circle is easy to draw.
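The square-root rule can be checked with a short sketch. The function name `circle_radii` and the figures are hypothetical; the largest value is given a chosen maximum radius, and every other radius is scaled by the square root of its value relative to the largest, so the circle *areas* stay proportional to the values.

```python
import math


def circle_radii(values, max_radius_cm):
    """Radii for a circular diagram whose circle areas match the values.

    Area grows with the square of the radius, so each radius is scaled by
    the square root of its value relative to the largest value.
    """
    largest = max(values.values())
    return {name: max_radius_cm * math.sqrt(v / largest) for name, v in values.items()}


# Hypothetical absolute numbers: 400 cases is four times 100 cases,
# so its circle has twice the radius, not four times.
print(circle_radii({"district A": 400, "district B": 100}, max_radius_cm=4.0))
```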

If a circular diagram displays parts of a whole, the circles should not be drawn separately but superimposed on one another. The whole and its parts can also be shown as a circle divided into sectors, that is, a sector diagram. In a sector diagram the whole area of the circle is taken as 100 %, and each sector occupies the part of the area that corresponds to its percentage.

In practice, sector-type diagrams can be built not only on the area of a circle but also on the area of a square or a rectangle.

However, these figures are often harder to divide than a circle, so they are rather seldom used as the basis of sector diagrams.

Radial (linear-circular) diagrams are constructed in polar coordinates, in which the radius takes the place of the vertical scale used in diagrams built in rectangular coordinates.

An example of a radial diagram is the wind rose, which is used on maps to show how wind direction changes over a calendar period (month, year).

Radial diagrams are used to illustrate seasonal fluctuations of values such as morbidity or mortality rates.

These diagrams are built on a circle with 12 radii drawn from its center. Adjacent radii cut off arcs of 30° (360° / 12 = 30°), and each radius serves as the ordinate for one calendar month: January, February, March and so on.

The center of the circle is taken as the zero point, and the values showing the intensity of the phenomenon in each calendar month are laid off along the radii according to the previously chosen scale.

Connecting the marked points gives a closed line that shows the seasonal fluctuations.
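Putting the radial construction together, the sketch below turns one value per time interval into the points of the closed line, measured from the center as the zero point, and also returns the radius of the circle that represents the mean value over the cycle. The function name `radial_diagram` and the sample values are hypothetical, and the scale is an input rather than a fixed rule.

```python
import math


def radial_diagram(values, units_per_cm):
    """Points of the closed line of a radial diagram, plus the mean circle.

    values: one value per time interval (e.g. 12 monthly rates), starting
    with the interval drawn on the 12 o'clock radius and going clockwise.
    units_per_cm: the chosen scale along each radius.
    Returns (closed list of (x, y) points in cm, radius of the mean circle in cm).
    """
    n = len(values)
    points = []
    for k, value in enumerate(values):
        theta = 2 * math.pi * k / n   # clockwise angle from 12 o'clock
        r = value / units_per_cm      # distance from the center (zero point)
        points.append((r * math.sin(theta), r * math.cos(theta)))
    points.append(points[0])          # join the last point back to the first
    mean_radius = sum(values) / n / units_per_cm
    return points, mean_radius


# Hypothetical monthly rates, January first.
pts, mean_r = radial_diagram([14, 12, 11, 10, 9, 8, 8, 9, 10, 11, 12, 14], units_per_cm=2)
print(round(mean_r, 2), len(pts))
```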

When building radial diagrams, remember the rule that the radii are counted from the top of the diagram and proceed clockwise.

**Fig.** The radial diagram.

Seasonal distribution of mortality of the population of Kalinovsky district, Vinnitsya region (Ukraine, 1984-1998).

When different phenomena need to be compared by territory, cartograms are built. They are geographical maps on which graphic symbols show the intensity of distribution and the grouping of a phenomenon (morbidity, mortality, etc.) over a given period.

It is therefore better to build them on simplified maps showing only administrative boundaries and a few large settlements. In constructing a cartogram, the grouping of the displayed phenomena is of great importance.

The simplest grouping divides the areas into a group with values below average and a group with values above average. Accordingly, regions or districts with above-average values are shaded on the cartogram, and those below average are left unshaded.
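The two-group cartogram rule can be sketched as follows. Two choices here are assumptions, not taken from the text: the "average" is the plain mean of the area rates (not weighted by population), and an area exactly at the average is left unshaded. The function name and district data are hypothetical.

```python
def shade_above_average(rates):
    """Split areas into above-average (shaded) and the rest (unshaded).

    rates: mapping of area name -> rate (e.g. deaths per 100,000).
    Assumptions: the average is the unweighted mean of the area rates,
    and areas exactly at the average are left unshaded.
    """
    average = sum(rates.values()) / len(rates)
    shaded = sorted(name for name, rate in rates.items() if rate > average)
    unshaded = sorted(name for name, rate in rates.items() if rate <= average)
    return average, shaded, unshaded


print(shade_above_average({"district 1": 180.0, "district 2": 240.0, "district 3": 210.0}))
```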

**Fig.**

**Regional features of mortality from cancer in Ukraine.**


Graphical Representation of Data

The graphical representation of data makes reading more interesting, less time-consuming and easier to understand. Its disadvantage is that it lacks detail and is less accurate. In our study we consider the following graphs: 1. bar graphs, 2. pie charts, 3. frequency polygons, 4. histograms.

Bar Graphs

This is the simplest type of graphical presentation of data. The following types of bar graphs are possible: (a) Simple bar graph (b) Double bar graph (c) Divided bar graph.

Pie Graph or Pie Chart

Sometimes a circle is used to represent data, with its parts shown as sectors of the circle in proportion to their size. Such a graph is called a pie graph or pie chart.

Bar Graphs and Pie Charts

Bar graphs and pie charts are commonly used to show data when the categories are qualitative. You are probably familiar with both, but let’s review the basic ideas.

Consider the essay grade data in Table 5.1. A bar graph shows each category with a bar whose length corresponds to its frequency. If you make a bar graph by hand (as opposed to with a computer), you should measure the bar lengths carefully to make sure they correctly correspond to the frequencies. In Figure 5.3, for example, the vertical axis is scaled so that each unit of frequency corresponds to half a centimeter. Thus, the bar for A grades is 2 centimeters long, because the frequency of A grades is 4. Note that the left side of the bar graph in Figure 5.3 is marked with frequency, while the right side is marked with relative frequency. As you can see, bar graphs make it easy to display frequency and relative frequency simultaneously.
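The relationship between frequency, relative frequency and bar length can be sketched in Python. The grades below are hypothetical (not the Table 5.1 data), and `bar_graph_table` is an illustrative name; the default scale of 0.5 cm per unit reproduces the "frequency 4 gives a 2 cm bar" example.

```python
from collections import Counter


def bar_graph_table(grades, cm_per_unit=0.5):
    """Frequency, relative frequency and bar length (cm) for each category.

    cm_per_unit=0.5 matches the scale described for Figure 5.3,
    where a frequency of 4 gives a 2 cm bar.
    """
    counts = Counter(grades)
    total = sum(counts.values())
    return {
        grade: (freq, freq / total, freq * cm_per_unit)
        for grade, freq in sorted(counts.items())
    }


# Hypothetical grades, not the Table 5.1 data.
grades = ["A"] * 4 + ["B"] * 6 + ["C"] * 10
for grade, (freq, rel, length) in bar_graph_table(grades).items():
    print(f"{grade}: frequency {freq}, relative {rel:.0%}, bar {length} cm")
```

Because relative frequency is just frequency divided by the same total, one bar length serves both axes, which is why a single bar graph can carry both scales.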

In contrast, pie charts are used primarily for relative frequencies, because the total pie must always represent the total relative frequency of 100%. The size of each wedge is proportional to the relative frequency of the category it represents. Figure 5.4 shows a pie chart for the essay grade data. To make comparisons easier, relative frequencies are often written on pie chart wedges.

Nowadays, most people make graphs with the aid of computers that measure bar lengths or wedge sizes automatically. However, you must still specify any labels or axis marks you want on a graph. This labeling is extremely important: Without proper labels, a graph is meaningless. The following summary lists the important labels for graphs. Of course, not all labels are necessary in all cases. For example, pie charts do not require a vertical or horizontal scale. Notice how these rules were applied in Figure 5.3.

Frequency Polygon

In a frequency distribution, the mid-value of each class is obtained. Then, on graph paper, each frequency is plotted against the corresponding mid-value, and the points are joined by straight lines. These lines may be extended at both ends to meet the X-axis, forming a polygon.

Histogram

A two-dimensional frequency density diagram is called a histogram. A histogram represents each class interval and its frequency in the form of a rectangle.

In a simple bar graph, the height of each bar represents the frequency. The thickness of the bars has no significance, but all bars should have the same thickness.

We use a double bar graph when we want to compare two things.

In a frequency polygon, the frequency is plotted against the mid-value of each class, and these points are joined by line segments.

The science of collecting and classifying data and applying it to commerce and everyday life is called statistics. Some important terms are: ungrouped data, tabulation of data, range, frequency, frequency distribution, tally, inclusive type of grouped frequency distribution, exclusive type of grouped frequency distribution, lower limit and actual lower limit, upper limit and actual upper limit, class size or class width, class mark or class mid-interval, variables, continuous variables and discrete variables.

Graphical Representation

There are various methods of graphical representation of statistical data. In our study we learn two of them: the histogram and the ogive, or cumulative frequency curve.

Cumulative Frequency

Cumulative frequency is obtained by adding the frequency of a class interval and the frequencies of the preceding intervals up to that class interval.

Cumulative Frequency Curve

A cumulative frequency curve is a plot of the cumulative frequency against the upper class boundary, with the points joined by line segments. Any continuous cumulative frequency curve, including a cumulative frequency polygon, is called an ogive. There are two ways of constructing an ogive: the 'less than' method and the 'more than' method. The curve is usually S-shaped.

Points on the cumulative frequency curve have as abscissas the actual upper limits (for a 'less than' curve) or the actual lower limits (for a 'more than' curve), and as ordinates the cumulative frequencies.
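Both kinds of cumulative frequency are running sums, so they can be sketched with `itertools.accumulate` from the Python standard library. The function names and class frequencies are hypothetical; "less than" sums from the lowest class upward, and "more than" sums from the highest class downward.

```python
from itertools import accumulate


def less_than_cf(freqs):
    """'Less than' cumulative frequencies: each class plus all classes below it.

    Plotted against the actual upper limits of the classes.
    """
    return list(accumulate(freqs))


def more_than_cf(freqs):
    """'More than' cumulative frequencies: each class plus all classes above it.

    Plotted against the actual lower limits of the classes.
    """
    return list(accumulate(reversed(freqs)))[::-1]


freqs = [2, 5, 9, 4]          # hypothetical class frequencies, lowest class first
print(less_than_cf(freqs))    # [2, 7, 16, 20]
print(more_than_cf(freqs))    # [20, 18, 13, 4]
```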

GRAPHICAL REPRESENTATION OF DATA

Graphical representation of the available data is a very important step in statistical analysis. We will first discuss the organization of data. The word 'data' is the plural of 'datum', which means a fact. In statistics the term is used for numerical facts such as measures of height and weight and scores on achievement and intelligence tests.

Tests, experiments and surveys in education and psychology provide us with valuable data, mostly in the form of numerical scores. To understand the available data and derive meaningful, useful conclusions, the data have to be organized or arranged in some systematic way. This can be done in the following ways:

1. Statistical tables

2. Rank order

3. Frequency distribution

Statistical tables

The data are tabulated, that is, arranged in rows and columns with different headings. Such tables can list the original raw scores as well as percentages, means, standard deviations and so on.

Rules for constructing tables:

1. The title of the table should be simple, concise and unambiguous. As a rule, it should appear at the top of the table.

2. The table should be suitably divided into columns and rows according to the nature of data and purpose. These columns and rows should be arranged in a logical order to facilitate comparison.

3. The heading of each column or row should be as brief as possible. Two or more columns or rows with similar headings may be grouped under a common heading to avoid repetition, with subheadings or captions.

4. Subtotals for each separate classification and a grand total for all classes combined are to be given. These totals should be placed at the bottom or to the right of the items concerned.

5. The units in which the data are given must invariably be mentioned.

6. Footnotes giving any explanation needed to avoid an ambiguous reading of the tabulated data must be given at the bottom of the table.

7. The sources from where the data have been received should be given at the end of the table.

8. If the numbers tabulated have more than three significant figures, the digits should be grouped in threes. For example, 4394756 is written as 4 394 756.

9. Above all, the table should be as simple as possible, so that readers can study it with minimum strain and get a clear picture and interpretation of the data.
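The digit-grouping rule can be applied automatically with Python's format specification, which inserts a comma every three digits; replacing the commas with spaces gives the style used in the example. The helper name `group_digits` is hypothetical.

```python
def group_digits(n):
    """Write an integer with its digits grouped in threes, separated by spaces."""
    return f"{n:,}".replace(",", " ")


print(group_digits(4394756))  # 4 394 756
```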

Rank order

The original raw scores can be arranged in an ascending or descending series showing the rank or merit position of each individual. Example:

Sixteen students of a BA final-year psychology class obtained the following scores on an achievement test. Arranging the data in descending rank order:

5 8 4 12 15 17 18 12 20 7 8 19 6 9 10 11

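| S. No. | Score | S. No. | Score | S. No. | Score | S. No. | Score |
|---|---|---|---|---|---|---|---|
| 1 | 20 | 5 | 15 | 9 | 10 | 13 | 7 |
| 2 | 19 | 6 | 12 | 10 | 9 | 14 | 6 |
| 3 | 18 | 7 | 12 | 11 | 8 | 15 | 5 |
| 4 | 17 | 8 | 11 | 12 | 8 | 16 | 4 |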
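The rank-order table above can be reproduced by sorting the sixteen scores in descending order and numbering them. The function name `rank_order` is illustrative; ties (the two 12s and the two 8s) simply receive consecutive serial numbers, as in the table.

```python
def rank_order(scores, descending=True):
    """Arrange raw scores in rank order and number them (S. No. 1, 2, ...)."""
    ranked = sorted(scores, reverse=descending)
    return list(enumerate(ranked, start=1))


scores = [5, 8, 4, 12, 15, 17, 18, 12, 20, 7, 8, 19, 6, 9, 10, 11]
for serial, score in rank_order(scores):
    print(serial, score)
```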

Frequency Distribution

Arranging the data in rank order does not help us summarize a series of raw scores, nor does it show how often each score occurs. In a frequency distribution we group the data into arbitrarily chosen groups or classes and record how many times each score or group of scores occurs. This is known as the frequency distribution of numerical data.

Construction of Frequency distribution table

Finding the range:

First, the range of the series to be grouped is found by subtracting the lowest score from the highest. In the present problem the range of the distribution is 46 - 12 = 34.

Determining class interval:

After finding the range, we determine the class interval, represented by Y. The formula for this is:

Writing the contents of the frequency distribution table:

Writing the classes of the distribution.

In the first column we write the classes of the distribution. The lowest class is fixed first, and the higher classes are written after it. In this case we take 10-14 as the lowest class, followed by 15-19, 20-24 and so on up to 45-49.

Tallying the scores into proper classes.

The given scores are tallied into their proper classes in the second column; the tallies against each class are then counted to obtain the frequency of that class.
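The tallying step can be sketched for integer scores and inclusive classes such as 10-14, 15-19, ..., 45-49. The scores below are hypothetical (only their extremes, 12 and 46, match the text), the class width of 5 is taken from the classes listed above, and tally marks are simplified to one stroke per score instead of the usual groups of five.

```python
def frequency_table(scores, lowest_limit, width, highest_limit):
    """Tally integer scores into inclusive classes such as 10-14, 15-19, ...

    Returns (class label, tally marks, frequency) for each class, lowest first.
    Tally marks are simplified to one '|' per score.
    """
    table = []
    lower = lowest_limit
    while lower <= highest_limit:
        upper = lower + width - 1
        freq = sum(1 for s in scores if lower <= s <= upper)
        table.append((f"{lower}-{upper}", "|" * freq, freq))
        lower += width
    return table


# Hypothetical scores with the same extremes as the text (12 and 46).
scores = [12, 17, 23, 25, 28, 31, 33, 34, 38, 41, 46]
for row in frequency_table(scores, lowest_limit=10, width=5, highest_limit=49):
    print(row)
```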

GRAPHICAL REPRESENTATION OF DATA

Statistical data may be presented in a more attractive form, appealing to the eye, with the help of graphic aids, i.e. pictures and graphs. Such presentation carries a lot of communicative power. A mere glimpse of the pictures and graphs may give the viewer an immediate and meaningful grasp of a large amount of data.

Ungrouped data may be represented through a bar diagram, pie diagram, pictograph and line graph.

Bar graph represents the data on the graph paper in the form of vertical or horizontal bars.

In a pie diagram, the data are represented by a circle of 360 degrees divided into parts, each representing an amount of data converted into an angle. The total frequency is equated to 360 degrees, and the angles corresponding to the component parts are then calculated.

In pictograms, the data is represented by means of picture figures appropriately designed in proportion to the numerical data.

Line graphs represent the data concerning one variable on the horizontal axis and another variable on the vertical axis of the graph paper.

Grouped data may be represented graphically by histogram, frequency polygon, cumulative frequency graph and cumulative frequency percentage curve or ogive.

A histogram is essentially a bar graph of a frequency distribution. The actual class limits plotted on the x-axis represent the widths of the bars, and the frequencies of these class intervals represent their heights.

A frequency polygon is a line graph for the graphical representation of frequency distribution.

A cumulative frequency graph represents the cumulative frequency distribution by plotting actual upper limits of the class intervals on the x axis and the respective cumulative frequencies of these class intervals on the y axis.

Cumulative frequency percentage curve or ogive represents cumulative percentage frequency distribution by plotting upper limits of the class intervals on the x axis and the respective cumulative percentage frequencies of these class intervals on the y axis.

METHOD FOR CONSTRUCTING A HISTOGRAM

1. In constructing a histogram, the scores are taken in the form of actual class limits, such as 19.5-24.5, 24.5-29.5 and so on, rather than written class limits such as 20-24, 25-29.

2. It is customary to take two extra class intervals, one below and one above the grouped intervals.

3. Now we take the actual lower limits of all the class intervals and plot them on the x-axis. The lower limit of the lowest class interval is taken at the point where the x-axis and the y-axis intersect.

4. Frequencies of the distribution are plotted on the y axis.

5. Each class interval with its frequency is represented by a separate rectangle. The base of each rectangle is the width of the class interval, and its height represents the frequency of that class.

6. Care should be taken to select appropriate units of representation along the x-axis and the y-axis. Neither axis should be too short or too long.
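Steps 1 to 5 can be sketched as a function that turns written class limits and frequencies into rectangles (left edge, width, height), using actual limits 0.5 below and above the written ones and adding one empty class at each end. It assumes integer scores and classes of equal width; the function name and sample data are hypothetical, and drawing the rectangles is left to whatever plotting tool is used.

```python
def histogram_rectangles(classes, freqs):
    """Rectangles (left edge, width, height) for a histogram.

    classes: written class limits as (lower, upper) integer pairs of equal
    width, e.g. [(20, 24), (25, 29)]. Actual limits are taken 0.5 below and
    above them, and one empty class is added below and above the distribution.
    """
    width = classes[0][1] - classes[0][0] + 1
    first_edge = classes[0][0] - 0.5               # actual lower limit, e.g. 19.5
    rects = [(first_edge - width, width, 0)]        # extra empty class below
    for (lower, _upper), freq in zip(classes, freqs):
        rects.append((lower - 0.5, width, freq))
    rects.append((classes[-1][1] + 0.5, width, 0))  # extra empty class above
    return rects


print(histogram_rectangles([(20, 24), (25, 29), (30, 34)], [3, 5, 2]))
```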

For quantitative data categories, the two most common types of graphics are *histograms* and *line charts*. Figure 5.9a shows a histogram for the binned exam data of Table 5.3. Figure 5.9b shows a line chart for the same data.

A **histogram** is essentially a bar graph in which the data categories are quantitative. Thus, the bars on a histogram must follow the natural order of the numerical categories. In addition, the widths of histogram bars have a specific meaning. For example, the width of each bar in Figure 5.9a represents 5 points on the exam. Because there are no gaps between the categories, the bars on a histogram touch each other.

A **line chart** serves the same basic purpose as a histogram, but instead of using bars, a line chart connects a series of dots. When data are binned, the dot is placed at the center of each bin. Histograms and line charts are often used to show how some variable changes with time. For example, the line chart in Figure 5.10 shows how the U.S. homicide rate has changed with time. The categories are time intervals. In this case, each bin represents a year in the data. Histograms and line charts with time on the horizontal axis are often called **time-series diagrams.**

METHOD FOR CONSTRUCTING A FREQUENCY POLYGON

1. As in a histogram, two extra class intervals are taken, one above and one below the given class intervals.

2. The midpoints of the class intervals are calculated.

3. The midpoints are plotted along the x-axis and the corresponding frequencies along the y-axis.

4. The plotted points are joined by straight lines to give the frequency polygon.
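The four steps above reduce to computing the polygon's vertices: each class midpoint paired with its frequency, plus one extra midpoint with frequency 0 at each end so the polygon meets the x-axis. This sketch assumes integer scores and equal class widths; the function name and data are hypothetical.

```python
def frequency_polygon(classes, freqs):
    """Vertices of a frequency polygon.

    classes: written class limits (lower, upper) of equal width.
    Each frequency is plotted against its class midpoint; an extra class with
    frequency 0 at each end brings the polygon down to the x-axis.
    """
    width = classes[0][1] - classes[0][0] + 1
    mids = [(lower + upper) / 2 for lower, upper in classes]
    return ([(mids[0] - width, 0)]
            + list(zip(mids, freqs))
            + [(mids[-1] + width, 0)])


print(frequency_polygon([(20, 24), (25, 29), (30, 34)], [3, 5, 2]))
```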

DIFFERENCE BETWEEN HISTOGRAM AND FREQUENCY POLYGON

A histogram is a bar graph, while a frequency polygon is a line graph. The frequency polygon is more useful and practical: in a frequency polygon it is easy to see the trend of the distribution, which we cannot do in a histogram. A histogram, however, gives a very clear and accurate picture of the relative proportion of frequencies from interval to interval.

METHOD FOR CONSTRUCTING A CUMULATIVE FREQUENCY GRAPH

1. First of all we calculate the actual upper and lower limits of the class intervals; e.g., if the class interval is 20-24, the actual upper limit is 24.5 and the actual lower limit is 19.5.

2. We must now select a suitable scale according to the range of the class intervals and plot the actual upper limits on the x-axis and the respective cumulative frequencies on the y-axis.

3. All the plotted points are then joined by successive straight lines, resulting in a line graph.

4. So that the graph starts from the x-axis, an extra class interval with cumulative frequency zero is taken below the lowest class.
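The steps above can be sketched as a list of points to plot: each actual upper limit paired with its cumulative frequency, preceded by the extra zero point so the line starts on the x-axis. The third value in each point is the cumulative percentage, which is what the cumulative frequency percentage curve (ogive) described earlier plots instead. The function name and data are hypothetical, and integer scores are assumed.

```python
from itertools import accumulate


def cumulative_graph(classes, freqs):
    """Points (actual upper limit, cumulative frequency, cumulative %) to plot.

    classes: written class limits (lower, upper), lowest first, e.g. (20, 24).
    The first point comes from the extra class below, with cumulative
    frequency 0, so the line starts on the x-axis at the actual lower limit
    of the first class.
    """
    cf = list(accumulate(freqs))
    total = cf[-1]
    points = [(classes[0][0] - 0.5, 0, 0.0)]
    for (_lower, upper), c in zip(classes, cf):
        points.append((upper + 0.5, c, 100 * c / total))
    return points


print(cumulative_graph([(20, 24), (25, 29), (30, 34)], [3, 5, 2]))
```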

Statistics is that branch of mathematics devoted to the collection, compilation, display, and interpretation of numerical data. In general, the field can be divided into two major subgroups, descriptive statistics and inferential statistics. The former subject deals primarily with the accumulation and presentation of numerical data, while the latter focuses on predictions that can be made based on those data.

Perhaps the simplest way to report the results of the study described above is to make a table. The advantage of constructing a table of data is that a reader can get a general idea about the findings of the study in a brief glance.

Two fundamental concepts used in statistical analysis are population and sample. The term population refers to a complete set of individuals, objects, or events that belong to some category. For example, all of the players who are employed by Major League Baseball teams make up the population of professional major league baseball players. The term sample refers to some subset of a population.


Statistics - Graphical Representation

The table shown above is one way of representing the frequency distribution of a sample or population. A frequency distribution is any method for summarizing data that shows the number of individuals or individual cases present in each given interval of measurement. In the table above, there are 5,382,025 female African-Americans in the age group 0-19.

Statistics - Distribution Curves

Finally, think of a histogram in which the vertical bars are very narrow...and then very, very narrow. As one connects the midpoints of these bars, the frequency polygon begins to look like a smooth curve, perhaps like a high, smoothly shaped hill. A curve of this kind is known as a distribution curve. Probably the most familiar kind of distribution curve is one with a peak in the middle.


Other Kinds Of Frequency Distributions

Bar graphs look very much like histograms except that gaps are left between adjacent bars. This difference is based on the fact that bar graphs are usually used to represent discrete data and the space between bars is a reminder of the discrete character of the data represented. Line graphs can also be used to represent continuous data. If one were to record the temperature once an hour all day to week.

Statistics - Measures Of Central Tendency

Both statisticians and non-statisticians talk about "averages" all the time. But the term average can have a number of different meanings. In the field of statistics, therefore, workers prefer to use the term "measure of central tendency" for the concept of an "average." One way to understand how various measures of central tendency.

Measures Of Variability

Suppose that a teacher gave the same test to two different classes and obtained the following results: Class 1: 80%, 80%, 80%, 80%, 80% Class 2: 60%, 70%, 80%, 90%, 100% If you calculate the mean for both sets of scores, you get the same answer: 80%. But the collection of scores from which this mean was obtained was very different in the two cases. The way that statisticians have of distinguishing…
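The passage breaks off before naming the measure statisticians use. As one common choice (not necessarily where the original text goes next), the sketch below compares the two classes by range and population standard deviation, using Python's standard `statistics` module; the helper name `describe` is hypothetical.

```python
from statistics import mean, pstdev


def describe(scores):
    """Mean, range and population standard deviation of a set of scores."""
    return mean(scores), max(scores) - min(scores), pstdev(scores)


class_1 = [80, 80, 80, 80, 80]
class_2 = [60, 70, 80, 90, 100]
print(describe(class_1))  # same mean, no spread at all
print(describe(class_2))  # same mean, scores spread out
```

Both classes have a mean of 80 %, but only the second has a non-zero range and standard deviation, which is exactly the difference the mean alone hides.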

Statistics - Inferential Statistics

Expressing a collection of data in some useful form, as described above, is often only the first step in a statistician's work. The next step will be to decide what conclusions, predictions, and other statements, if any, can be made based on those data. A number of sophisticated mathematical techniques have now been developed to make these judgments. An important fundamental concept used in biostatistics.

Computer forensics is the preservation, analysis, and interpretation of computer data. There is a need for software that aids investigators in locating data on hard drives left by persons committing illegal activities. These software tools should reduce the tedious efforts of forensic examiners, especially when searching large hard drives. A method is proposed here that uses visualization techniques to represent file statistics, such as file size, last access date, creation date, last modification date, owner, and file type. The user interface to this software allows file searching, pattern matching, and display of file contents. By viewing file information graphically, the developed software will reduce the examiner’s analysis time and greatly increase the probability of locating criminal evidence.

Computer forensics is the preservation, analysis, and interpretation of computer data. In a world wherein the number of crimes committed using computers is increasing rapidly, a definite need exists for forensic software tools. These tools allow investigators to follow digital tracks left by persons committing illegal activities. Traces of evidence may be found in plain text documents, log files, or even system files, yet more technologically advanced criminals may conceal information by deleting it, encrypting it, or embedding it inside another file. With the large amount of storage space available on modern hard drives, searching for a single file becomes quite tedious without the help of special forensic tools. Using visualization techniques to display information about computer data can help forensic specialists direct their search to suspicious files.

A great deal of time is wasted trying to interpret massive amounts of uncorrelated data, which is hard to make meaningful without high levels of patience and tolerance for error. The well-known phrase “a picture is worth a thousand words” sums up what we are trying to accomplish here. Human brains can interpret and comprehend pictures, video, and charts much faster than they can read a description of the same content, because the human mind examines graphics in parallel but examines text only serially. Imagine a friend trying to describe in an email the beauty of the Shenandoah Valley using the best vocabulary he has. It takes some time, because there are so many elements to convey without making the Shenandoah Valley sound like just another valley full of green trees. Eventually, the friend decides it is best just to show a picture taken from a scenic overlook. You are amazed at the beauty and realize it would have taken thousands of words to describe it all.

One single picture not only presented an accurate representation of the Shenandoah Valley but also saved you from reading a very long email. Using this concept of visual perception, we have developed a graphical user interface (GUI) that displays file information visually. The user can query a specific directory and see statistics such as file size, access date, creation date, modification date, owner, and file type, represented by pixel intensity or colour, with each pixel representing one file.
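The paragraph above describes mapping a file statistic to pixel intensity. A minimal sketch of that idea follows, assuming a simple linear grey-scale mapping; the paper does not say how its tool scales values, and `FileInfo`, `scan_directory` and `to_intensity` are hypothetical names, not the authors' code:

```python
import os
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    size: int     # bytes
    mtime: float  # last modification time, seconds since the epoch


def scan_directory(root):
    """Collect size and modification time of every non-directory entry under root."""
    files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # skip symbolic links rather than follow them
            try:
                st = os.stat(full)
            except OSError:
                continue  # unreadable entry: leave it out of the display
            files.append(FileInfo(full, st.st_size, st.st_mtime))
    return files


def to_intensity(values):
    """Linearly rescale values to grey levels 0-255 (smallest -> 0, largest -> 255)."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [255] * len(values)  # no spread: draw every pixel at full intensity
    return [round(255 * (v - lo) / (hi - lo)) for v in values]


if __name__ == "__main__":
    # Hypothetical usage: one grey level per file, most recently modified brightest.
    files = scan_directory(".")
    shades = to_intensity([f.mtime for f in files])
    print(len(files), "files scanned")
```

Passing `f.size` instead of `f.mtime` would shade the same pixels by file size; any other attribute could be plugged in the same way.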

Requests for more information about a suspect file can be fulfilled by clicking on the display and walking through various menus. Viewing information about multiple files, or understanding the relationships between them, is also helpful. The user interface to this software allows file searching, pattern matching, and display of file contents. Each of these options allows a deeper analysis of the data stored on the hard drive and results in a flexible and customizable tool for locating criminal evidence.

The software tool we have developed will greatly aid the computer forensic process by reducing the time needed to identify suspicious files and increasing the probability of locating criminal evidence. This is done by representing file information graphically rather than as traditional text.

Our contributions to computer science include the use of enhanced tree-maps, the application of visualization techniques to computer forensics, and a software framework on which to build future enhancements. Enhanced tree-maps help represent temporal information about files, such as access time, whereas traditional tree-maps can represent only spatial information, such as size. We are the first to apply visualization techniques to computer forensics, and we will show that this is a promising method for identifying hidden or altered files. Lastly, our software is designed to accommodate additional visualization techniques not yet developed.

Documentation

During the analysis process, detailed information must be recorded if there is to be any hope of a successful court appearance. This information includes the forensic tools used, the actions taken, and the chain of custody. Some forensic tools carry more credibility in court than others because they have a proven track record, so it is important to use a proven forensic tool. Actions taken include opening and hashing files. The time of day should be recorded whenever a file is opened, hashed, or scanned, along with the directory in which it was found. Every examiner involved in the case must be recorded in the chain of custody. At any point in the investigation, it should be possible to identify clearly the individual who carried out each analysis task.

Court Appearance

Once the evidence has been analyzed, authenticated, and documented, it may go to court. It is important to present the case in a simple and clear manner, because judges and juries may not have technical knowledge of computer systems. Investigators who have followed the forensic process will have a higher probability of winning the case. However, if there are holes in the chain of custody or in any step of the forensic process, the defence will exploit them and will usually succeed in convincing the jury that the investigation was handled improperly.

The prosecution would then be unable to rebuild its case and would lose. An understanding of the computer forensic process leads to the development of improved software that aids investigators in locating evidence. Any software used to collect or analyze evidence must follow the computer forensic guidelines; otherwise, its use becomes a hindrance rather than a benefit.

Visualization of Data

Tree-Maps

In our method, we use visualization techniques to help represent file attributes. One method of displaying the relationships among files visually in two dimensions is the tree-map. Shneiderman describes tree-maps as a 2D space-filling algorithm for complex tree structures, designed to display the entire tree structure on one screen. Each file is represented by a shaded box that follows a chosen colouring scheme highlighting file and directory boundaries. Box size is determined by two parameters: the size of the display region selected by the user and the percentage of the selected directory that the file occupies. Other file directory representations, such as that of Windows Explorer, use nodes and edges laid out sideways and always require scrolling up and down to view complex structures.
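The box-size rule above (a file's share of the selected display region equals its share of the directory) can be illustrated with the classic slice-and-dice layout associated with Shneiderman's original tree-maps. This is a sketch of that standard algorithm, not necessarily the layout the authors' tool uses; `Node` and `slice_and_dice` are hypothetical names:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    size: int = 0                                 # bytes; used only for files (leaves)
    children: list = field(default_factory=list)  # empty list means "file"

    def total(self):
        """Size of a file, or the summed size of everything in a directory."""
        if not self.children:
            return self.size
        return sum(child.total() for child in self.children)


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Assign node the rectangle (x, y, w, h) and subdivide it among its children.

    Each child gets a strip whose share of the parent rectangle equals its
    share of the parent's total size; cuts alternate between vertical (even
    depth) and horizontal (odd depth). Returns (name, x, y, w, h) per leaf.
    """
    if out is None:
        out = []
    if not node.children:
        out.append((node.name, x, y, w, h))
        return out
    total = node.total()
    offset = 0.0  # fraction of the parent rectangle already handed out
    for child in node.children:
        frac = child.total() / total if total else 0.0
        if depth % 2 == 0:
            slice_and_dice(child, x + offset * w, y, frac * w, h, depth + 1, out)
        else:
            slice_and_dice(child, x, y + offset * h, w, frac * h, depth + 1, out)
        offset += frac
    return out


# Hypothetical directory: one 600-byte file plus a subdirectory with two files.
root = Node("/", children=[
    Node("a.log", 600),
    Node("docs", children=[Node("b.txt", 300), Node("c.txt", 100)]),
])
for box in slice_and_dice(root, 0, 0, 100, 50):
    print(box)
```

On a 100 × 50 region, `a.log` (60% of the bytes) receives 60% of the area, and the two files in `docs` split the remaining strip 3:1.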

The tree-map makes the largest files easy to recognize, because they take up the most space in the 2D display. Using tree-maps to visualize data storage and directory structure greatly reduces the time it takes to locate large files in a tree structure that is nine levels deep and contains many thousands of files. Tree-maps are primarily designed to emphasize large files. However, Shneiderman does point out that a user can drag a mouse over the display and click on a shaded box to query the system for the file name or other information. Such additions may enhance the usefulness of tree-maps, but stand-alone tree-maps have many weaknesses for computer forensics. Small files and directories are hidden among larger files and may not show up on the display at all. We may be looking for a single file on a massive hard drive; if the file is small, or if the disk contains numerous files, it will hardly stand out.
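Why small files vanish can be estimated directly: in a size-proportional tree-map, a file's box covers roughly its share of the total size times the number of display pixels. A minimal sketch, with a hypothetical helper name and illustrative sizes:

```python
def invisible_files(sizes, width_px, height_px, min_area_px=1.0):
    """Return the names of files whose tree-map box would cover fewer than
    min_area_px pixels on a width_px x height_px display.

    sizes maps file name -> size in bytes; a file's box area is its share of
    the total size multiplied by the display area.
    """
    total = sum(sizes.values())
    display_area = width_px * height_px
    return sorted(
        name for name, size in sizes.items()
        if total and size / total * display_area < min_area_px
    )


# Hypothetical disk: one 1 TB file and one 1 KB note, shown on a 1920 x 1080 display.
print(invisible_files({"movie.mkv": 10**12, "note.txt": 10**3}, 1920, 1080))
```

The 1 KB file's box would cover only about 0.002 pixels, so it cannot appear at all, which is the weakness the enhanced tree-maps are meant to address.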

For our purposes, stand-alone tree-maps require enhancement that provides the user with advanced filtering and display techniques. In this way, tree-maps are interesting and provide groundwork for opportunities in computer forensics.

Graphic representation of statistics Questions & Answers

Question: NAME THE DIFFERENT GRAPHICAL REPRESENTATION OF DATA USE IN STATISTICS

Answer: Graphical representation of statistical data serves mainly to make interpretation easier. In modern manufacturing it has developed into 'statistical process control', which gave rise to the 'seven QC tools', later supplemented by a newer set. The seven QC tools: flow charts, run charts, Pareto diagrams, histograms, cause-and-effect diagrams, scatter diagrams, and control charts (the most famous and widely used). The new version: affinity diagrams, relations diagrams, tree diagrams, matrix diagrams, arrow diagrams, process decision program charts, and matrix data analysis. Just type any of these keywords into a search engine and you'll find better explanations of them. Good luck!

Question: Based on your observation list out 4 points on the characteristics of logarithmic or exponential functions and their graphical representation.

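Answer: They are mirror images of each other: a logarithmic function and the exponential function with the same base are reflections of one another across the line y = x. That's one.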

Question: 1. Working with Numbers: Number Operations and Number Sense. 2. Simple Algebra: Algebra, Functions, and Patterns. 3. Geometry and Graphing: Measurement, Geometry and Coordinate Geometry. 4. Statistical Math: Data analysis, reading graphical representations of data, Statistics and probability. Or lead me to a site.

Answer: I'm not trying to just get points, but no one can help you with this. You have to have real problems, because these subjects are so broad that it would be impossible to cover them, even simply, without talking for an hour.

After statistical processing, the results of an investigation can be presented as graphic representations, in which numerical values are shown as drawings. Graphs give a general picture of a phenomenon, reveal its general regularities, and make a deeper analysis of the research data possible.

They make it easier to compare indicators, give an idea of the structure of phenomena and of the character of the relationships between them, and show their tendencies.

Graphic demonstration is therefore often combined with graphic analysis, in which the graph serves not only to demonstrate the results and conclusions of a study but also as a means of analyzing the material obtained and revealing internal relationships and regularities.

When constructing graphs, one takes into account the nature of the data to be represented, the intended use of the graphs (demonstration at a conference or lecture, reproduction in a scientific paper, etc.), the aim of the graph (to show the results obtained or only to emphasize a particular regularity or fact), and the level of the audience to which the graph will be shown.

All of this determines the choice of the type of graphic representation, color, size, proportions of the lettering, etc. In all cases, graphs should be clear, convenient, and easy to read.

In medical statistical research, linear (coordinate) diagrams, plane diagrams, and cartograms are used.

LINEAR DIAGRAMS are graphs on which numerical values are displayed as curves, which make it possible to trace the dynamics of a phenomenon over time or to find out how one attribute depends on another.

Linear diagrams with two or more curves also make it possible to compare two or more dynamic series and to establish whether changes in one series depend on fluctuations in another.

Linear diagrams are built in a system of rectangular coordinates: the horizontal scale is laid off from left to right along the abscissa (X axis), and the vertical scale from bottom to top along the ordinate (Y axis). Every graph must have a scale; that is, the drawing must be proportional to the figures it represents.
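The scale requirement amounts to one linear mapping per axis, applied identically to every point. A minimal sketch, assuming drawing units of millimetres; the function name and the sample points (years against some rate) are hypothetical:

```python
def to_plot_coords(points, x_range, y_range, width_mm, height_mm):
    """Map (x, y) data points to drawing coordinates in millimetres.

    x is laid off left to right along the abscissa and y bottom to top along
    the ordinate. The same scale applies to every point, so distances on the
    drawing stay proportional to differences in the data.
    """
    (x0, x1), (y0, y1) = x_range, y_range
    sx = width_mm / (x1 - x0)   # millimetres per unit of x
    sy = height_mm / (y1 - y0)  # millimetres per unit of y
    return [((x - x0) * sx, (y - y0) * sy) for x, y in points]


# Hypothetical series: a rate recorded in 2000, 2005 and 2010, drawn 100 mm wide and 80 mm tall.
print(to_plot_coords([(2000, 10.0), (2005, 15.0), (2010, 20.0)], (2000, 2010), (0, 20), 100, 80))
```

Starting the ordinate at 0 here keeps heights proportional to the values themselves; starting it elsewhere would exaggerate the changes.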

In contrast to linear diagrams, which describe the dynamics of a process, plane diagrams are used when it is necessary to represent statistical phenomena or facts that are independent of one another.

The simplest plane diagrams are made of rectangles or other figures: numerical values are represented by geometric shapes such as rectangles or squares. These diagrams are used to demonstrate and popularize the results obtained, and also when it is necessary to represent the structure of a phenomenon at a single moment of observation.

For example, the age composition of people who fell ill, or the structure of morbidity in a given settlement.

**Fig.** Age structure of the population (the share of each age group in the total population).

In column diagrams, numerical values are represented by rectangular columns with identical bases and different heights.

The height of each rectangle corresponds to the relative value of the phenomenon being studied. To construct a column diagram, we choose a scale from which the height of each column can be determined.

Column diagrams serve to compare several values. The rectangles that represent the values can also be placed on the plane horizontally instead of vertically, which gives a strip (bar) diagram (Fig. 4). In some cases, showing values as strips is more convenient than showing them as columns, because it is easier to label each strip with a horizontal caption.

Column and strip diagrams can be used not only to compare different values but also to display the structure of these values and compare their parts at the same time. For example, column or strip diagrams showing the distribution of diseases by the main nosological forms can also show the percentage of cases among men and among women.

For this purpose, each rectangle (column or strip) is divided into two parts, which correspond to the number of cases among men and among women respectively.
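Dividing a column this way is a matter of applying the same scale to each part. A minimal sketch, assuming a scale expressed in millimetres per case; the function name and the counts are hypothetical:

```python
def split_column(total_cases, men_cases, mm_per_case):
    """Heights (in mm) of the two parts of a divided column.

    The whole column represents total_cases on the chosen scale; the lower
    part represents cases among men and the upper part cases among women.
    """
    if not 0 <= men_cases <= total_cases:
        raise ValueError("men_cases must lie between 0 and total_cases")
    total_h = total_cases * mm_per_case
    men_h = men_cases * mm_per_case
    return men_h, total_h - men_h


# Hypothetical nosological form: 400 cases, 250 of them among men, at 0.25 mm per case.
print(split_column(400, 250, 0.25))
```

Because both parts use the column's own scale, the share of each part in the column height equals its percentage of cases (here 62.5% and 37.5%).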

Circular diagrams are used to display the ratio of homogeneous absolute values.

Instead of the area of a rectangle, they use the area of a circle.

It must be remembered, however, that the areas of circles are related to one another as the squares of their radii. Therefore, when constructing circular diagrams, we must take the square root of each value and use it, on the chosen scale, as the radius; once the radius is known, the circle is easy to draw.
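The square-root rule follows from the area of a circle, $A = \pi r^2$: for areas proportional to the values $v$, the radius must satisfy $r \propto \sqrt{v}$. A minimal sketch, assuming the largest value is drawn with a chosen maximum radius; the function name and values are hypothetical:

```python
import math


def circle_radii(values, max_radius_mm):
    """Radii for a circular diagram whose circle AREAS are proportional to values.

    Area grows with the square of the radius, so each radius is proportional
    to the square root of its value; the largest value gets max_radius_mm.
    """
    top = max(values)
    if top <= 0:
        raise ValueError("values must be positive")
    return [max_radius_mm * math.sqrt(v / top) for v in values]


print(circle_radii([100, 25], 40))
```

Values of 100 and 25 give radii of 40 mm and 20 mm: the values differ fourfold, the radii only twofold, and the areas fourfold again, as required.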

If the circular diagram displays parts of a whole, the circles should not be drawn separately but superimposed on one another. The whole and its parts can also be shown as a circle divided into sectors, which is called a sector diagram. When constructing a sector diagram, the whole area of the circle is taken as 100%, and each sector occupies the part of the area corresponding to its percentage.
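Since the whole circle (100%) corresponds to 360°, each percentage point corresponds to 3.6° of arc. A minimal sketch that converts counts into percentages and sector angles; the function name and the category counts are hypothetical:

```python
def sector_angles(counts):
    """Convert counts of the parts of a whole into (percent, degrees) pairs.

    The whole circle is 100 % = 360 degrees, so 1 % corresponds to 3.6 degrees.
    """
    total = sum(counts.values())
    return {name: (100 * c / total, 360 * c / total) for name, c in counts.items()}


# Hypothetical structure of a whole: three causes with 50, 30 and 20 cases.
print(sector_angles({"A": 50, "B": 30, "C": 20}))
```

The sectors for 50%, 30% and 20% span 180°, 108° and 72°, which together close the full circle.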

In practice, sector diagrams can be built not only on the area of a circle but also on the area of a square or a rectangle.

Nevertheless, these figures are often harder to divide than a circle, and consequently they are rather seldom used as the basis of sector diagrams.

Radial (linear-circular) diagrams are constructed in a system of polar coordinates, in which the radius replaces the vertical scale of diagrams based on rectangular coordinates.

An example of a radial diagram is the wind rose, which is used on maps to represent changes in wind direction over a calendar period (month, year).

Radial diagrams are used to illustrate seasonal fluctuations of any indicator, for example morbidity or mortality rates.

These diagrams are constructed on a circle with 12 radii drawn from its center. Adjacent radii are separated by an arc of 30° (360/12 = 30), and each radius represents the ordinate of one calendar month: January, February, etc.

The center of the circle is taken as the zero point; then, on the previously chosen scale, values showing the intensity of the phenomenon in each calendar month are laid off along the radii.

Connecting the marked points gives a closed line that shows the seasonal fluctuations.

When building radial diagrams, remember the rule that the radii are counted from the top of the diagram, going clockwise.
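The construction can be sketched by converting each month's value into a point on its radius. This assumes the common convention of putting January on the radius pointing straight up and proceeding clockwise, with the center at (0, 0); `radial_points` and the scale are hypothetical:

```python
import math

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def radial_points(monthly_values, mm_per_unit):
    """Return (month, x, y) points of a radial diagram centred at (0, 0).

    There are 12 radii, 30 degrees apart (360 / 12). January lies on the
    radius pointing straight up and the months follow clockwise; the
    distance from the centre is the month's value on the chosen scale.
    """
    if len(monthly_values) != 12:
        raise ValueError("expected one value per calendar month")
    points = []
    for i, (month, value) in enumerate(zip(MONTHS, monthly_values)):
        angle = math.radians(90 - 30 * i)  # 90 degrees = up; subtracting turns clockwise
        r = value * mm_per_unit
        points.append((month, r * math.cos(angle), r * math.sin(angle)))
    return points


pts = radial_points([10] * 12, 2.0)
closed_line = pts + [pts[0]]  # joining December back to January closes the curve
```

With identical values every month the closed line is a regular 12-sided figure; seasonal peaks show up as bulges along the corresponding radii.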

**Fig.** The radial diagram.

Cartograms are built when different phenomena need to be compared by territory. They are geographical maps on which graphic symbols show the intensity of distribution and the grouping of a phenomenon (morbidity, mortality, etc.) over a period of time (Fig. 2.4).

They are best built on simplified maps that show only administrative boundaries and a few large settlements. When constructing a cartogram, the grouping of the displayed phenomena is of great importance.

The simplest grouping divides the regions into a group with indicators below average and a group with indicators above average. According to this division, regions or districts with above-average indicators are shaded on the cartogram, and those below average are left unshaded.
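The two-group rule can be sketched as a simple classification. This assumes the "average" is the unweighted mean of the regional rates (the text does not say whether a population-weighted overall rate should be used instead), and that a region exactly at the average is left unshaded; the function name and rates are hypothetical:

```python
def shade_regions(rates):
    """Group regions for a two-class cartogram.

    Regions whose rate is above the average of all regional rates are
    shaded; the rest are left unshaded. Returns (average, {region: shaded}).
    """
    average = sum(rates.values()) / len(rates)
    return average, {region: rate > average for region, rate in rates.items()}


# Hypothetical morbidity rates per 1,000 population in four districts.
avg, shaded = shade_regions({"North": 12.0, "South": 8.0, "East": 10.0, "West": 14.0})
print(avg, shaded)
```

Finer groupings work the same way, with more class boundaries and a graded shading scale.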

**Graphics in the Media**

Now that we’ve discussed basic types of statistical graphs, we are ready to explore some of the fancier graphics that appear daily in the news. We will also discuss several cautions to keep in mind when interpreting media graphics.

Graphics Beyond the Basics

Many graphical displays of data go beyond the basic types. Here, we explore a few of the types that are most common in the news media.

**Multiple Bar Graphs**

A **multiple bar graph** is a simple extension of a regular bar graph. It has two or more sets of bars that allow comparison between two or more data sets. All the data sets must involve the same categories so that they can be displayed on the same graph. For example, Figure 5.15 is a multiple bar graph showing trends in home computing. The categories are years. The two sets of bars represent two different measures of home computing: ownership of personal computers and connection to the Internet. Note that a legend clearly identifies the two sets of bars.

**EXAMPLE 1** *Computing Trends*

Summarize two major trends shown in Figure 5.15.

**SOLUTION** The most obvious trend is that both data sets show an increase with time. That is, the number of homes with computers and the number of online homes both increased with time. We see a second trend by comparing the bars within each year. In 1995, the number of online homes (about 10 million) was less than one-third the number of homes with computers (about 33 million). By 2003, the number of online homes (about 62 million) was about 90% of the number of homes with computers (about 70 million). This tells us that a higher percentage of computer users are going online.

**Stack Plots**

Another common type of graph, called a **stack plot**, shows different data sets in a vertical stack. Figure 5.16 uses a stack plot to show trends in death rates (deaths per 100,000 people) for four diseases since 1900. Each disease has its own color-coded region, or wedge; note the importance of the legend. The *thickness* of a wedge at a particular time tells you its value at that time: when a wedge is thick it has a large value, and when it is thin it has a small value.

**EXAMPLE 2** *Stack Plot*

Based on Figure 5.16, what was the death rate for cardiovascular disease in 1980? Discuss the general trends visible on this graph.

**SOLUTION** For 1980, the cardiovascular wedge extends from about 180 to 620 on the vertical axis, so its thickness is about 440. Thus, the death rate in 1980 for cardiovascular disease was about 440 deaths per 100,000 people. The graph shows several important trends. First, the downward slope of the top wedge shows that the overall death rate from these four diseases decreased substantially, from nearly 800 deaths per 100,000 in 1900 to about 525 in 2003. The drastic decline in the thickness of the tuberculosis wedge shows that this disease was once a major killer but has been nearly wiped out since 1950. Meanwhile, the cancer wedge shows that the death rate from cancer rose steadily until the mid-1990s but has dropped somewhat since then.
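A stack plot draws cumulative sums, so each wedge's value is recovered as the gap between its lower and upper edges, exactly as in Example 2. A minimal sketch with hypothetical function names:

```python
from itertools import accumulate


def stack_boundaries(values):
    """Edges of stacked wedges, bottom to top, starting from 0.

    For values [a, b, c] the wedges span 0..a, a..a+b and a+b..a+b+c, so the
    top edge equals the total of all the stacked data sets.
    """
    return [0] + list(accumulate(values))


def wedge_thicknesses(boundaries):
    """Recover each wedge's value as the gap between consecutive edges."""
    return [upper - lower for lower, upper in zip(boundaries, boundaries[1:])]


# Example 2: the cardiovascular wedge in 1980 runs from about 180 to about 620.
print(wedge_thicknesses([180, 620]))  # [440]
```

This is also why a stack plot's top edge shows the combined total (here, the overall death rate from the four diseases), while individual wedges must be read by their thickness rather than their height.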

**Graphs of Geographical Data**

We are often interested in geographical patterns in data. Figure 5.17 shows one common way of displaying geographical data. In this case, the data on per capita (per person) income are shown state by state. The legend explains that different colors represent different income levels. Similar colors are used for similar income levels. Thus, it is easy to see that income levels tend to be highest in the northeast and lowest in the south.

The display in Figure 5.17 works well because each state is associated with a unique income level. For data that vary continuously across geographical areas, a **contour map** is more convenient. Figure 5.18 shows a contour map of temperature over the United States at a particular time. Each of the *contours* connects locations with the same temperature. For example, the temperature is 50°F everywhere along the contour labeled 50°F and 60°F everywhere along the contour labeled 60°F. Between these two contours, the temperature is between 50°F and 60°F. Note that where contours are tightly spaced, the temperature changes more over a given distance. For example, the closely packed contours in the northeast indicate that the temperature varies substantially over small distances. To make the graph easier to read, the regions between adjacent contours are color-coded.
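Color-coding the regions between contours amounts to assigning each reading to the band between two adjacent contour levels. A minimal sketch, assuming contours every 10°F as in the 50°F and 60°F example; the function name is hypothetical:

```python
import math


def contour_band(temp_f, spacing=10):
    """Return the (lower, upper) contour pair that a temperature falls between.

    With contours every `spacing` degrees, a reading of 54 F lies between the
    50 F and 60 F contours and is drawn in that band's color. A reading exactly
    on a contour is assigned to the band above it.
    """
    lower = math.floor(temp_f / spacing) * spacing
    return lower, lower + spacing


print(contour_band(54))
```

Using `math.floor` rather than `int` keeps negative temperatures in the correct band (for example, -3°F falls between -10°F and 0°F).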

**Three-Dimensional Graphics**

Today, computer software makes it easy to give almost any graph a three-dimensional appearance. For example, Figure 5.19 shows the bar graph of Figure 5.3, but “dressed up” with a three-dimensional look. It may look nice, but the three-dimensional effects are purely cosmetic. They don’t provide any information that wasn’t already in the two-dimensional graph in Figure 5.3. As this example shows, many “three-dimensional” graphics really only make two-dimensional data look a little fancier. In contrast, each of the three axes in Figure 5.20 carries distinct information, making it a true three-dimensional graph. Researchers studying migration patterns of a bird species (the *Bobolink*) counted the number of birds flying over seven New York cities throughout the night. As shown on the inset map, the cities were aligned east-west so that the researchers would learn what parts of the state the birds flew over, and at what times of night, as they headed south for the winter. Thus, the three axes measure *number of birds*, *time of night*, and *east-west location*.
