Sample standard deviation. Variation indicators

An approximate way to assess the variability of a variation series is to determine its limit and amplitude, but these ignore the values of the variants inside the series. The main generally accepted measure of the variability of a quantitative characteristic within a variation series is the standard deviation (σ, sigma). The larger the standard deviation, the greater the variability of the series.

The method for calculating the standard deviation includes the following steps:

1. Find the arithmetic mean (M).

2. Determine the deviations of the individual variants from the arithmetic mean (d = V − M). In medical statistics, deviations from the mean are denoted d (deviate). The sum of all deviations is zero.

3. Square each deviation: d².

4. Multiply the squared deviations by the corresponding frequencies: d²·p.

5. Find the sum of the products: Σ(d²·p).

6. Calculate the standard deviation using the formula:

$\sigma = \sqrt{\frac{\sum d^2 p}{n}}$ when n is greater than 30, or $\sigma = \sqrt{\frac{\sum d^2 p}{n - 1}}$ when n is less than or equal to 30, where n is the number of all variants.
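The six steps can be sketched in Python as follows. This is only an illustration: it assumes the usual convention that the sum of products is divided by n when n > 30 and by n − 1 otherwise, and the function name `grouped_std` and the sample series are made up for the example.

```python
import math

def grouped_std(variants, frequencies):
    """Standard deviation of a variation series given as variants V and frequencies p."""
    n = sum(frequencies)  # number of all observations
    # Step 1: arithmetic mean M, weighted by the frequencies
    mean = sum(v * p for v, p in zip(variants, frequencies)) / n
    # Steps 2-5: deviations d = V - M, squared, multiplied by p, then summed
    sum_d2p = sum((v - mean) ** 2 * p for v, p in zip(variants, frequencies))
    # Step 6: divide by n for large series, by n - 1 for small ones (assumed convention)
    denominator = n if n > 30 else n - 1
    return math.sqrt(sum_d2p / denominator)

# Hypothetical series: three variants and how often each was observed
variants = [60, 70, 80]
frequencies = [2, 6, 2]
sigma = grouped_std(variants, frequencies)
print(round(sigma, 2))
```

With only 10 observations the sketch divides by n − 1; passing frequencies that sum to more than 30 switches it to the divisor n.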

Significance of the standard deviation:

1. The standard deviation characterizes the spread of the variants around the mean (i.e., the variability of the variation series). The greater the sigma, the more diverse the series.

2. The standard deviation is used for a comparative assessment of how well the arithmetic mean represents the variation series for which it was calculated.

Variations of mass phenomena obey the law of normal distribution. The curve representing this distribution is a smooth, bell-shaped, symmetrical curve (Gaussian curve). According to probability theory, in phenomena that obey the law of normal distribution there is a strict mathematical relationship between the arithmetic mean and the standard deviation. The theoretical distribution of variants in a homogeneous variation series obeys the three-sigma rule.

If, in a rectangular coordinate system, the values of a quantitative characteristic (variants) are plotted on the abscissa and the frequencies with which the variants occur in the series are plotted on the ordinate, then the variants with larger and smaller values are distributed symmetrically on either side of the arithmetic mean.



It has been established that with a normal distribution of the trait:

68.3% of the variant values lie within M ± 1σ

95.5% of the variant values lie within M ± 2σ

99.7% of the variant values lie within M ± 3σ
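These three percentages follow from the normal distribution itself: the share of values within k standard deviations of the mean equals erf(k/√2). A minimal check using only the standard library (the helper name `share_within` is illustrative):

```python
import math

def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {share_within(k) * 100:.2f}%")
```

The computed shares are approximately 68.27%, 95.45% and 99.73%, which the text rounds to 68.3%, 95.5% and 99.7%.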

3. The standard deviation makes it possible to establish normal values for clinical and biological parameters. In medicine, the interval M ± 1σ is usually taken as the normal range for the phenomenon being studied. A deviation of the estimated value from the arithmetic mean by more than 1σ indicates that the parameter deviates from the norm.

4. In medicine, the three-sigma rule is used in pediatrics for individual assessment of children's physical development (the sigma deviation method) and for developing standards for children's clothing.

5. The standard deviation is needed to characterize the diversity of the characteristic being studied and to calculate the error of the arithmetic mean.

The standard deviation is usually used to compare the variability of series of the same type. If two series with different characteristics are compared (height and weight, average length of hospital stay and hospital mortality, etc.), a direct comparison of sigmas is impossible, because the standard deviation is a named quantity expressed in absolute units. In these cases, use the coefficient of variation (Cv), which is a relative value: the standard deviation as a percentage of the arithmetic mean.

The coefficient of variation is calculated using the formula:

$C_v = \frac{\sigma}{M} \cdot 100\%$

The higher the coefficient of variation, the greater the variability of the series. A coefficient of variation of more than 30% is considered to indicate qualitative heterogeneity of the population.
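A short sketch of why the coefficient of variation is needed when the characteristics differ. The data are invented, and the population formula (`statistics.pstdev`) is assumed for σ:

```python
import statistics

def coefficient_of_variation(values):
    """Cv: the standard deviation as a percentage of the arithmetic mean."""
    mean = statistics.fmean(values)
    return statistics.pstdev(values) / mean * 100

# Hypothetical measurements with identical sigma but different means
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)
print(round(cv_height, 1), round(cv_weight, 1))
```

Both series have the same standard deviation in absolute numbers, yet the weights vary far more relative to their mean, which only the coefficient of variation reveals.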

Standard deviation (synonyms: root-mean-square deviation, mean square deviation, quadratic deviation; related term: standard spread): in probability theory and statistics, the most common indicator of the dispersion of the values of a random variable relative to its mathematical expectation. For limited samples of values, the arithmetic mean of the sample is used instead of the mathematical expectation.


    The standard deviation is measured in units of measurement of the random variable itself and is used when calculating the standard error of the arithmetic mean, when constructing confidence intervals, when statistically testing hypotheses, when measuring the linear relationship between random variables. Defined as the square root of the variance of a random variable.

    Root-mean-square deviation:

    $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}.$

    Note: the names root-mean-square deviation (RMSD) and standard deviation (STD) are very often matched inconsistently with their formulas. For example, in the NumPy module of the Python programming language, the std() function is described as "standard deviation", while its default formula is the root-mean-square deviation (division by n under the root). In Excel, the STDEV() function differs (division by n − 1 under the root).

    Standard deviation (an estimate of the standard deviation of a random variable x relative to its mathematical expectation, based on an unbiased estimate of its variance), denoted $s$:

    $s = \sqrt{\frac{n}{n-1}\sigma^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2},$

    where $\sigma^2$ is the variance, $x_i$ is the i-th element of the sample, $n$ is the sample size, and $\bar{x}$ is the arithmetic mean of the sample:

    $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}(x_1 + \ldots + x_n).$

    It should be noted that both estimates are biased. In the general case, it is impossible to construct an unbiased estimate. However, the estimate based on the unbiased variance estimate is consistent. In accordance with GOST R 8.736-2011, the standard deviation is calculated using the second formula of this section.

    Three-sigma rule ($3\sigma$): nearly all values of a normally distributed random variable lie in the interval $(\bar{x} - 3\sigma;\ \bar{x} + 3\sigma)$ (provided that the value $\bar{x}$ is true, and not obtained as a result of sample processing).

    If the true value is unknown, then you should use not $\sigma$ but $s$. Thus, the rule of three sigma is converted to the rule of three s.
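The difference between dividing by n and dividing by n − 1 can be seen in a small sketch. The function names and the sample are illustrative; the standard library's `statistics.pstdev` and `statistics.stdev` are used only as a cross-check. In NumPy the same choice is made through the `ddof` argument of `numpy.std` (0 by default, 1 for the n − 1 version).

```python
import math
import statistics

def sigma_population(xs):
    """Root-mean-square deviation: the sum of squared deviations is divided by n."""
    mean = sum(xs) / len(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))

def s_sample(xs):
    """Sample estimate s = sqrt(n / (n - 1)) * sigma, i.e. division by n - 1."""
    n = len(xs)
    return math.sqrt(n / (n - 1)) * sigma_population(xs)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample
print(sigma_population(data), statistics.pstdev(data))  # both divide by n
print(s_sample(data), statistics.stdev(data))           # both divide by n - 1
```

For small samples the two values differ noticeably; as n grows, the factor n / (n − 1) approaches 1 and the difference fades.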

    Interpretation of the standard deviation value

    A larger standard deviation indicates a greater spread of the values in the set around its mean; a smaller value indicates that the values are clustered around the mean.

    For example, take three sets of numbers: (0, 0, 14, 14), (0, 6, 8, 14) and (6, 6, 8, 8). All three sets have a mean of 7, and their standard deviations are 7, 5 and 1 respectively. The last set has a small standard deviation because its values are clustered around the mean; the first set has the largest standard deviation because its values diverge greatly from the mean.
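The figures in this example correspond to the formula that divides by n (the population form). A quick check with Python's standard library:

```python
import statistics

number_sets = [(0, 0, 14, 14), (0, 6, 8, 14), (6, 6, 8, 8)]

means = [statistics.mean(s) for s in number_sets]
deviations = [statistics.pstdev(s) for s in number_sets]  # division by n

for s, m, d in zip(number_sets, means, deviations):
    print(s, "mean:", m, "standard deviation:", d)
```

Using `statistics.stdev` (division by n − 1) instead would give larger values, so the choice of formula matters when reproducing published figures.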

    In a general sense, standard deviation can be considered a measure of uncertainty. For example, in physics, standard deviation is used to determine the error of a series of successive measurements of some quantity. This value is very important for judging the plausibility of the phenomenon under study against the value predicted by theory: if the mean of the measurements differs greatly from the predicted value (large standard deviation), then the obtained values or the method of obtaining them should be rechecked. In finance, the standard deviation of a portfolio's return is identified with portfolio risk.

    Climate

    Suppose there are two cities with the same average daily maximum temperature, one located on the coast and the other on a plain. Coastal cities are known to have a narrower spread of daily maximum temperatures than inland cities. Therefore, the standard deviation of the daily maximum temperature will be smaller for the coastal city than for the second city, even though the average is the same. In practice this means that, on any given day of the year, the maximum air temperature is more likely to differ strongly from the average in the inland city.

    Sport

    Suppose several football teams are evaluated on some set of parameters, for example goals scored and conceded, scoring chances, and so on. The best team in the group will most likely have the best values on more of the parameters. The smaller a team's standard deviation on each parameter, the more predictable its results; such teams are balanced. A team with a large standard deviation, on the other hand, is hard to predict, which in turn is explained by an imbalance, for example a strong defense but a weak attack.

    Using the standard deviation of team parameters makes it possible, to some degree, to predict the result of a match between two teams by assessing their strengths and weaknesses, and therefore the tactics they are likely to choose.



    The main criteria for the diversity of a characteristic in a statistical population are the limit, the amplitude, the standard deviation, the coefficient of oscillation and the coefficient of variation. The previous lesson discussed that averages give only a generalized description of the characteristic in the population and do not take into account the values of its individual variants: the minimum and maximum values, those above the average, those below it, and so on.

    Example. The means of two different number sequences, -100; -20; 100; 20 and 0.1; -0.2; 0.1, are identical and equal to 0. However, the spread of the data around the mean is very different in the two sequences.

    The listed criteria of diversity are determined primarily from the values the characteristic takes in the individual elements of the statistical population.

    Indicators of the variation of a characteristic are either absolute or relative. Absolute indicators of variation include the range of variation, the limit, the standard deviation and the variance. The coefficient of variation and the coefficient of oscillation are relative measures of variation.

    Limit (lim) is a criterion determined by the extreme values of the variants in a variation series. In other words, it is bounded by the minimum and maximum values of the characteristic:

    $\text{lim}: \; V_{\min} \text{ to } V_{\max}$

    Amplitude (Am), or range of variation, is the difference between the extreme variants. It is calculated by subtracting the minimum value of the characteristic from its maximum value, which gives an estimate of the spread of the variants:

    $Am = V_{\max} - V_{\min}$

    The disadvantage of the limit and amplitude as criteria of variability is that they depend entirely on the extreme values of the characteristic in the variation series. Fluctuations of the values inside the series are not taken into account.

    The most complete description of the diversity of a characteristic in a statistical population is given by the mean square deviation (sigma), a general measure of how far the variants deviate from their mean. The mean square deviation is often called the standard deviation.

    The standard deviation is based on comparing each variant with the arithmetic mean of the population. Since the population always contains variants both smaller and larger than the mean, the sum of deviations with the sign "+" is canceled out by the sum of deviations with the sign "−", i.e. the sum of all deviations is zero. To avoid the influence of the signs of the differences, the deviations from the arithmetic mean are squared, i.e. $d^2$ is taken. The sum of squared deviations is not zero. To obtain a coefficient that can measure variability, the mean of the squared deviations is taken; this value is called the variance:

    $\sigma^2 = \frac{\sum d^2 p}{n}$

    where $d$ is the deviation of a variant from the mean, $p$ is its frequency and $n$ is the number of observations.

    In essence, the variance is the mean square of the deviations of the individual values of a characteristic from its mean. The variance is the square of the standard deviation.

    Variance is a dimensional (named) quantity. If the variants of a series are expressed in meters, the variance is in square meters; if the variants are expressed in kilograms, the variance is in the square of that unit (kg²), and so on.

    The standard deviation is the square root of the variance:

    $\sigma = \sqrt{\frac{\sum d^2 p}{n}}$

    If the number of elements in the population is $n \le 30$, then when calculating the variance and standard deviation, $n - 1$ must be placed in the denominator of the fraction instead of $n$.

    The calculation of the standard deviation can be divided into six stages, carried out in sequence: find the arithmetic mean, determine the deviations of the variants from it, square the deviations, multiply the squares by the corresponding frequencies, sum the products, and apply the formula above.

    Application of standard deviation:

    a) to judge the variability of variation series and to compare how typical (representative) their arithmetic means are. This is needed in differential diagnosis when determining the stability of symptoms.

    b) to reconstruct the variation series, i.e. to restore its frequency distribution on the basis of the three-sigma rule. The interval (M ± 3σ) contains 99.7% of all variants of the series, the interval (M ± 2σ) contains 95.5%, and the interval (M ± 1σ) contains 68.3% (Fig. 1).

    c) to identify outlying ("pop-up") variants

    d) to determine the parameters of norm and pathology using sigma estimates

    e) to calculate the coefficient of variation

    f) to calculate the average error of the arithmetic mean.
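As an illustration of use (c), a variant can be flagged as outlying when it lies further than three sigma from the mean. The sketch assumes that M and σ are already known from a reference series; the function name, the reference values and the readings are hypothetical.

```python
def is_outlier(value, mean, sigma, k=3.0):
    """True if the value lies further than k standard deviations from the mean."""
    return abs(value - mean) > k * sigma

# Hypothetical reference parameters and new readings
reference_mean = 120.0
reference_sigma = 10.0
readings = [118, 95, 152, 131]

flagged = [r for r in readings if is_outlier(r, reference_mean, reference_sigma)]
print(flagged)
```

Estimating M and σ from the same small series that contains the suspect value weakens the test, because the outlier inflates σ; that is why the sketch takes them from outside.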

    To characterize any population that has a normal distribution, it is enough to know two parameters: the arithmetic mean and the standard deviation.

    Figure 1. Three Sigma rule

    Example.

    In pediatrics, the standard deviation is used to assess the physical development of children by comparing the data of a particular child with the corresponding standard indicators. The arithmetic mean of the physical development of healthy children is taken as the standard. Indicators are compared with the standards using special tables in which the standards are given along with their corresponding sigma scales. If the child's physical development indicator is within the standard (arithmetic mean) ±σ, the child's physical development (by this indicator) is considered to correspond to the norm. If the indicator is within the standard ±2σ, there is a slight deviation from the norm. If the indicator goes beyond these limits, the child's physical development differs sharply from the norm (pathology is possible).
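The sigma assessment described above reduces to measuring how many sigmas separate the child's indicator from the standard. A minimal sketch; the function name and the standard values (mean height 110 cm, σ = 5 cm) are invented for illustration and are not real growth standards.

```python
def sigma_assessment(value, standard_mean, sigma):
    """Classify an indicator by its distance from the standard, measured in sigmas."""
    distance = abs(value - standard_mean) / sigma
    if distance <= 1:
        return "corresponds to the norm"
    if distance <= 2:
        return "slight deviation from the norm"
    return "sharp deviation from the norm"

# Hypothetical standard for one age group
STANDARD_HEIGHT_CM = 110.0
SIGMA_CM = 5.0

for height in (113, 118, 122):
    print(height, sigma_assessment(height, STANDARD_HEIGHT_CM, SIGMA_CM))
```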

    In addition to indicators of variation expressed in absolute values, statistical research uses indicators expressed in relative values. The coefficient of oscillation is the ratio of the range of variation to the mean of the characteristic. The coefficient of variation is the ratio of the standard deviation to the mean of the characteristic. These values are usually expressed as percentages.

    Formulas for calculating relative variation indicators (coefficient of oscillation $V_R$ and coefficient of variation $V$):

    $V_R = \frac{Am}{M} \cdot 100\%, \qquad V = \frac{\sigma}{M} \cdot 100\%$

    From these formulas it is clear that the closer the coefficient V is to zero, the smaller the variation of the characteristic values. The larger V, the more variable the characteristic.

    In statistical practice, the coefficient of variation is most often used. It is used not only for a comparative assessment of variation, but also to characterize the homogeneity of the population. The population is considered homogeneous if the coefficient of variation does not exceed 33% (for distributions close to normal). Arithmetically, the ratio of σ and the arithmetic mean neutralizes the influence of the absolute value of these characteristics, and the percentage ratio makes the coefficient of variation a dimensionless (unnamed) value.

    The resulting value of the coefficient of variation is estimated in accordance with the approximate gradations of the degree of diversity of the trait:

    Weak - up to 10%

    Average - 10 - 20%

    Strong - more than 20%
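These gradations translate into a simple lookup. The sketch below assumes that a value of exactly 10% falls into the "average" band and exactly 20% stays there, which the text does not specify; the function name is illustrative.

```python
def variation_grade(cv_percent):
    """Approximate grade of diversity for a coefficient of variation given in percent."""
    if cv_percent < 10:
        return "weak"
    if cv_percent <= 20:
        return "average"
    return "strong"

for cv in (5.0, 15.0, 25.0):
    print(cv, variation_grade(cv))
```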

    The use of the coefficient of variation is advisable in cases where it is necessary to compare characteristics that are different in size and dimension.

    The difference between the coefficient of variation and other measures of spread is clearly demonstrated by the following example.

    Table 1

    Composition of industrial enterprise workers

    Based on the statistical characteristics given in the example, we can draw a conclusion about the relative homogeneity of the age composition and educational level of the enterprise’s employees, given the low professional stability of the surveyed contingent. It is easy to see that an attempt to judge these social trends by the standard deviation would lead to an erroneous conclusion, and an attempt to compare the accounting characteristics “work experience” and “age” with the accounting indicator “education” would generally be incorrect due to the heterogeneity of these characteristics.

    Variation is the difference in the individual values of a characteristic among the units of the population being studied. The study of variation is of great practical significance and is a necessary link in economic analysis. It is needed because the mean, being a resultant, performs its main task with varying accuracy: the smaller the differences between the individual values being averaged, the more homogeneous the population and, therefore, the more accurate and reliable the mean, and vice versa. The degree of variation therefore makes it possible to judge the limits of variation of a characteristic, the homogeneity of the population with respect to it, the typicality of the mean, and the relationship between the factors that determine the variation.

    The variation of a characteristic in a population is measured using absolute and relative indicators.

    Absolute measures of variation include:

    Range of variation (R)

    The range of variation is the difference between the maximum and minimum values of the characteristic:

    $R = x_{\max} - x_{\min}$

    It shows the limits within which the value of the characteristic varies in the population being studied.

    Example. The work experience of five applicants in their previous jobs is 2, 3, 4, 7 and 9 years.
    Solution: range of variation = 9 - 2 = 7 years.

    For a generalized description of the differences in the values of a characteristic, average indicators of variation are calculated from the deviations from the arithmetic mean. The deviation from the mean is the difference $x_i - \bar{x}$.

    To keep the sum of the deviations from the mean from turning to zero (the zero property of the mean), one must either ignore the signs of the deviations, that is, take their absolute values $|x_i - \bar{x}|$, or square the deviations: $(x_i - \bar{x})^2$.

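    Average linear and square deviation

    The average linear deviation is the arithmetic mean of the absolute deviations of the individual values of a characteristic from the mean.

    The simple average linear deviation:

    $\bar{d} = \frac{\sum |x_i - \bar{x}|}{n}$

    The work experience of five applicants in their previous jobs is 2, 3, 4, 7 and 9 years.

    In our example: $\bar{x} = \frac{2 + 3 + 4 + 7 + 9}{5} = 5$ years; $\bar{d} = \frac{|2-5| + |3-5| + |4-5| + |7-5| + |9-5|}{5} = \frac{12}{5} = 2.4$ years.

    Answer: 2.4 years.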
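The work-experience example can be reproduced in a few lines. The variable names are illustrative; the data are the five values from the text.

```python
experience_years = [2, 3, 4, 7, 9]

# Range of variation: maximum minus minimum
value_range = max(experience_years) - min(experience_years)

# Average linear deviation: mean of the absolute deviations from the arithmetic mean
mean = sum(experience_years) / len(experience_years)
mean_linear_deviation = sum(abs(x - mean) for x in experience_years) / len(experience_years)

print(value_range, mean, mean_linear_deviation)
```

The range comes out as 7 years, the mean as 5 years and the average linear deviation as 2.4 years, matching the answers above.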

    The weighted average linear deviation is applied to grouped data:

    $\bar{d} = \frac{\sum |x_i - \bar{x}| f_i}{\sum f_i}$

    where $f_i$ is the frequency of the value $x_i$.

    Because it is a somewhat artificial measure, the average linear deviation is used relatively rarely in practice (in particular, to characterize the fulfillment of contractual obligations on the uniformity of deliveries, and in the analysis of product quality with allowance for the technological features of production).

    Standard deviation

    The most complete characteristic of variation is the mean square deviation, which is called the standard deviation.

    The standard deviation ($\sigma$) is equal to the square root of the mean squared deviation of the individual values of the characteristic from the arithmetic mean.

    The simple standard deviation:

    $\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$

    The weighted standard deviation is applied to grouped data:

    $\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i}}$

    Under a normal distribution, the standard deviation and the average linear deviation are related as follows: $\sigma \approx 1.25\,\bar{d}$.
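For a normal distribution the exact value of this ratio is √(π/2) ≈ 1.2533. The sketch below computes it and compares it with the ratio found in a simulated normal sample; the seed and sample size are arbitrary choices for the illustration.

```python
import math
import random

# Theoretical ratio of standard deviation to average linear deviation
theoretical = math.sqrt(math.pi / 2)

# Empirical ratio in a simulated standard normal sample
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(20000)]
mean = sum(sample) / len(sample)
sigma = math.sqrt(sum((x - mean) ** 2 for x in sample) / len(sample))
mean_abs_dev = sum(abs(x - mean) for x in sample) / len(sample)
empirical = sigma / mean_abs_dev

print(round(theoretical, 4), round(empirical, 2))
```

A ratio that departs markedly from 1.25 in real data is a hint that the distribution is not close to normal.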

    The standard deviation, being the main absolute measure of variation, is used in determining the ordinate values ​​of a normal distribution curve, in calculations related to the organization of sample observation and establishing the accuracy of sample characteristics, as well as in assessing the limits of variation of a characteristic in a homogeneous population.

    Variance

    The variance is the mean of the squared deviations of the individual values of a characteristic from their mean.

    The simple variance:

    $\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n}$

    In our example: $\sigma^2 = \frac{9 + 4 + 1 + 4 + 16}{5} = \frac{34}{5} = 6.8$

    The weighted variance:

    $\sigma^2 = \frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i}$

    It is more convenient to calculate the variance using the formula:

    $\sigma^2 = \overline{x^2} - (\bar{x})^2$

    which is obtained from the main one through simple transformations. In this case, the mean squared deviation is equal to the mean of the squares of the values minus the square of the mean.
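That the shortcut gives the same result as the definition can be checked on the work-experience data (2, 3, 4, 7 and 9 years). The function names are illustrative.

```python
def variance_direct(xs):
    """Mean of the squared deviations from the arithmetic mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def variance_shortcut(xs):
    """Mean of the squares minus the square of the mean."""
    mean = sum(xs) / len(xs)
    mean_of_squares = sum(x * x for x in xs) / len(xs)
    return mean_of_squares - mean ** 2

experience_years = [2, 3, 4, 7, 9]
print(variance_direct(experience_years), variance_shortcut(experience_years))
```

Both functions return 6.8 up to floating-point rounding. The shortcut needs only running sums of x and x², which is convenient for hand calculation, though with very large values it can lose precision to cancellation.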

    For ungrouped data:

    For grouped data: Alternative trait variation

    ,

    consists in the presence or absence of the property being studied in units of the population. Quantitatively, the variation of an alternative attribute is expressed by two values: the presence of a unit of the studied property is denoted by one (1), and its absence is denoted by zero (0). The proportion of units possessing the characteristic being studied is denoted by the letter , and the proportion of units not possessing this characteristic is denoted by . Considering that p + q = 1 (hence q = 1 - p), and the average value of the alternative characteristic is equal to

    The mean squared deviation (variance) of the alternative attribute is

    $\sigma^2 = \frac{(1 - p)^2 p + (0 - p)^2 q}{p + q} = pq$

    The variance reaches its maximum when the shares are equal, i.e. when $p = q = 0.5$, where $\sigma^2 = pq = 0.25$. The lower limit of this indicator is zero, which corresponds to a population with no variation. The standard deviation of the alternative attribute is $\sigma = \sqrt{pq}$.

    So, if 3% of the products in a manufactured batch turned out to be non-standard, then the variance of the share of non-standard products is $\sigma^2 = 0.03 \cdot 0.97 = 0.0291$, and the standard deviation is $\sigma = \sqrt{0.0291} \approx 0.171$, or 17.1%.
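    The batch example can be reproduced in a few lines, assuming the usual formula $pq$ for the variance of a 0/1 attribute. The function name `alternative_variance` is a hypothetical helper, and the grid of shares used to locate the maximum is only an illustration.

```python
import math

def alternative_variance(p):
    """Variance p * q of a 0/1 attribute, where p is the share of ones."""
    return p * (1 - p)

p = 0.03  # share of non-standard products from the example
variance = alternative_variance(p)
std = math.sqrt(variance)
print(round(variance, 4), round(std, 3))  # 0.0291 0.171

# The variance is largest when the shares are equal (p = q = 0.5).
shares = [i / 100 for i in range(101)]
best = max(shares, key=alternative_variance)
print(best, alternative_variance(best))  # 0.5 0.25
```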

    The standard deviation is equal to the square root of the mean squared deviation of the individual values of the attribute from the arithmetic mean.

    Relative Variation Measures

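    Relative measures of variation include the coefficient of oscillation, the linear coefficient of variation and the coefficient of variation.

    Absolute indicators cannot be used to compare the variation of several populations for the same characteristic, and still less for different characteristics. In these cases, relative indicators of variation are constructed for a comparative assessment of the degree of difference. They are calculated as the ratio of an absolute measure of variation to the mean:

    coefficient of oscillation: $V_R = \frac{R}{\bar{x}} \cdot 100\%$, where $R$ is the range of variation;

    linear coefficient of variation: $V_{\bar{d}} = \frac{\bar{d}}{\bar{x}} \cdot 100\%$, where $\bar{d}$ is the mean linear deviation;

    coefficient of variation: $V_\sigma = \frac{\sigma}{\bar{x}} \cdot 100\%$.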

    Other relative characteristics are also calculated. For example, to assess variation in a skewed distribution, the ratio of the mean linear deviation from the median to the median is calculated,

    $V_{Me} = \frac{\bar{d}_{Me}}{Me} \cdot 100\%$,

    since, by a property of the median, the sum of the absolute deviations of a characteristic from the median is the smallest possible.

    As a relative measure of dispersion that evaluates the variation in the central part of the population, the relative quartile deviation $V_Q = \frac{Q}{Me} \cdot 100\%$ is calculated, where $Q = \frac{Q_3 - Q_1}{2}$ is the mean quartile deviation, i.e. half the difference between the third (upper) quartile ($Q_3$) and the first (lower) quartile ($Q_1$).
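    A short sketch of the relative quartile deviation, under two assumptions: the half-difference of the quartiles is divided by the median, and the quartiles are taken with the default method of Python's `statistics.quantiles`. Other quartile conventions give slightly different values on small samples. The data are hypothetical.

```python
import statistics

def relative_quartile_deviation(values):
    """Return (Q3 - Q1) / 2 divided by the median, in percent."""
    # With n=4 the three cut points are Q1, the median and Q3.
    q1, median, q3 = statistics.quantiles(values, n=4)
    mean_quartile_deviation = (q3 - q1) / 2
    return 100 * mean_quartile_deviation / median

data = list(range(1, 12))  # hypothetical values 1, 2, ..., 11
result = relative_quartile_deviation(data)
print(result)
```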

    In practice, the coefficient of variation is calculated most often. Its lower limit is zero and it has no upper limit; its value increases as the variation of the characteristic increases. The coefficient of variation is, in a certain sense, a criterion for the homogeneity of the population (in the case of a normal distribution).

    Let's compare the variation of raw material consumption per unit of production (kg) under two technologies, using the coefficient of variation based on the standard deviation. A direct comparison of the standard deviations could give the false impression that raw material consumption varies more under the first technology than under the second. The relative measure of variation (the coefficient of variation) leads to the opposite conclusion.
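    To illustrate how the two comparisons can disagree, here is a small sketch with hypothetical figures; these are not the article's values. The first technology has the larger standard deviation, but relative to its mean it varies less.

```python
def coefficient_of_variation(std, mean):
    """Coefficient of variation in percent: 100 * std / mean."""
    return 100 * std / mean

# Hypothetical consumption figures, kg per unit of production.
tech_1 = {"mean": 10.0, "std": 1.0}
tech_2 = {"mean": 6.0, "std": 0.9}

cv_1 = coefficient_of_variation(tech_1["std"], tech_1["mean"])
cv_2 = coefficient_of_variation(tech_2["std"], tech_2["mean"])

# Absolute variation is larger for technology 1 ...
print(tech_1["std"] > tech_2["std"])
# ... but relative variation is larger for technology 2.
print(cv_1, cv_2)
```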

    Example of calculation of variation indices

    At the stage of selecting candidates to participate in the implementation of a complex project, the company announced a competition for professionals. The distribution of applicants by work experience showed the following results:

    Let's calculate the average production experience, years

    Let's calculate the variance by length of work experience

    The same result is obtained if you use a different formula for calculating variance for the calculation

    Let's calculate the standard deviation, years:

    Let's determine the coefficient of variation, %:

    Variance addition rule

    To assess the influence of the factors that determine variation, grouping is used: the population is divided into groups according to one of the determining factors. Then, along with the total variance calculated for the entire population, the within-group variance (the mean of the group variances) and the between-group variance (the variance of the group means) are calculated.

    Total variance characterizes the variation of a trait in its entirety, formed under the influence of all factors and conditions.

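    Between-group (intergroup) variance measures the systematic variation due to the influence of the factor by which the grouping is made:

    $\delta^2 = \frac{\sum (\bar{x}_j - \bar{x})^2 n_j}{\sum n_j}$,

    where $\bar{x}_j$ and $n_j$ are the mean and the size of group $j$, and $\bar{x}$ is the overall mean.

    Within-group variance evaluates the variation of a trait that arises under the influence of other factors not taken into account in this study and is independent of the grouping factor. It is defined as the mean of the group variances $\sigma_j^2$:

    $\overline{\sigma^2} = \frac{\sum \sigma_j^2 n_j}{\sum n_j}$

    All three variances ($\sigma^2$, $\delta^2$, $\overline{\sigma^2}$) are related by the following equality, known as the rule for adding variances:

    $\sigma^2 = \delta^2 + \overline{\sigma^2}$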

    Indicators that evaluate the influence of the grouping characteristic on the total variation are built on this relationship. These include the empirical coefficient of determination ($\eta^2$) and the empirical correlation ratio ($\eta$).

    The empirical coefficient of determination ($\eta^2$) characterizes the share of the between-group variance ($\delta^2$) in the total variance ($\sigma^2$):

    $\eta^2 = \frac{\delta^2}{\sigma^2}$

    It shows how much of the variation of a trait in the population is due to the grouping factor.

    The empirical correlation ratio

    $\eta = \sqrt{\frac{\delta^2}{\sigma^2}}$

    evaluates the closeness of the connection between the studied and grouping characteristics. Its limit values are zero and one: the closer $\eta$ is to one, the closer the connection.
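    The rule for adding variances and the two indicators built on it can be checked with a small sketch. The two groups below are hypothetical data, and the helper names `mean` and `pvariance` are illustrative. All variances divide by the number of observations, as in the rest of this section.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def pvariance(xs):
    """Population variance (divides by the number of observations)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Hypothetical observations split by a grouping factor.
groups = {"group_a": [10, 12, 14], "group_b": [4, 6, 8]}

all_values = [x for g in groups.values() for x in g]
n = len(all_values)
grand_mean = mean(all_values)

total_var = pvariance(all_values)
# Between-group variance: variance of the group means, weighted by group size.
between_var = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values()) / n
# Within-group variance: mean of the group variances, weighted by group size.
within_var = sum(len(g) * pvariance(g) for g in groups.values()) / n

eta_squared = between_var / total_var  # empirical coefficient of determination
eta = math.sqrt(eta_squared)           # empirical correlation ratio

print(total_var, between_var + within_var)
print(eta_squared, eta)
```

    The first line of output should show the same number twice (up to rounding), which is the addition rule; for these data about 77% of the total variance is between groups.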

    Example. The cost of 1 sq.m of total area (conventional units) on the housing market for ten 17-storey houses with improved layout was:

    It is known that the first five houses were built near the business center, and the rest were built at a considerable distance from it.

    To calculate the total variance, let's calculate the average cost of 1 sq.m. total area: The total dispersion is determined by the formula :

    Let's calculate the average cost of 1 sq.m. and the dispersion for this indicator for each group of houses that differ in location relative to the city center:

    a) for houses built near the center:

    b) for houses built far from the center:

    The variation in the cost of 1 sq.m of total area caused by differences in the location of the houses is measured by the between-group variance:

    The variation in the cost of 1 sq.m of total area due to other factors that are not taken into account is measured by the within-group variance:

    The found variances add up to the total variance

    The empirical coefficient of determination, $\eta^2 = 0.818$, shows that 81.8% of the variance of the cost of 1 sq.m of total area in the housing market is explained by differences in the location of new buildings relative to the business center, and 18.2% by other factors.

    The empirical correlation ratio, $\eta = \sqrt{0.818} \approx 0.90$, indicates that the location of the houses has a significant impact on the cost of housing.

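    The rule for adding variances for the share of an attribute is written as follows:

    $\sigma_p^2 = \overline{\sigma_{p_j}^2} + \delta_{p_j}^2$

    The three types of variance of a share for grouped data are determined by the following formulas.

    Total variance:

    $\sigma_p^2 = \bar{p}(1 - \bar{p})$, where $\bar{p} = \frac{\sum p_j n_j}{\sum n_j}$ is the share of the attribute in the whole population.

    Between-group and within-group variances:

    $\delta_{p_j}^2 = \frac{\sum (p_j - \bar{p})^2 n_j}{\sum n_j}$, $\quad \overline{\sigma_{p_j}^2} = \frac{\sum p_j (1 - p_j) n_j}{\sum n_j}$

    Here $p_j$ is the share of the attribute in group $j$ and $n_j$ is the size of that group.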
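    The same decomposition can be verified for a share. The sketch assumes the standard forms: the total variance is $\bar{p}(1-\bar{p})$, and the within-group variance is the size-weighted mean of $p_j(1-p_j)$. The group sizes and shares are hypothetical.

```python
# Hypothetical groups: (group size, share of units with the attribute).
groups = [(100, 0.2), (200, 0.5)]

n_total = sum(n for n, _ in groups)
p_bar = sum(n * p for n, p in groups) / n_total  # share in the whole population

total_var = p_bar * (1 - p_bar)
between_var = sum(n * (p - p_bar) ** 2 for n, p in groups) / n_total
within_var = sum(n * p * (1 - p) for n, p in groups) / n_total

# The two printed values should agree up to rounding.
print(total_var, between_var + within_var)
```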

    Characteristics of the distribution shape

    To get an idea of the shape of the distribution, indicators of the average level, variation, asymmetry and kurtosis are used.

    In symmetric distributions, the arithmetic mean, mode and median coincide ($\bar{x} = Mo = Me$). If this equality is violated, the distribution is asymmetric.

    The simplest indicator of asymmetry is the difference $\bar{x} - Mo$, which is positive in the case of right-sided asymmetry and negative in the case of left-sided asymmetry.

    Asymmetrical distribution

    To compare the asymmetry of several series, a relative indicator is calculated: $\frac{\bar{x} - Mo}{\sigma}$.

    Central moments of the distribution of order $k$ are used as generalizing characteristics of variation. The order corresponds to the power to which the deviations of the individual values of a characteristic from the arithmetic mean are raised.

    For ungrouped data:

    $\mu_k = \frac{\sum (x_i - \bar{x})^k}{n}$

    For grouped data:

    $\mu_k = \frac{\sum (x_i - \bar{x})^k f_i}{\sum f_i}$

    The first-order moment, according to the property of the arithmetic mean, is equal to zero.

    The second-order moment is the variance.

    Moments of the third and fourth orders are used to construct indicators that evaluate the features of the shape of empirical distributions.

    The third-order moment measures the degree of skewness, or asymmetry, of the distribution. The asymmetry coefficient is

    $A_s = \frac{\mu_3}{\sigma^3}$

    In symmetric distributions $\mu_3 = 0$, like all central moments of odd order. A nonzero third-order central moment indicates that the distribution is asymmetric. If $A_s > 0$, the asymmetry is right-sided and the right branch is elongated relative to the maximum ordinate; if $A_s < 0$, the asymmetry is left-sided (on the graph this corresponds to an elongated left branch).

    To characterize the peakedness or flatness of the distribution, the ratio of the fourth-order moment ($\mu_4$) to the fourth power of the standard deviation ($\sigma^4$) is calculated. For a normal distribution $\frac{\mu_4}{\sigma^4} = 3$, so kurtosis is found using the formula:

    $E_k = \frac{\mu_4}{\sigma^4} - 3$

    For a normal distribution it vanishes. For peaked distributions $E_k > 0$; for flat-topped ones $E_k < 0$.
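    The shape indicators can be computed directly from the central moments. The sketch below assumes the usual moment-based definitions, $A_s = \mu_3 / \sigma^3$ and $E_k = \mu_4 / \sigma^4 - 3$; the function names and both data sets are hypothetical.

```python
def central_moment(values, k):
    """Central moment of order k for ungrouped data (divides by n)."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** k for x in values) / n

def asymmetry(values):
    """Asymmetry coefficient: third central moment over sigma cubed."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Kurtosis: fourth central moment over sigma**4, minus 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]                  # hypothetical symmetric series
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 10]  # hypothetical series with a long right tail

print(central_moment(symmetric, 1))  # the first central moment is zero
print(asymmetry(symmetric), asymmetry(right_skewed))
print(excess_kurtosis(symmetric))
```

    The symmetric series gives zero asymmetry and a negative kurtosis (a flat-topped shape), while the series with the long right tail gives a positive asymmetry coefficient.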

    Kurtosis of distribution

    In addition to the indicators discussed above, a general characteristic of variation in a homogeneous population is a certain order in the way the distribution frequencies change as the value of the characteristic being studied changes. This order is called the distribution pattern.

    The nature (type) of the distribution pattern can be revealed by constructing a variation series from a large number of observations and by choosing the number of groups and the width of the intervals at which the pattern appears most clearly.

    Analysis of variation series involves identifying the nature of the distribution (as a result of the action of the variation mechanism), establishing the distribution function, and checking the compliance of the empirical distribution with the theoretical one.

    An empirical distribution, obtained from observational data, is represented graphically by an empirical distribution curve (a polygon).

    In practice, there are various types of distributions, among which we can distinguish symmetric and asymmetric, unimodal and multimodal.

    Establishing the type of distribution means expressing the mechanism of pattern formation in analytical form. Many phenomena and their characteristics have typical distribution forms, which are approximated by the corresponding curves. Among the variety of distribution forms, the most widely used theoretical ones are the normal distribution, the Poisson distribution, the binomial distribution, etc.

    A special place in the study of variation belongs to the normal law, due to its mathematical properties. For the normal law, the three-sigma rule holds, according to which practically all individual values of a characteristic lie within $\bar{x} \pm 3\sigma$. About 68% of all units lie within $\bar{x} \pm \sigma$, and about 95% within $\bar{x} \pm 2\sigma$.
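    The shares quoted for the normal law can be computed exactly from the error function: for a normal distribution, the probability of falling within $k$ standard deviations of the mean is $\operatorname{erf}(k/\sqrt{2})$. A minimal sketch (the function name `share_within` is illustrative):

```python
import math

def share_within(k):
    """Share of a normal population within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 2))
# 1 68.27
# 2 95.45
# 3 99.73
```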

    The assessment of the correspondence between the empirical and theoretical distributions is carried out using goodness-of-fit criteria, among which the Pearson, Romanovsky, Yastremsky, and Kolmogorov criteria are widely known.


