# Statistical Tests

1. Comparison of means
1. To compare the means between two groups, the t-test is used. (AIIMS May '08).
2. When there is only one sample group and we take the means before and after interventions, it is called a matched or a paired design. This design is analyzed by using the paired t test (or, the matched-group t test).
3. When observations come from two separate or independent groups, the appropriate test is the two-sample independent-groups t test.
4. Similarly, the Z-test can be used instead of the t-test when the sample size is > 100 and the population standard deviation is known.
5. 95% Confidence Intervals (CI) are estimated as: Observed mean ± 1.96 × Standard Error of the Mean (SEM).
6. To compare the variance (standard deviation) between two independent groups, the F-test is used.
7. For comparison of three or more means, the recommended approach is the analysis of variance (ANOVA).
8. The above-mentioned tests assume an underlying Gaussian distribution of the means; they are therefore also called parametric tests.
9. When the means have a non-Gaussian distribution, we use non-parametric tests: the Wilcoxon signed-rank test in a paired design, and the Wilcoxon rank-sum test (or Mann-Whitney U test) for two independent groups. Both tests compare the equality of medians rather than means.
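The paired design and the 95% CI formula above can be sketched in pure Python. The before/after values here are hypothetical illustration data, not from the source; the block computes the paired t statistic and the CI of the mean difference by hand.

```python
from math import sqrt

# Hypothetical before/after measurements on the same subjects (paired design)
before = [120, 130, 125, 118, 140]
after = [115, 124, 128, 112, 132]

# The paired t test works on the within-subject differences
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
mean_d = sum(diffs) / n

# Sample standard deviation of the differences (n - 1 in the denominator)
sd_d = sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
sem_d = sd_d / sqrt(n)  # standard error of the mean difference

t = mean_d / sem_d  # paired t statistic, df = n - 1

# 95% CI of the mean difference, using the 1.96 approximation from the notes
ci_low = mean_d - 1.96 * sem_d
ci_high = mean_d + 1.96 * sem_d

print(f"t = {t:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

Strictly, for a sample this small the CI multiplier should come from the t distribution rather than 1.96, but the notes' large-sample formula is used for illustration.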
2. Comparison of proportions
1. Proportions are compared when the data is measured on a nominal or ordinal scale.
2. The test commonly used is the chi-square test. It is a non-parametric test.
3. The test is done by constructing a 2×2 contingency table and calculating the expected frequencies of the variable from the observed frequencies.
4. The test can be applied to two or more independent groups as well as to paired groups.
5. For comparing proportions in paired-groups we use the McNemar test.
6. When the expected frequencies are less than 5 (small sample size), the appropriate test is Fisher's exact test.
7. 95% CI is given by: Observed proportion ± 1.96 × SE of the proportion.
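The expected-frequency calculation behind the chi-square test can be made concrete with a small sketch. The 2×2 table below is hypothetical; each expected cell frequency is (row total × column total) / grand total, and the statistic sums (O − E)²/E over all cells.

```python
# Hypothetical 2x2 contingency table: rows = groups, columns = outcomes
observed = [[20, 30],
            [40, 10]]

row_totals = [sum(row) for row in observed]        # totals for each group
col_totals = [sum(col) for col in zip(*observed)]  # totals for each outcome
grand_total = sum(row_totals)

# Expected frequency of each cell: (row total * column total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over all cells; df = 1 for 2x2
chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))

print(f"chi-square = {chi2:.2f}")
```

The resulting statistic is compared against the chi-square distribution with (rows − 1) × (columns − 1) degrees of freedom; if any expected cell were below 5, Fisher's exact test would be used instead, as noted above.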
3. Correlation & regression
1. Correlation
1. It describes the relationship between two numerical variables on a scatter plot.
2. The correlation coefficient (r) is the measure of correlation. It varies from -1 to +1; these two values describe a perfect negative and a perfect positive linear relationship between the two variables. r = 0 implies that there is no linear relationship.
3. Squaring the correlation coefficient gives the coefficient of determination (r²). It tells us the percentage of variability in one variable that can be accounted for by the other variable.
4. For two ordinal (or one ordinal and one numerical) variables the Spearman rank correlation is used. It is also used when the numerical variables have a skewed distribution.
2. Regression
1. Regression analysis is used to predict the value of one variable (the dependent variable) from knowledge of the other (the independent variable).
2. The regression equation is: Y = a + bX, where 'Y' is the dependent variable, 'X' the independent variable, 'a' the y-axis intercept and 'b' the regression coefficient.
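Both r and the regression equation Y = a + bX can be computed from the same sums of squares about the means. The x and y values below are hypothetical; this is a minimal by-hand sketch, not a library call.

```python
from math import sqrt

# Hypothetical paired numerical data
x = [1, 2, 3, 4, 5]  # independent variable
y = [2, 4, 5, 4, 5]  # dependent variable

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Sums of squares and cross-products about the means
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient, -1 <= r <= +1
r2 = r ** 2                # coefficient of determination

b = sxy / sxx              # regression coefficient (slope)
a = my - b * mx            # y-axis intercept, so that Y = a + bX

print(f"r = {r:.2f}, r^2 = {r2:.2f}, Y = {a:.2f} + {b:.2f}X")
```

Note how r² falls out of the same quantities: here r² tells us what fraction of the variability in Y is accounted for by X, exactly as described above.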
4. Z Scores
The location of any element in a normal distribution can be expressed in terms of how many standard deviations it lies above or below the mean of the distribution. This is the z-score of the element. If the element lies above the mean it will have a positive z score and vice versa.
Z = (X - μ) / σ
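The formula is a one-liner in code. The mean and standard deviation below (an IQ-style scale) are hypothetical illustration values.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Hypothetical distribution with mean 100 and standard deviation 15
print(z_score(130, 100, 15))  # element above the mean -> positive z
print(z_score(85, 100, 15))   # element below the mean -> negative z
```

An element two standard deviations above the mean has z = 2; one standard deviation below the mean, z = -1.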