## KEY CONCEPTS

• Correlation and regression are statistical methods to examine the linear relationship between two numerical variables measured on the same subjects. Correlation describes a relationship, whereas regression both describes a relationship and predicts an outcome.

• Correlation coefficients range from –1 to +1; –1 indicates a perfect negative (inverse) linear relationship between two variables, and +1 indicates a perfect positive one. A correlation equal to 0 indicates no linear relationship.

• Scatterplots provide a visual display of the relationship between two numerical variables and are recommended to check for a linear relationship and extreme values.

• The coefficient of determination, or r², is simply the squared correlation; it is the preferred statistic to describe the strength of the relationship between two numerical variables.

• The t test can be used to test the hypothesis that the population correlation is zero.

• The Fisher z transformation is used to form confidence intervals for the correlation or to test any hypotheses about the value of the correlation.

• The Fisher z transformation can also be used to form confidence intervals for the difference between correlations in two independent groups.

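• It is possible to test whether two different variables have the same correlation with a common third variable, that is, whether the correlation between one variable and a second is the same as the correlation between a third variable and that same second variable.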

• When one or both of the variables in a correlation are skewed, the Spearman rho nonparametric correlation is advised.

• Linear regression is called linear because it measures only straight-line relationships.

• The least squares method is the one used in almost all regression examples in medicine. With one independent and one dependent variable, the regression equation can be given as a straight line.

• The standard error of the estimate is a statistic that can be used to test hypotheses or form confidence intervals about both the intercept and the regression coefficient (slope).

• One important use of regression is to be able to predict outcomes in a future group of subjects.

• When predicting outcomes, the confidence limits are called confidence bands about the regression line. The most precise predictions are for values of the independent variable X close to its mean, and predictions become less precise as X departs from its mean.

• It is possible to test whether the regression line is the same (i.e., has the same slope and intercept) in two different groups.

• A residual is the difference between the actual and the predicted outcome; looking at the distribution of residuals helps statisticians decide if the linear regression model is the best approach to analyzing the data.

• Regression toward the mean can result in a treatment or procedure appearing to be of value when it has had no actual effect; having a control group helps to guard against this problem.

• Correlation and regression should not be used unless observations are independent; it is not appropriate to include multiple measurements of the same subjects.

• Mixing two populations can also cause the correlation and the regression coefficient to be larger than they should be.

• The use of correlation versus regression should be dictated by the purpose ...
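Several of the concepts above (the correlation and r², the t test of a zero population correlation, the Fisher z confidence interval, the least squares line, and residuals) can be illustrated with a short sketch. This is only a minimal illustration using the standard textbook formulas, not a replacement for a statistical package. The function names are hypothetical, and the data values are invented purely for demonstration.

```python
import math
from statistics import NormalDist, mean


def pearson_r(x, y):
    """Pearson correlation coefficient for paired observations."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic (n - 2 degrees of freedom) for H0: population correlation = 0.
    Undefined when r is exactly +1 or -1."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for the correlation via Fisher's z."""
    z = math.atanh(r)                  # Fisher z transformation of r
    se = 1 / math.sqrt(n - 3)          # standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    # Transform the limits back to the correlation scale.
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


def least_squares(x, y):
    """Intercept a and slope b of the least squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b


# Invented example data: eight paired measurements on the same subjects.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

r = pearson_r(x, y)
a, b = least_squares(x, y)
# Residual = actual outcome minus predicted outcome.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

print("r =", r, " r squared =", r ** 2)
print("t =", t_statistic(r, len(x)))
print("95% CI for the correlation:", fisher_ci(r, len(x)))
print("intercept =", a, " slope =", b)
print("residuals:", residuals)
```

The Fisher interval here uses the normal approximation with standard error 1/√(n − 3), so it is rough for very small samples such as this one. Plotting the residuals against x is a natural next step for judging whether a straight line is an adequate model.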
