Parametric Tests in R: A Guide to Statistical Analysis

Hello, I am Zubair Goraya, a Ph.D. scholar, certified data analyst, and freelancer with five years of experience. In this guide, I will explain how to perform and report parametric tests in R, using worked examples. But before we start, let me ask you a few questions:

How do you determine if the data you are analyzing, whether parametric or nonparametric, is reliable and valid? 

How do you conduct hypothesis testing and conclude the statistical data analysis? 

How do you communicate your analysis findings, derived from comparing two data sets using parametric or nonparametric tests, and make recommendations to your audience?

Data analysts confront these questions daily and must use appropriate statistical tools and techniques to answer them. Among the most common and powerful of those tools are parametric tests.


Key Points

  • Parametric tests, such as t-tests and ANOVA, rely heavily on assumptions, including normality, homogeneity of variance, independence of observations, and random sampling. Ensuring these conditions are met is crucial for valid results.
  • Parametric tests are designed for interval and ratio data, where the measurement scale is precise and the intervals between values are meaningful. Nonparametric tests, which do not assume a specific distribution, are used when these criteria aren't met.
  • When the necessary assumptions (normality, homoscedasticity) hold, parametric tests generally provide higher statistical power than nonparametric tests, meaning they are more likely to detect genuine effects when they exist, which enhances the credibility of the results.
  • Parametric tests, especially those involving two or more variables (such as regression analysis), often require larger sample sizes. A large sample contributes significantly to the robustness and precision of statistical inference.
  • Parametric tests are widely employed in experimental research, while nonparametric tests find their niche in studies where the parametric assumptions do not match the data's nature. Both are pivotal in psychology, biology, and medicine.

What is a Parametric Test?

Parametric tests are statistical tests that compare the means or proportions of different groups or samples and test whether they differ significantly from each other or from a specified value. Parametric tests rest on several assumptions:
  1. Normality,
  2. Homogeneity of variance,
  3. Independence,
  4. Random sampling
Parametric tests can help data analysts test hypotheses and draw conclusions about the data, such as whether the mean score of a class equals the average score of the population, whether the mean height of men differs from the mean height of women, or whether the relationship between individuals' weight and height is significant.
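
As a minimal sketch with made-up example vectors (not this article's data), each of these hypotheses maps onto a base-R call:

# Hypothetical example data
scores <- c(72, 85, 78, 90, 66, 81, 77, 88)  # test scores of one class
men    <- c(175, 180, 178, 182, 176)         # heights (cm)
women  <- c(165, 170, 168, 172, 166)         # heights (cm)
weight <- c(60, 72, 68, 80, 64)              # weights (kg)
height <- c(165, 180, 175, 185, 170)         # heights (cm)

t.test(scores, mu = 75)   # one-sample t-test: is the class mean equal to 75?
t.test(men, women)        # two-sample (Welch) t-test: do mean heights differ?
cor.test(weight, height)  # Pearson correlation: is weight related to height?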

What Is the Difference Between Parametric and Non-Parametric Tests?

| Feature | Parametric Test | Non-Parametric Test |
|---|---|---|
| Assumption of data distribution | Assumes a specific distribution (usually normal) | No assumption about the underlying distribution |
| Data type | Suited for interval and ratio data | Suited for ordinal, interval, and ratio data |
| Typical statistical tests | t-test, ANOVA, regression analysis | Mann-Whitney U test, Wilcoxon signed-rank test |
| Robustness to outliers | Sensitive to outliers | Less sensitive to outliers |
| Sample size requirement | Larger samples required | Smaller samples are often sufficient |
| Statistical power | Generally higher | May be lower |
| Application | Commonly used in experimental research | Useful when parametric assumptions cannot be met, or in non-experimental settings |

What are the Assumptions of Parametric Tests?

Assumptions are the conditions the data must meet for the parametric test to be applicable and reliable. If the assumptions are violated, the parametric test may produce inaccurate or misleading results, such as false positives, false negatives, or incorrect estimates. There are four common assumptions of parametric tests.

Normality

The data follows a normal distribution, or bell-shaped curve, in which most values cluster around the mean and the tails are symmetrical and thin.

Homogeneity of variance

The data has equal or similar variances, or spreads, across the different groups or levels of the independent variable. In other words, the variability of the data is consistent from group to group, without extreme values inflating the spread of any one group.

Independence

Observations are independent of one another, i.e., no observation or sample influences another. This implies the data was collected randomly and is not driven by hidden or confounding factors.

Random sampling

The data is randomly sampled or selected by chance from the population of interest. It means the data is representative and has no bias or selection error.

These assumptions are important because they ensure that the parametric test is appropriate and valid for the data and that the results are generalizable and meaningful. Therefore, checking these assumptions before performing any parametric test and dealing with violations appropriately is essential.

What happens if assumptions are violated?

| Assumption | Consequences of Violation | Actions to Take |
|---|---|---|
| Normality | Inaccurate p-values and confidence intervals due to sensitivity to outliers or skewed distributions | Explore non-parametric tests, transform the data toward normality, or use robust statistical methods |
| Homogeneity of variance | Unreliable comparisons between groups, reducing result accuracy | Use Welch's ANOVA with the Games-Howell post hoc test, or opt for non-parametric alternatives |
| Independence | Biased estimates and underestimated standard errors that distort inferential conclusions | Employ mixed-effects models, bootstrapping, or other techniques for non-independent data |
| Random sampling | Introduced bias, limiting generalizability | State the limitation and consider alternative sampling methods for more robust results |
| Linearity (regression) | An invalid regression model | Check scatterplots, transform variables, or use non-linear regression models |
| Homoscedasticity (regression) | Heteroscedasticity undermines the model's standard errors | Transform variables, use weighted least squares, or use robust regression methods |
| Multicollinearity (regression) | Highly correlated predictors yield unstable coefficient estimates | Address multicollinearity through variable selection or regularization techniques |
| Independence of residuals (regression) | Autocorrelated residuals make the results unreliable | Check residual plots, or use time-series methods for time-dependent data |
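
For instance, a minimal sketch of two of these remedies in base R, assuming a hypothetical data frame d with a positive numeric response y and a grouping factor g:

# Welch's ANOVA: compare group means without assuming equal variances
oneway.test(y ~ g, data = d, var.equal = FALSE)

# Log-transform a right-skewed, positive response, then re-check normality
d$log_y <- log(d$y)
shapiro.test(d$log_y)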

Validating Parametric Test Assumptions in R

There are two types of methods for checking the assumptions of parametric tests in R:

  1. Graphical methods
  2. Numerical methods

Graphical methods

These methods use plots or graphs to visually inspect the data and look for patterns or deviations that may indicate violations of assumptions. Some examples of graphical methods are: 

  1. Histograms
  2. Boxplots
  3. QQ-plots
  4. Scatterplots.

Numerical methods

These methods use statistics or formal tests to measure the data numerically and flag values or results that may indicate violations of assumptions. Some examples of numerical methods are:

  1. Shapiro-Wilk test (normality)
  2. Kolmogorov-Smirnov test (normality)
  3. Levene's test (homogeneity of variance)
  4. Durbin-Watson test (independence of residuals)

Both graphical and numerical methods have advantages and disadvantages and may complement or contradict each other. It is therefore recommended to use both approaches when checking the assumptions of parametric tests, and to apply your own judgment and common sense in deciding whether the assumptions are met.
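
As a minimal sketch of these numerical checks, assuming a hypothetical data frame d with a numeric column y and a grouping factor g, plus a fitted linear model fit:

shapiro.test(d$y)                          # Shapiro-Wilk test of normality
ks.test(d$y, "pnorm", mean(d$y), sd(d$y))  # Kolmogorov-Smirnov test against a normal
car::leveneTest(y ~ g, data = d)           # Levene's test of equal variances
car::durbinWatsonTest(fit)                 # Durbin-Watson test of residual independence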

Check Parametric Test Assumptions in R

Load the data set and the packages.

# Load the packages
library(ggplot2) # plotting: histogram and QQ-plot
library(dplyr)   # data manipulation
library(car)     # assumption-checking helpers (e.g., leveneTest)

# Create the data set: heights (in cm) of a sample of 50 students
heights <- c(168, 172, 171, 169, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182,
             183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196,
             197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210,
             211, 212, 213, 214, 215, 216, 217, 218)

Normality Assumption

The normality assumption states that the data follows a normal distribution. We will use a histogram and a QQ plot to visually inspect the data and a Shapiro-Wilk test to measure the data numerically.

# Check the normality assumption
# Plot a histogram of the data
ggplot(data = data.frame(heights), aes(x = heights)) +
  geom_histogram(bins = 10, fill = "skyblue", color = "black") +
  labs(title = "Histogram of Heights", x = "Height (cm)", y = "Frequency")

Figure: Histogram of heights, created with ggplot2's geom_histogram().
# Plot a QQ-plot of the data
ggplot(data = data.frame(heights), aes(sample = heights)) +
  stat_qq() +
  stat_qq_line() +
  labs(title = "QQ-plot of Heights", x = "Theoretical Quantiles", y = "Sample Quantiles")

Figure: QQ-plot of heights, created with stat_qq() and stat_qq_line() from ggplot2.

# Perform a Shapiro-Wilk test of the data
shapiro.test(heights)

Output of the Shapiro-Wilk normality test from shapiro.test().

The histogram shows the data is roughly symmetrical and bell-shaped, with no obvious outliers or skewness. The QQ plot shows that the data points are mostly aligned with the diagonal line, with no obvious deviations or patterns. 

The Shapiro-Wilk test returns a p-value of 0.8793, which is greater than 0.05, the common significance level, so we fail to reject the null hypothesis that the data is normally distributed. Based on this analysis, we conclude that the normality assumption is met for the data.


Homogeneity of Variance

The homogeneity of variance assumption states that the data has equal or similar variances across different groups or levels of the independent variable. Since we only have one group or sample in this example, this assumption is not applicable, and we can skip it.
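
For reference, a minimal sketch of how this check would look with grouped data, using car's leveneTest() on a made-up data frame of heights from two classes:

# Hypothetical grouped data: heights (cm) of students from two classes
grouped <- data.frame(
  height = c(170, 172, 168, 175, 171, 169, 180, 182, 179, 185, 181, 178),
  class  = factor(rep(c("A", "B"), each = 6))
)
leveneTest(height ~ class, data = grouped)  # p > 0.05 suggests similar variances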

Independence assumption

Next, we will check the independence assumption, which states that observations are independent of one another, i.e., no observation influences another. This assumption is usually verified by the study's design or the data collection process rather than by data analysis. Therefore, we must rely on what we know about the data set and use our common sense and judgment to decide whether it is met.

In this example, we are told that the data set contains the heights of 50 students from a class. We can assume that the heights of the students are not influenced by each other and that they are randomly collected from the class. Therefore, the independence assumption is met for the data.

Random sampling assumption

The random sampling assumption states that the data is randomly sampled, or selected by chance, from the population of interest. This assumption is also verified by the study's design or the data collection process rather than by data analysis. Therefore, we need to rely on the information about the data set and use our common sense and judgment to decide whether this assumption is met.

In this example, we are told that the data set contains the heights of 50 students from a class. If the class can be regarded as representative of the population of interest, the random sampling assumption is met for the data.

How to Report Parametric Tests in R?

After we have performed and interpreted the parametric tests in R, we need to report them clearly and professionally, following APA style and best practices. Reporting parametric tests in R involves two main steps:

  • Writing the results section, which summarizes the main findings and statistics of the parametric tests in text and numbers (see the sketch after this list).
  • Creating and formatting the tables and figures, which display the data and the results of the parametric tests, with clear visuals and labels.
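
For the results section, a minimal sketch: the comparison value of 180 cm below is a hypothetical example, not taken from the article, and the statistics are pulled straight from the htest object returned by t.test().

res <- t.test(heights, mu = 180)  # one-sample t-test against a hypothetical mean
cat(sprintf("t(%.0f) = %.2f, p = %.3f", res$parameter, res$statistic, res$p.value))
# yields an APA-style fragment such as "t(49) = ..., p = ..." to embed in prose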

Common Mistakes in Parametric Testing

Violation of Assumptions

One common mistake is disregarding the assumptions of parametric tests. If the data fail to meet these assumptions, the results may be unreliable. To assess normality, employ graphical methods such as histograms or normal probability (QQ) plots.

In cases where non-normality is observed, alternative approaches like transformations or non-parametric tests should be considered.

Improper Sample Size

Small sample sizes can result in underpowered tests, diminishing the ability to detect genuine differences. Power analysis can help determine the appropriate sample size for a given effect size and significance level.
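
As a minimal sketch with the add-on pwr package (an assumption here; it is not used elsewhere in this article and must be installed separately), we can solve for the per-group n needed to detect a medium effect:

library(pwr)
# Per-group sample size for a two-sample t-test:
# medium effect (Cohen's d = 0.5), alpha = 0.05, 80% power
pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.80, type = "two.sample")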

Misinterpretation of Results

Misinterpreting the results is another common error in parametric testing. It is crucial to understand the output generated by R and to interpret its implications accurately. Seeking guidance from statisticians or consulting reputable sources can help mitigate misinterpretation.

Best Techniques and Practices

Data transformation

If the assumption of normality is violated, data transformation techniques may be employed to bring the distribution closer to normal. Frequently used transformations include the logarithmic, square-root, and Box-Cox transformations. After transforming, re-check the assumptions before applying the parametric test.
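
A minimal sketch, using a made-up positive, right-skewed vector; the Box-Cox profile comes from the MASS package (an assumption here, though MASS ships with R):

set.seed(1)
x <- rlnorm(100, meanlog = 0, sdlog = 0.5)  # skewed, strictly positive example data

shapiro.test(log(x))   # re-check normality after a log transform
shapiro.test(sqrt(x))  # ...or after a square-root transform

library(MASS)
boxcox(lm(x ~ 1))      # profile the Box-Cox lambda to pick a power transform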

Non-Parametric Tests

When the assumptions of parametric tests cannot be fulfilled, non-parametric tests provide a reliable alternative. The Kruskal-Wallis and Wilcoxon rank-sum tests are examples of non-parametric tests that do not rely on specific distributional assumptions, and they are particularly handy for ordinal or skewed data.
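
As a minimal sketch, assuming a hypothetical data frame d with a numeric score and a factor group:

wilcox.test(score ~ group, data = d)   # Wilcoxon rank-sum (Mann-Whitney U), two groups
kruskal.test(score ~ group, data = d)  # Kruskal-Wallis, two or more groups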

Increase the Sample Size

The power of parametric tests rises as the sample size grows. A larger sample reduces the standard error and increases the likelihood that genuine effects, when they exist, will be detected. Depending on the research question, a statistician's guidance may be needed to choose an appropriate sample size.

Conclusion

In this article, we have learned how to perform and report parametric tests in R through worked examples, how to check the assumptions of those tests, and how to use packages and functions that support the analysis and reporting process.

Parametric tests are powerful and widely used statistical tools that can help us test hypotheses and draw conclusions about the data. However, they also require some conditions and criteria to be met, such as normality, homogeneity of variance, independence, random sampling, etc. Therefore, we must be careful and rigorous when performing and reporting parametric tests in R and follow the APA style and best practices.

We hope this article has been informative and helpful and that you have gained some insights and skills on performing and reporting parametric tests in R. If you have any questions or feedback, please contact us. Thank you for reading.

Frequently Asked Questions (FAQs) 

What are the parametric tests in R?

Common parametric tests in R include t-tests (e.g., `t.test()`), ANOVA (e.g., `aov()`), and linear regression (e.g., `lm()`).

Which statistical test should I use in R?

The choice depends on your data and research question. For comparing means, use t-tests or ANOVA; for relationships, Pearson correlation (e.g., `cor.test()`) or linear regression.

How do you know if data is parametric or non-parametric in R?

Graphical methods (histograms, QQ-plots) and statistical tests (Shapiro-Wilk, Kolmogorov-Smirnov) help assess normality. Non-parametric tests may be suitable if normality assumptions are violated.

What is a Non-parametric test?

Non-parametric tests are used when parametric assumptions cannot be met, allowing for robust analysis without specific distributional requirements.

What are the non-parametric tests in R studio?

Non-parametric tests include Mann-Whitney U test (`wilcox.test()`), Kruskal-Wallis test (`kruskal.test()`), and Spearman correlation (`cor.test(method = "spearman")`).

What is an ANOVA test in R?

ANOVA (Analysis of Variance) in R is performed with `aov()` to compare means of more than two groups, testing whether significant differences exist.

What is the Pearson chi-squared test in R?

The Pearson chi-squared test (`chisq.test()`) in R assesses independence between categorical variables in a contingency table.

Is Chi-Square a parametric test?

The Chi-Square test is non-parametric as it makes no assumptions about the data distribution.

What are the 4 non-parametric tests?

Common non-parametric tests include Mann-Whitney U test, Kruskal-Wallis test, Wilcoxon signed-rank test, and Spearman correlation.

When not to use a parametric test?

Avoid parametric tests when assumptions (normality, homogeneity of variance) are violated or for non-continuous data.

How do I run a non-parametric test in R?

Use functions like `wilcox.test()` or `kruskal.test()` for non-parametric tests in R, depending on your study design.

What is the difference between parametric and non-parametric tests in R?

Parametric tests assume specific data distributions. In contrast, non-parametric tests make fewer distributional assumptions, offering robustness in various scenarios.

Is Pearson's R a parametric test?

Yes, Pearson's correlation (R) is a parametric test, assuming a normal distribution of variables.

Is the Mann-Whitney U test a parametric test?

No, the Mann-Whitney U test is non-parametric, suitable for ordinal or continuous data that doesn't meet parametric assumptions.

Should I use Spearman or Pearson?

Use Pearson for linear relationships with continuous data; use Spearman for monotonic relationships or when assumptions of linearity are not met.

Parametric tests with R examples

Parametric tests compare group means (or model relationships between variables) under assumptions such as normality. Examples in R include t-tests (`t.test()`), ANOVA (`aov()`), and linear regression (`lm()`); see the sketch below.
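
A compact, hedged sketch, assuming a hypothetical data frame d with numeric y and x and a two-level factor group:

t.test(y ~ group, data = d)        # two-sample t-test
summary(aov(y ~ group, data = d))  # one-way ANOVA
summary(lm(y ~ x, data = d))       # simple linear regression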

Characteristics of parametric tests

Parametric tests assume specific data distributions, continuous variables, and adherence to assumptions like normality and homogeneity.

Conditions for parametric tests

Conditions for parametric tests include normality, homogeneity of variance, independence, and continuous data.

References

  • Field, A., Miles, J., & Field, Z. (2012). Discovering Statistics Using R. Sage Publications.
  • R Core Team (2020). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
  • Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer.
  • Dahl, D. B. (2016). xtable: Export Tables to LaTeX or HTML. R package version 1.8-2. https://CRAN.R-project.org/package=xtable
  • Hlavac, M. (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables. R package version 5.2.2. https://CRAN.R-project.org/package=stargazer
  • Fox, J., & Weisberg, S. (2019). An R Companion to Applied Regression. Sage Publications.
  • Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. Routledge.

Do you need help with a data analysis project? Let me assist you! As a Ph.D. scholar with five years of experience, I specialize in solving data analysis challenges using R and other advanced tools. Reach out to me for personalized solutions tailored to your needs.

About the author

Zubair Goraya
Ph.D. Scholar | Certified Data Analyst | Blogger | Completed 5000+ data projects | Passionate about unravelling insights through data.
