R Package/Code

(1) CorrelationR Package

This package provides functions for robust correlation analysis. Functions include:

  1. Median Absolute Deviation Correlation
  2. Median Based Correlation

Pearson’s correlation (r) is a widely employed statistic for assessing bivariate linearity. However, it is not robust to outliers, to violations of normality, or to cases where X and Y are not linearly related.

Median Absolute Deviation correlation can be used to examine whether two continuous variables (X and Y) are linearly related using a deviation-based estimate. The median absolute deviation (MAD) is more robust than the conventional standard deviation (SD) estimate.

Median Based Correlation can be used to examine whether two continuous variables (X and Y) are linearly related using a median correlation coefficient. It is a derivative of the Median Absolute Deviation, and the median correlation coefficient is more robust than the conventional standard deviation (SD) estimate.
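As a rough illustration of the two coefficients, the sketch below uses one formulation from the robust-statistics literature, in which X and Y are standardized by their medians and MADs and then combined into a sum (u) and a difference (v). It is written in Python with hypothetical function names (mad_correlation, median_correlation); whether the CorrelationR package uses exactly this form is an assumption, so treat it as a sketch rather than the package's implementation.

```python
import math
import statistics


def mad(values):
    """Median absolute deviation from the median (unscaled)."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def _principal_variables(x, y):
    """Robustly standardize x and y, then return their sum (u) and difference (v)."""
    mx, my = statistics.median(x), statistics.median(y)
    sx, sy = mad(x), mad(y)
    if sx == 0 or sy == 0:
        raise ValueError("MAD of x or y is zero; the coefficient is undefined")
    zx = [(xi - mx) / (math.sqrt(2) * sx) for xi in x]
    zy = [(yi - my) / (math.sqrt(2) * sy) for yi in y]
    u = [a + b for a, b in zip(zx, zy)]
    v = [a - b for a, b in zip(zx, zy)]
    return u, v


def mad_correlation(x, y):
    """(MAD(u)^2 - MAD(v)^2) / (MAD(u)^2 + MAD(v)^2)."""
    u, v = _principal_variables(x, y)
    mu2, mv2 = mad(u) ** 2, mad(v) ** 2
    return (mu2 - mv2) / (mu2 + mv2)


def median_correlation(x, y):
    """Same ratio, but using med|u| and med|v| in place of the MADs."""
    u, v = _principal_variables(x, y)
    mu2 = statistics.median([abs(a) for a in u]) ** 2
    mv2 = statistics.median([abs(a) for a in v]) ** 2
    return (mu2 - mv2) / (mu2 + mv2)
```

When Y is an exact increasing linear function of X, the difference variable v is zero everywhere and both coefficients equal 1; for an exact decreasing linear function, u is zero and both equal -1. Because only medians enter the calculation, a single extreme observation has limited influence on either value.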

Li, J. C.-H. (2022). Bootstrap Confidence Intervals for 11 Robust Correlations in the Presence of Outliers and Leverage Observations. Methodology, 18(2), 99-125. https://doi.org/10.5964/meth.8467



(2) Common-Language Effect Size (R code)

Li, J. C.-H. (2016). Effect size measures in a two independent-samples case with non-normal and non-homogeneous data. Behavior Research Methods, 48, 1560-1574. https://doi.org/10.3758/s13428-015-0667-z (Supplementary materials: https://osf.io/msy3h/) 

The following sections include the supplemental materials for the study entitled “Effect Size Measures in a Two Independent-Samples Case with Non-Normal and Non-Homogeneous Data”. The first section presents Mathematica code that can be used to estimate the six effect sizes given a real-world data set (named “data.csv”). The second section includes the Monte Carlo simulation code used in the study. The third section presents a URL that links to the full report of the percentage biases that were used to create Figures 1 and 2 in the study.

1. A Mathematica Code Used to Obtain the Six ES Estimates Based on the Hypothetical Example in the Conclusion and Discussion Section

  • First, save the data as “data.csv”, where the first column contains the value labels for the two groups (i.e., 0 and 1), and the second contains the observation for each participant.
    • For example, enter the hypothetical data shown in the conclusion and discussion section of the study and save it as “data.csv”. Alternatively, download the data file in Excel format (data) and save it as “data.csv”. Note that the data file should be placed in a location that the Mathematica code below can read (i.e., line 1: data = Import[“C:/data.csv”], where C:/ means that the data is saved to the C:/ drive of your computer).
  • Second, run the following Mathematica code.

code1

  • Third, obtain the 6 ES estimates below.

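---------------------------------------- Output ----------------------------------------

d = -0.135157, dr* = 0.611744, dr = 0.39274, rpb = -0.0674249, CL = 0.446244, Aw = 0.6416

----------------------------------------------------------------------------------------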
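For readers without Mathematica, the sketch below shows in Python how three of these quantities might be computed from a file laid out as described above (group label in the first column, observation in the second). It is not the study's code: the function names are hypothetical, the direction of comparison (group 1 relative to group 0) is an assumption, and the nonparametric estimate shown is the standard pairwise-comparison form, which may differ in detail from the Aw reported above. The robust variants dr and dr* are not attempted.

```python
import csv
import io
import math
import statistics


def read_groups(handle):
    """Split a two-column CSV (group label, observation) into two lists."""
    groups = {"0": [], "1": []}
    for row in csv.reader(handle):
        if len(row) < 2:
            continue
        label = row[0].strip()
        if label in groups:  # header rows and unknown labels are skipped
            groups[label].append(float(row[1]))
    return groups["0"], groups["1"]


def cohens_d(g0, g1):
    """Standardized mean difference (group 1 minus group 0), pooled SD."""
    n0, n1 = len(g0), len(g1)
    pooled_var = (
        (n0 - 1) * statistics.variance(g0) + (n1 - 1) * statistics.variance(g1)
    ) / (n0 + n1 - 2)
    return (statistics.fmean(g1) - statistics.fmean(g0)) / math.sqrt(pooled_var)


def cl_parametric(g0, g1):
    """Normal-theory common-language effect size: P(score in g1 > score in g0)."""
    diff = statistics.fmean(g1) - statistics.fmean(g0)
    sd_diff = math.sqrt(statistics.variance(g0) + statistics.variance(g1))
    return statistics.NormalDist().cdf(diff / sd_diff)


def a_nonparametric(g0, g1):
    """Share of (g0, g1) pairs in which the g1 score is larger; ties count as half."""
    wins = sum(1.0 if b > a else 0.5 if b == a else 0.0 for a in g0 for b in g1)
    return wins / (len(g0) * len(g1))


# Made-up data for illustration only; this is not the study's example.
SAMPLE_CSV = "group,score\n0,1\n0,2\n0,3\n1,4\n1,5\n1,6\n"

g0, g1 = read_groups(io.StringIO(SAMPLE_CSV))
print(cohens_d(g0, g1), cl_parametric(g0, g1), a_nonparametric(g0, g1))
```

To read a real file instead of the in-memory sample, pass an open file handle, for example open("data.csv", newline=""), to read_groups.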

2. The Monte Carlo Simulation Code Used in the Study

code2

3. The Full Values for the Percentage Biases in Figures 1 and 2.

Table



(3) Robust Regression (R code)

Description

Supplementary materials to: Kim, J., & Li, J. C.-H. (2023). Which robust regression technique is appropriate under violated assumptions? A simulation study. Methodology, 19(4). https://doi.org/10.5964/meth.8285

Abstract

Ordinary least squares (OLS) regression is widely employed for statistical prediction and theoretical explanation in psychology studies. However, OLS regression has a critical drawback: it becomes less accurate in the presence of outliers and non-random error distribution. Several robust regression methods have been proposed as alternatives. However, each robust regression has its own strengths and limitations. Consequently, researchers are often at a loss as to which robust regression method to use for their studies. This study uses a Monte Carlo experiment to compare different types of robust regression methods with OLS regression based on relative efficiency (RE), bias, root mean squared error (RMSE), Type 1 error, power, coverage probability of the 95% confidence intervals (CIs), and the width of the CIs. The results show that, with sufficient samples per predictor (n = 100), the robust regression methods are as efficient as OLS regression. When errors follow non-normal distributions, i.e., mixed-normal, symmetric and heavy-tailed (SH), asymmetric and relatively light-tailed (AL), asymmetric and heavy-tailed (AH), and heteroscedastic, the robust method (GM-estimation) seems to consistently outperform OLS regression. (PsycInfo Database Record (c) 2024 APA, all rights reserved)
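To make the comparison in the abstract concrete, the following toy Monte Carlo sketch in Python contrasts the OLS slope with a Huber M-estimator fitted by iteratively reweighted least squares. Huber M-estimation is swapped in here for brevity and is not the GM-estimator highlighted in the study. The sample size, number of replications, true slope, and the mixed-normal error model (10% of errors drawn with SD 10) are illustrative assumptions, not the study's design.

```python
import math
import random
import statistics


def weighted_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def ols_line(x, y):
    """Ordinary least squares is the equal-weights special case."""
    return weighted_line(x, y, [1.0] * len(x))


def huber_line(x, y, k=1.345, iterations=25):
    """Huber M-estimate via iteratively reweighted least squares."""
    a, b = ols_line(x, y)
    for _ in range(iterations):
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        med = statistics.median(resid)
        scale = statistics.median([abs(r - med) for r in resid]) / 0.6745
        if scale == 0:
            break
        # Residuals beyond k * scale are downweighted; the rest keep weight 1.
        w = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in resid]
        a, b = weighted_line(x, y, w)
    return a, b


def simulate(reps=200, n=50, true_slope=0.5, seed=1):
    """Bias and RMSE of the slope estimates under mixed-normal errors."""
    rng = random.Random(seed)
    estimates = {"ols": [], "huber": []}
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        errors = [rng.gauss(0, 10 if rng.random() < 0.1 else 1) for _ in range(n)]
        y = [true_slope * xi + ei for xi, ei in zip(x, errors)]
        estimates["ols"].append(ols_line(x, y)[1])
        estimates["huber"].append(huber_line(x, y)[1])
    summary = {}
    for name, slopes in estimates.items():
        bias = statistics.fmean(slopes) - true_slope
        rmse = math.sqrt(statistics.fmean([(s - true_slope) ** 2 for s in slopes]))
        summary[name] = {"bias": bias, "rmse": rmse}
    return summary


print(simulate())
```

Under this contaminated error model the robust fit should show a noticeably smaller RMSE than OLS, which is the kind of contrast the study quantifies across many more estimators, error distributions, and criteria.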

https://www.psycharchives.org/en/item/69231d18-ff08-4209-ba7e-f6f84e037edf