Statistical test & epidemiological study designs (R code)

5 minute read

Data overview

#Histogram
ggplot(data, aes(x = values)) +
  geom_histogram(binwidth = 0.5, fill = "blue", color = "black", aes(y = ..density..)) +
  geom_density(alpha = 0.2, fill = "orange") +
  labs(title = "Histogram of Random Data", x = "Values", y = "Density")

#Boxplot and dotplot
RR_plot <- ggplot(data_fil_1_1, aes(x = 1, y = change))
RR_plot <- RR_plot + geom_boxplot(notch = FALSE, width = 0.9)
RR_plot <- RR_plot + geom_dotplot(binaxis = "y", binwidth = 1.0, stackdir = "center", alpha = 0.5) 
RR_plot <- RR_plot + theme_bw() + theme(
  axis.line = element_line(size = 1.0, lineend = "square"),
  text = element_text(colour ="black", size = 14),
  legend.position = "none",
  axis.ticks = element_line(linewidth = 1.5),
  axis.ticks.length = unit(-2, "mm"))
RR_plot

t-test & Wilcoxson

P-value and effect size should be calculated.

#t test 
t.test(data, data, var.equal=TRUE, paired = FALSE)

#Wilcoxson
library(exactRankTests)
wilcoxson(x = data, y = data, paired = T)

#Effect size (https://www.statology.org/cohens-d-in-r/)
library(effsize)
cliff.delta(data, data)
cohens.d(data, data)
herges_g()

Power test

library(MESS)
github

library(MESS)
#Grammer
Power_t_test(n = NULL, delta = NULL, sd = 1, sig.level = 0.05, power = NULL, ratio = 1,
  sd.ratio = 1, type = c("two.sample", "one.sample", "paired"), alternative = c("two.sided", "one.sided"),  df.method = c("welch", "classical"), strict = TRUE)

Question:Plasma-glucose levels are used to determine the presence of diabetes. Suppose the mean ln (plasma-glucose) concentration (mg/dL) in 35- to 44-year-olds is 4.86 with standard deviation = 0.54. A study of 100 sedentary people in this age group is planned to test whether they have a higher or lower level of plasma glucose than the general population.

#If you want know unknown
d <- 0.1/0.54
pwr.t.test(n = 200, d = d, sig.level = 0.05, power = NULL)

# If you want to know sample size
d <- 0.1/0.54
pwr.t.test(n = NULL, d = d, sig.level = 0.05, power = 0.80)

Cohort study

Power for comparing two independent proportions (two sample study, cohort study) Assume that 5% of the non-smokers will develop coronary heart disease during the study. You would like your study to have adequate power to detect a relative risk of 2.0.

#P2 = 0.05%
#P1 = 0.10 % given that relative risk of 2.0
#Given a prevalence of smoking 0.25 (This is assumption from literature review), which represents k = 3 ((1-0.25)/0.25). K is ratio of exposure group to non exposure group.

power_prop_test(n = NULL, p1 =0.1, p2 = 0.05, sig.level = 0.05, power = 0.9, ratio = 3, alternative = “one.sided”) 
#p1 and p2 already known

Case-Control study

Assume that approximately 30% of women of child-bearing age will have an exposure to oral contraceptives. You would like your case-control study to have 90% power to detect an odds ratio of 3.0 at the a = 0.05 level. How many women are needed assuming a 1:1 ratio of cases to controls?

#P2 = 0.3
#Odd = P1*(1-P1)(P2*(1-P2)
#P1 = 0.5625

power_prop_test(n = NULL, p1 =0.5625, p2 = 0.05, sig.level = 0.05, power = 0.9, ratio = 1, alternative = “one.sided”)

Case-Control (Matching) study

Suppose we are interested in comparing the effectiveness of two different antibiotics, A and B, in treating gonorrhea. Each person receiving antibiotic A is matched with an equivalent person (age within 5 years, same gender), to whom antibiotic B is given. These people are asked to return to the clinic within 1 week to see if the gonorrhea has been eliminated. Suppose the results are as follows: (a) for 40 pairs of people, both antibiotics are successful; (b) for 20 pairs of people, antibiotic A is effective whereas antibiotic B is not; (c) for 16 pairs of people, antibiotic B is effective whereas antibiotic A is not; (d) for 3 pairs of people, neither antibiotic is effective. How many matched pairs should be enrolled in a future study if we wish to conduct a two-sided test with = 0.05 and power = 0.80?

P12 <- 20/79 (79 is total pair sample) = 0.2025
P21 <- 16/79 = 0.2532
power.mcnemar_test(n = NULL, paid =0.2025, psi = 0.2532/0.2025, sig.level = 0.05, power = 0.9, ratio = 1, alternative = “two.sided”, method = “normal)

Third variable

For testing the association between a risk factor and disease adjusting for other factors, stratified analysis should be the first step in the analysis plan.
Testing the interaction (MH test of Homogeneity, P < 0.10)
If being no interaction, report average (adjusted) odds ratio

Approach (The similar analysis can be done with logistic regression model) 1.Test whether there is interaction, using woolf’s test (P-value, crude and adjusted odds ratio are shown in R program). 2.If not, check confounding by comparing adjusted and crude odds ratios.

Test whether there is association, using the Mantel-Haenszel test (assume no interaction).

github

# Create matrix
data <- c(rep(c(0, 1, 1), 120), rep(c(0, 1, 0), 80), rep(c(0, 0, 1), 111), rep(c(0, 1, 0), 155),
rep(c(1, 1, 1), 161), rep(c(1, 1, 0), 130), rep(c(0, 0, 1), 117), rep(c(0, 1, 0), 124)
data <- matrix(data, ncol = 3, byrow = TRUE)
data <- data.frame(data)
colnames(data) <- c(“personal”, “passive”, “disease”)
 
library(EpiStats)
library(dplyr)
library(knitr)

#Odds ratio stratified 
res1 <- CC(data[data$personal == 0, ], cases = “disease”, exposure = “passive”, exact = TRUE, full = TRUE)
kable(res1$df1, align = “r”)
kable(res1$df2, align = res1$df2.align)
res2 <- CC(d1[d1$personal == 1, ], cases = “disease”, exposure = “passive”, exact = TRUE, full = TRUE)
kable(res2$df1, align = “r”)
kable(res2$df2, align = res2$df2.align)

# Test for whether the two odds ratios differ (Woolf’s test for interaction)
# CCinter: Case-control interaction
res <- CCInter(data, cases = ”disease”, exposure = “passive”, by = “personal”, full = TRUE)
kable(res$df2)

#Test for association assuming no interaction (Mantel-Haenszel χ2 test)
The test should be used only when if the variance V > 5 (χ2 test)
#行列の作成
data <- array(c(98, 832, 169, 3520, + 54, 227, 55, 686, + 11, 85, 61, 926, + 7, 102, 90, 1936), + dim = c(2, 2, 4), + dimnames = list( + Exposure = c("1", "0"), + Response = c("1", "0"), + Race.Level = c("1", "2", "3", "4"))) 
mantelhaen.test(data, correct = FALSE)
 
res4 <- CC(d2, cases = “bw”, exposure = “smoking”, exact = TRUE, full = TRUE)
kable(res4$df2, align = res4$df2.align)

Test Odds trend (linear trend in odds ratio across levels)

Approach

Test homogeneity of odds (all odds ratio is equivalent)
If rejected, test for trend

github

#χ2 test (used only if variance is >5) Homogeneity 

disease <- c(320, 1206, 1011, 463, 220)
no_disease <- c(1422, 4432, 2893, 1092, 406)
n <- c(1742, 5683, 3904, 1555, 626)
m <- as.table(cbind(no_disease, disease))
damnames(m) <- list(age = c(“1”, “2”, “3”, “4”, “5”)), + disease = c(“N”, “Y”)
chisq.test(m)
 
# test for trend
prop.trend.test(disease, n, score = seq_along(disease))

#Since p < 0.001, we reject the null hypothesis and conclude that there is a linear trend in the odds

Maternal extension test Goal is to determine whether there is a linear trend in the odds ratios across levels of an (ordinal) risk factor after controlling for a confounder.

mantelhaen(table)

Share on

X Facebook LinkedIn Bluesky

Hiroki Ando (安藤宏紀)

Statistical test & epidemiological study designs (R code)

Data overview

t-test & Wilcoxson

Power test

Cohort study

Case-Control study

Case-Control (Matching) study

Third variable

Test Odds trend (linear trend in odds ratio across levels)

Share on

You May Also Enjoy

Bayesian modeling (R code)

Generalized linear model (R code)

Data visualization

Data organization & Basic analysis

Hiroki Ando (安藤 宏紀)

Data overview

t-test & Wilcoxson

Power test

Cohort study

Case-Control study

Case-Control (Matching) study

Third variable

Test Odds trend (linear trend in odds ratio across levels)

Share on

You May Also Enjoy

Bayesian modeling (R code)

Generalized linear model (R code)

Data visualization

Data organization & Basic analysis

Hiroki Ando (安藤宏紀)