Contents

Below we’ll intersperse the Stata code you used for hw1 with explanatory notes in a concise manner suitable for intermediate learners.

clear 
cls

Note: Clears the current dataset from memory and the Stata command window to start with a clean slate.

if c(N) { 
    // Inspired by Polack et al., NEJM 2020
    // Let's reverse engineer and simulate data based on their results
}

Note: The comment introduces the purpose of the simulation, inspired by a study on the efficacy of a COVID-19 vaccine.

if c(os)=="Windows" {
    global workdir "`c(pwd)'\"
}
else {
    global workdir "`c(pwd)'/"
}

Note: Sets a global variable workdir to the current working directory, accommodating both Windows and Unix-based systems by adjusting the path’s slashes.

capture log close
log using ${workdir}simulation.log, replace //this code may need debugging for those with spaces in their filepaths

Note: Closes any open log and starts a new log file to record the session’s output, ensuring all commands and results are saved for review.

set seed 340600
set obs 37706

Note: Initializes the random number generator to ensure reproducibility and sets the number of observations (participants) in the dataset to 37,706, matching the study’s scale.

g bnt=rbinomial(1,.5);

Note: Generates a binary variable bnt to simulate random assignment to the vaccine or placebo group with equal probability.

lab define Bnt 0 "Placebo" 1 "BNT162b2";
label values bnt Bnt;

Note: Labels the bnt variable for clarity: 0 as “Placebo” and 1 as “BNT162b2” (the vaccine).

gen female=rbinomial(1, .494);

Note: Simulates gender distribution among participants, with approximately 49.4% being female.

tempvar dem;
gen `dem'=round(runiform(0,100),.1); 

Note: Creates a temporary variable dem to assist in generating a race distribution among participants based on specified probabilities.

gen age=(rt(_N)*9.25)+52; 
replace age=runiform(16,91) if !inrange(age,16,91); 

Note: Generates participant ages using a transformation of the t-distribution and ensures all ages fall within the 16-91 range by replacing outliers with uniformly distributed ages within the range.

g days=rweibull(.7,17,0) if bnt==0;
g covid=rbinomial(1, 162/21728) if bnt==0; 
replace days=rweibull(.4,.8,0) if bnt==1;
replace covid=rbinomial(1, 14/21772) if bnt=1; 

Note: Simulates the days until potential COVID-19 infection and whether an infection occurred, with different parameters for the vaccine and placebo groups to reflect the vaccine’s efficacy.

stset days, fail(covid) ;

Note: Prepares the dataset for survival analysis, specifying days as the time variable and covid as the failure event indicator.

sts graph, by(bnt);

Note: Generates a Kaplan-Meier survival curve by treatment group to visualize the difference in COVID-19 incidence over time.

stcox bnt;

Note: Fits a Cox proportional hazards model to assess the vaccine’s effect on the hazard of contracting COVID-19.

lab var bnt_id "Participant Identifier";

Note: Labels the variables with descriptive names for clarity in the dataset’s context.

save BNT162b2, replace 

Note: Saves the simulated dataset for future use or analysis.