Lab 5 solution
Part I
1
Let the students know that there are no shortcuts: the more they practice, the better they’ll become.
2
// 1
_request(N)
$N
// 2
forvalues i=1/$N {
di `i' //local macro
}
// 3
_request(data`i')
${data`i'}
// 4
global repo "https://github.com/jhustata/basic/raw/main/"
$repo
// 5
global nhanes "https://wwwn.cdc.gov/Nchs/Nhanes/"
$nhanes
// 6
ds
`r(varlist)'
// 7
foreach v of varlist `r(varlist)' {
`v'
}
// 8
foreach v in `c(ALPHA)' {
di `c(ALPHA)' //c() class macro
creturn list
}
// etc.
3
Simple example under foreach v of varlist { in Lab 5 Part I Section 2.1:
Click on the “token” varlist: in the program it might have been a list of hardcoded variables in a .do file, e.g. init_age bmi.
But a program generalizes this using syntax varlist.
The user may then choose the variable list of their preference.
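The contrast between a hardcoded list and syntax varlist can be sketched as follows. This is only an illustration: it assumes the auto dataset that ships with Stata rather than the lab data, and the program name showmeans is hypothetical.

```stata
// Assumption: auto.dta (shipped with Stata) stands in for the lab dataset
sysuse auto, clear

// Hardcoded version, as it might appear in a .do file
foreach v of varlist price mpg {
    quietly summarize `v'
    display "`v': " r(mean)
}

// Generalized version: the user supplies the variable list
// (showmeans is a hypothetical name used only for this sketch)
capture program drop showmeans
program define showmeans
    syntax varlist
    foreach v of varlist `varlist' {
        quietly summarize `v'
        display "`v': " r(mean)
    }
end

// The same loop now runs on whatever variables the user chooses
showmeans price mpg weight
```

Because syntax varlist checks the names against the data in memory, a call with a variable that does not exist stops with an error before the loop runs.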
4.1
do ${repo}loopxpt.do
Content of loopxpt.do?
cat ${repo}loopxpt.do
4.2
You can copy and paste this into ChatGPT and ask for notes. Encourage students to do this. Please let them use other AIs as well.
qui {
cls
clear
set timeout1 1000
global nhanes "https://wwwn.cdc.gov/Nchs/Nhanes/"
tokenize "`c(ALPHA)'"
local y = 1999
forvalues i=1/3 {
local yp1 = `y' + 1
if `y' == 1999 {
local letter = ""
}
else {
local letter = "_``i''"
}
import sasxport5 "${nhanes}`y'-`yp1'/DEMO`letter'.XPT", clear
//tempfile nh`y'_`yp1'
g surv = "`i'"
save nh`y'_`yp1', replace
noi di "`letter'"
local y = `y' + 2
}
clear
local y = 1999
forvalues i=1/3 {
local yp1 = `y' + 1
append using nh`y'_`yp1'
rm "nh`y'_`yp1'.dta"
local y = `y' + 2
}
save nh, replace
//verify at https://wwwn.cdc.gov/Nchs/Nhanes/1999-2000/DEMO.htm
noi tab surv
noi ls
}
Here’s an example with the query: annotate:
This code appears to be written in Stata, a statistical software package. It’s designed to automate the process of downloading, importing, and merging demographic datasets from multiple biennial waves of the National Health and Nutrition Examination Survey (NHANES) hosted by the CDC. Let me annotate each segment for you:
qui { // Start a quiet block where outputs are suppressed
cls // Clear the screen
clear // Clear any existing data in memory
set timeout1 1000 // Set a timeout for potentially slow operations
global nhanes "https://wwwn.cdc.gov/Nchs/Nhanes/" // Define a global variable for the NHANES website URL
tokenize "`c(ALPHA)'" // This line seems erroneous or out of context
local y = 1999 // Initialize a local variable 'y' to start from the year 1999
forvalues i=1/3 { // Loop over three biennial cycles (1999-2000, 2001-2002, 2003-2004)
local yp1 = `y' + 1 // Calculate the end year of the biennial cycle
if `y' == 1999 {
local letter = "" // No letter suffix for the first cycle
}
else {
local letter = "_``i''" // Suffix for file names in subsequent cycles, seems incorrect should be `_` not `_``i''`
}
import sasxport5 "${nhanes}`y'-`yp1'/DEMO`letter'.XPT", clear // Import the dataset from a .XPT file
//tempfile nh`y'_`yp1' // Define a temporary file, commented out
g surv = "`i'" // Generate a new variable 'surv' to distinguish each survey wave
save nh`y'_`yp1', replace // Save the dataset with a new name
noi di "`letter'" // Display the suffix letter quietly, for debugging
local y = `y' + 2 // Increment the start year by 2 for the next biennial cycle
}
clear // Clear the data in memory
local y = 1999 // Reset the start year for merging process
forvalues i=1/3 { // Loop over the same three cycles for merging
local yp1 = `y' + 1 // Calculate the end year for each cycle
append using nh`y'_`yp1' // Append each dataset to the master dataset
rm "nh`y'_`yp1'.dta" // Remove the intermediate Stata data files, wrong extension should be .dta
local y = `y' + 2 // Increment the start year by 2 for the next cycle
}
save nh, replace // Save the final merged dataset as 'nh'
//verify at https://wwwn.cdc.gov/Nchs/Nhanes/1999-2000/DEMO.htm // Comment for verification
noi tab surv // Tabulate the 'surv' variable quietly to check data consistency
noi ls // List the files in the current directory quietly
}
There are some potential issues and errors in the script that may need addressing, such as file suffix logic and file extensions in the rm command.
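The annotation flags the tokenize line and the ``i'' suffix as possibly erroneous, so it is worth checking what they do. The short sketch below, which uses only built-in Stata features, suggests the two lines work together: tokenize stores the letters in the positional macros `1', `2', `3', and so on, and ``i'' looks up the macro whose name is the current value of i.

```stata
// tokenize splits the string into positional macros `1', `2', `3', ...
// c(ALPHA) holds the uppercase letters separated by spaces
tokenize "`c(ALPHA)'"

forvalues i = 1/3 {
    // The inner `i' is expanded first (e.g. to 2), then `2' is expanded (to B)
    display "i = `i'   letter = ``i''"
}
```

With i equal to 2 and 3 this yields B and C, which match the DEMO_B.XPT and DEMO_C.XPT file names for the 2001-2002 and 2003-2004 cycles.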
5
I copied the above script into the template below & asked GPT-4 to “help indent”. I then made a few edits to empower the user to select the NHANES surveys of interest to them.
capture program drop nhanes_demo
program define nhanes_demo
//copy & paste here
end
GPT-4: Here’s a cleaner and more properly indented version of your Stata code for defining the nhanes_demo program. I’ve made sure to align the blocks and structure it to enhance readability:
capture program drop nhanes_demo
program define nhanes_demo
    syntax, begin(int) end(int)
    qui {
        cls
        clear
        set timeout1 1000
        global nhanes "https://wwwn.cdc.gov/Nchs/Nhanes/"
        tokenize "`c(ALPHA)'"
        local y = `begin' //edit, this macro is from the syntax command
        local N = `end' - `begin' //insertion
        forvalues i=1/`N' { //edit
            local yp1 = `y' + 1
            if `y' == 1999 {
                local letter = ""
            }
            else {
                local letter = "_``i''"
            }
            import sasxport5 "${nhanes}`y'-`yp1'/DEMO`letter'.XPT", clear
            //tempfile nh`y'_`yp1'
            gen surv = "`i'"
            save nh`y'_`yp1', replace
            noi di "`letter'"
            local y = `y' + 2
        }
        local y = `begin'
        clear
        forvalues i=1/`N' {
            local yp1 = `y' + 1
            append using nh`y'_`yp1'
            rm "nh`y'_`yp1'.dta"
            local y = `y' + 2
        }
        save nh, replace
        //verify at https://wwwn.cdc.gov/Nchs/Nhanes/1999-2000/DEMO.htm
        noi tab surv
        noi ls
    }
end
This revision aligns the commands within loops and conditions, which makes it easier to follow the logic. The indentation helps to visually separate commands from control structures like loops and conditional statements.
Part II
In the first half of the course, we discussed how you can define your own “program”. It’s an awesome tool that allows us to automate a specific task. If you think a specific part of your code will be used multiple times, you might as well put that into a program. In this lab, we will practice customizing our programs.
Start Stata, open your do-file editor, and lay out a template for your basic .do file structure using qui {, if 0 {, and if 1. Load transplants.dta in your if 2 block or wherever you feel it fits best.
Write a program called mymean. This program will take varlist as a user input, calculate the mean value of each variable, and display the values.
capture program drop mymean
program define mymean
    syntax varlist
    foreach var in `varlist' {
        quietly sum `var'
        display r(mean)
    }
end
Modify your program mymean so that when an if argument is supplied, mymean would only include the observations that meet the condition specified by the if argument. In other words, if the user types mymean height if age>65, the program mymean will calculate the mean only among patients older than 65.
capture program drop mymean
program define mymean
    syntax varlist [if]
    foreach var in `varlist' {
        quietly sum `var' `if'
        display r(mean)
    }
end
Further modify your program mymean to include the option sd. When the option sd is supplied, mymean will display the standard deviation along with the mean. This version of mymean should still be able to accommodate the if argument.
capture program drop mymean
program define mymean
    syntax varlist [if], [sd]
    foreach var in `varlist' {
        quietly sum `var' `if'
        display r(mean)
        if "`sd'" != "" {
            display r(sd)
        }
    }
end
// The answer above is in the simplest possible form for clarity. In practice, I will arrange the outputs a little bit better.
capture program drop mymean
program define mymean
    syntax varlist [if], [sd]
    foreach var in `varlist' {
        quietly sum `var' `if'
        if "`sd'" != "" {
            display "`var': " r(mean) " (" r(sd) ")"
        }
        else {
            display "`var': " r(mean)
        }
    }
end
Further modify your program mymean to include the option digits(), with a number in the parentheses. When the option digits() is supplied, mymean will round the mean (and the standard deviation, if applicable) to the nearest multiple of digits(). If digits() is NOT supplied, round in units of 0.001. (Hint: use the Stata function round())
capture program drop mymean
program define mymean
    syntax varlist [if], [sd] [digits(real 0.001)]
    foreach var in `varlist' {
        quietly sum `var' `if'
        display round(r(mean), `digits')
        if "`sd'" != "" {
            display round(r(sd), `digits')
        }
    }
end
Did you make if, sd, and digits() optional arguments? That is, your program should run whether or not these arguments are supplied. To do so, simply surround each argument with brackets. For example, [sd]
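One way to check that every argument is truly optional is to call the program with and without each of them. The sketch below assumes the auto dataset that ships with Stata instead of transplants.dta, and it writes the syntax line with all options grouped inside a single optional bracket, which is the form the Stata manual uses; treat it as an illustration rather than the official solution.

```stata
// Assumption: auto.dta (shipped with Stata) stands in for transplants.dta
sysuse auto, clear

capture program drop mymean
program define mymean
    // All options sit inside one optional bracket, comma included
    syntax varlist [if] [, sd digits(real 0.001)]
    foreach var of varlist `varlist' {
        quietly summarize `var' `if'
        if "`sd'" != "" {
            display "`var': " round(r(mean), `digits') " (" round(r(sd), `digits') ")"
        }
        else {
            display "`var': " round(r(mean), `digits')
        }
    }
end

// Each call should run without error
mymean price mpg
mymean price mpg if foreign == 1
mymean price mpg, sd
mymean price mpg if foreign == 1, sd digits(0.1)
```

If any of these calls stops with a syntax error, the corresponding argument is still being treated as required.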
I’d like to draw your attention to the merge command. It’s hard to write a question around merge, but it’s a really important command in practice. For instance, we used it in week 4. Contrast merge with append, something we may not get a chance to demonstrate.
merge 1:1 fake_id using donors_recipients
This is the code from the lecture. We are merging transplants.dta with donors_recipients.dta. We are merging observations with the same fake_id, and expect that there will be only one observation per fake_id in both datasets.
We want to study if death (died==1) is associated with several predictor variables: bmi, prev_ki, age, peak_pra, or gender. Run logistic regression between died and each of the predictor variables using a foreach loop. At each run, save the name and the regression coefficient of the predictor variable into an external Stata dataset file named output.dta.
postfile output str30 name coef using output // you may add ", replace" to allow overwriting output.dta
foreach var in bmi prev_ki age peak_pra gender {
    quietly logistic died `var'
    post output ("`var'") (_b[`var'])
}
postclose output
OR
postfile output str30 name coef using output // you may add ", replace" to allow overwriting output.dta
foreach var in bmi prev_ki age peak_pra gender {
    quietly logistic died `var'
    quietly lincom `var'
    post output ("`var'") (r(estimate)) // coef is numeric, so post the number, not a quoted string
}
postclose output
You have all your commands in your do file, right? Run your do file from the beginning and make sure your do file does exactly the same thing.
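As a footnote to the merge discussion above, the difference between merge and append can be sketched with two tiny datasets. The values and the variable don_age are invented purely for illustration; only fake_id comes from the lab.

```stata
// Toy recipient data (invented values)
clear
input fake_id age
1 40
2 55
3 62
end
tempfile recipients
save `recipients'

// Toy donor data (don_age is a hypothetical variable)
clear
input fake_id don_age
1 30
2 45
3 50
end
tempfile donors
save `donors'

// merge adds variables (columns), matching observations on the key
use `recipients', clear
merge 1:1 fake_id using `donors'
display "after merge:  " _N " observations"

// append adds observations (rows), stacking one dataset under the other
use `recipients', clear
append using `donors'
display "after append: " _N " observations"
```

After the merge there are still three observations, each with both age and don_age. After the append there are six, with don_age missing in the rows that came from the recipient file and age missing in the rows that came from the donor file.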