Lab 7 Solutions
Please use this lab as an opportunity to review the course material and prepare yourself for hw7. Sample responses to the lab questions are provided below.
global repo https://github.com/jhustata/basic/raw/main/
Start Stata, open your do-file editor, write the header, and load transplants.dta.
use "${repo}transplants", clear
Get a 10% random sample of the dataset. Specifically, follow these steps. (1) Set a seed number. (2) Generate a variable that holds a random number between 0 and 1 drawn from a uniform distribution. (3) Sort by the random variable. (4) Keep the first 10% of the observations and drop the rest. (5) Drop the random variable.
count
set seed 2024
gen rdm=runiform()
sort rdm
keep if _n<=_N/10
drop rdm
count
//alternative
use "${repo}transplants", clear
count
sample 10
count
Clear and reload transplants.dta.
use "${repo}transplants", clear
Generate a variable called fake_age which is a normally distributed random variable with mean and standard deviation equal to the mean and standard deviation of the actual age variable.
set seed 123
sum age
g fake_age=rnormal(r(mean), r(sd))
compare age fake_age
kdensity age, addplot(kdensity fake_age)
list fake_id age fake_age in 1/10
graph export kdensity.png, replace
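The `r(mean)` and `r(sd)` passed to `rnormal()` are results that `summarize` leaves in memory until the next r-class command overwrites them. The following is a minimal sketch for inspecting them, assuming transplants.dta is loaded and `fake_age` has already been generated; the local names `m` and `s` are illustrative only.

```stata
quietly sum age
return list          // lists every r() result, including r(mean) and r(sd)
local m = r(mean)    // copy into locals before another command overwrites r()
local s = r(sd)
sum fake_age         // mean and SD should be close to, not identical to, those of age
di "age: mean " %4.1f `m' ", sd " %4.1f `s'
```

Because `fake_age` is a random draw, its summary statistics will differ slightly from those of `age`, and they will change if the seed changes.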
Make a scatter plot of peak PRA by age in transplant recipients. Does it look like there’s a relationship between peak PRA and age, and if so, what is the relationship?
use "${repo}transplants", clear
graph twoway scatter peak_pra age //full syntax
tw sc peak_pra age //abbreviated syntax
//explore other twoway options!!
#delimit ;
forval f=0/1 { ;
sum peak_pra if gender==`f', d ;
local m_iqr_`f': di
"Median" %2.0f r(p50)
" (IQR," %2.0f r(p25)
"-" %2.0f r(p75)
")"
;
} ;
tw (sc peak_pra age if gender==0)
(sc peak_pra age if gender==1,
legend(
on
ring(0)
pos(11)
lab(1 "Male")
lab(2 "Female")
)
ti("Most Recent Serum PRA",pos(11))
yti("%", orientation(horizontal))
text(50 10 "`m_iqr_0'",col(midblue))
text(45 10 "`m_iqr_1'",col(cranberry))
)
;
#delimit cr
graph export lab6q5.png, replace
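A scatter plot alone can make a weak relationship hard to judge. One way to check, sketched here under the assumption that the same dataset and variable names are in use, is to overlay a lowess smoother and compute a pairwise correlation. This is only an aid to reading the plot, not the expected answer.

```stata
use "${repo}transplants", clear
pwcorr peak_pra age, sig              // correlation coefficient with its p-value
tw (sc peak_pra age, msize(vsmall)) ///
   (lowess peak_pra age)              // smoothed trend drawn over the scatter
```

A nearly flat smoother and a correlation near zero would suggest little or no relationship; a clear slope would suggest otherwise.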
The graph of proportion of ECD transplants by age from the lecture was a little messy. Remake the graph with the age rounded to the nearest ten years.
use "${repo}transplants", clear
collapse (mean) don_ecd, by(age)
graph twoway line don_ecd age, text(.5 40 "obs: `c(N)', vars: `c(k)'")
graph export collapsebyage.png, replace
count
//alternative, without messing up the data
if c(N) == r(N) | c(N) == 6000 {
use "${repo}transplants", clear
egen m_don_ecd=mean(don_ecd), by(age)
egen agetag=tag(age)
#delimit ;
line m_don_ecd age if agetag,
text(
.5 40
"obs: `c(N)', vars: `c(k)'"
)
sort ;
#delimit cr
count
graph export lab6q6.png,replace
}
use "${repo}transplants", clear
gen age10 = round(age, 10)
//one way to restore data after messing it up
preserve
collapse (mean) don_ecd, by(age10)
graph twoway line don_ecd age10
graph export collapsebyage10.png, replace
restore
count
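The rounding step relies on the second argument of `round()`, which sets the unit to round to. A short sketch of how it behaves, with a tabulation to confirm the resulting age groups (assuming the same dataset as above):

```stata
di round(34, 10)    // 30
di round(47, 10)    // 50
di round(47)        // 47: with one argument, round() rounds to the nearest integer

use "${repo}transplants", clear
gen age10 = round(age, 10)
tab age10           // one row per ten-year group, so the collapsed line has far fewer points
```

Collapsing by `age10` instead of `age` averages `don_ecd` over many more observations per point, which is why the resulting line is smoother.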
You have all your commands in your do-file, right? Run your do-file from the beginning and make sure it reproduces exactly the same results.