HW6 (Supplement)
Write a .do file that performs the tasks described below. Your .do file must be called hw6sup.lastname.firstname.do. Make sure your script will run on our machines by handling filepath ambiguity in your script. Do not submit your log files as part of the assignment.
Use the datasets pra_hist.dta and student_pressure.dta to perform the required tasks.
global repo https://github.com/jhustata/basic/raw/main/
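One possible way to handle filepath ambiguity is sketched below. It assumes the grader's machine has internet access and simply keeps the data location in the single global macro defined above, so no other line of the script contains a hard-coded path.

```stata
* Keep the data location in one global macro so no path is hard-coded elsewhere
global repo https://github.com/jhustata/basic/raw/main/

* capture suppresses the error raised when no log is currently open
capture log close

* Quoting the path guards against spaces if repo is later pointed at a local folder
use "${repo}pra_hist.dta", clear
```

If the data were stored locally instead, only the global line would need to change.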
Questions
Context: Two students discuss HW6 and come up with the following solution to Q1:
use ${repo}pra_hist.dta, clear
l in 1/10
g count=1
bys visit_id: egen Count=sum(count)
l in 1/10
egen tagvisit = tag(visit)
l visit_id Count if tag==1
Focusing on the output rather than the code, you compare this solution with the one posted by the teaching team and find slight differences in output. You then analyze the code and find that the teaching team used the “collapse” command whereas the students used the “egen” command with its “tag” function. How would you help these students remain true to their unique approach while producing accurate output? In other words, modify their code.
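One direction the students could take is sketched here on toy data rather than on pra_hist.dta. The variable name pra for the PRA value is an assumption, and the six rows are invented purely for illustration. The idea is to flag only rows with a non-missing PRA value before summing, while keeping the egen and tag approach.

```stata
* Toy data standing in for pra_hist.dta; the name pra is an assumption
clear
input hosp_id px_id visit_id pra
1 1 1 20
1 1 2 .
1 2 1 35
1 2 2 50
2 1 1 40
2 1 2 .
end

* Flag rows with a non-missing PRA value instead of flagging every row
generate byte count = !missing(pra)
bysort visit_id: egen Count = total(count)

* Tag one row per visit and list only the tagged rows
egen tagvisit = tag(visit_id)
list visit_id Count if tagvisit == 1, noobs clean
```

In this toy example visit 1 has three non-missing values and visit 2 has one, whereas counting every row would report three for both visits.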
The original question is included below for your convenience:
You are conducting a study that examines the regional variation in the distribution of panel-reactive antibody (PRA). You recruited 73 patients (px_id = 1, …, 73) from 10 hospitals (hosp_id = 1, …, 10) in 3 regions (region = A, B, C, …), and measured PRA 3 times: visit 1, visit 2, and visit 3. You hear that the organization that funds your research plans to extend the funding for several more visits (visit 4, visit 5, …, visit N). Since you do not know how many more visits there will be, you decide to write a .do file that can work regardless of how many visits the dataset has.
Codebook
Variable | Description | Values/Range |
---|---|---|
pra_hist.dta | | |
hosp_id | Hospital ID | Integers: 1 – 10 |
px_id | Patient ID | Integers: 1 – 94 |
visit_id | Visit ID | Integers: 1 – N |
 | PRA value at the visit | Integers: 0 – 100 |
student_pressure.dta | | |
 | Hospital ID | Integers: 1 – 100 |
 | Region | Integers: 1 - 8 |
 | Region | Integers: “within biological limits” |
 | Region | Dates: 28mar2024 - 16may2024 |
Note: to uniquely identify a patient you’d have to specify both hospital ID and patient ID. In other words, patients in different hospitals may have the same Patient ID.
i) Load pra_hist.dta. Print a table as shown below, which displays the number of patients with a non-missing PRA value at each visit. N and XX should be replaced with the correct values from the dataset. (Hint: how do you write a forvalues loop for all values of visit_id?)
use "${repo}pra_hist", clear
Question 1.i)
Visit Count
1 XX
2 XX
⋮
[omitted, but your .do file should display all visits]
⋮
N XX
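The hint about looping over every value of visit_id could be approached as in the minimal sketch below. It runs on invented toy data, assumes the PRA variable is named pra, and assumes visit IDs are consecutive integers starting at 1, so that the largest visit_id can serve as the upper bound of the loop.

```stata
* Toy data; the name pra and the six rows are assumptions for illustration
clear
input visit_id pra
1 20
1 35
1 .
2 50
2 .
3 10
end

* Find the last visit so the loop adapts to however many visits exist
quietly summarize visit_id
local N = r(max)

display "Question 1.i)"
display "Visit" _col(10) "Count"
forvalues v = 1/`N' {
    * Count rows for this visit that have a non-missing PRA value
    quietly count if visit_id == `v' & !missing(pra)
    display `v' _col(10) r(N)
}
```

Because the upper bound comes from the data, the same loop would keep working if visits 4, 5, and so on were appended later.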
ii) An eccentric instructor measured the blood pressure of students over an eight-week period. Blood pressure was measured before each of eight weekly sessions, and a record for each student was kept. After analyzing these records, can you print out the following statement, substituting the XXXs with appropriate macros?
“student_id XXX has the highest blood pressure on record, SBP=XXX for session X”
use ${repo}student_pressure, clear
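A minimal sketch of how the macros for this statement might be filled is given below. It uses invented toy data, and the variable names session and sbp are assumptions, since the codebook above does not list the variable names for student_pressure.dta. Only student_id is taken from the statement itself.

```stata
* Toy data; session and sbp are assumed names, and the rows are invented
clear
input student_id session sbp
1 1 118
1 2 124
2 1 131
2 2 127
3 1 .
3 2 120
end

* r(max) ignores missing values, so the missing reading cannot win
quietly summarize sbp
local maxsbp = r(max)

* Recover who had that reading and in which session
quietly levelsof student_id if sbp == `maxsbp', local(id)
quietly levelsof session if sbp == `maxsbp', local(sess)

display "student_id `id' has the highest blood pressure on record, SBP=`maxsbp' for session `sess'"
```

If two records tie for the maximum, levelsof would return more than one value in each macro, so the real data may need an extra rule for breaking ties.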