Lab 6

Part I

  1. Start Stata, open your do-file editor, write the header, and load transplants.dta.

    global repo https://github.com/jhustata/basic/raw/main/
    use "${repo}transplants", clear

  1. ctr_id indicates the ID of the transplant center where the patient received the transplant. Count the number of recipients at each center, and store in a new variable volume.
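
One possible approach, sketched under the assumption that transplants.dta has one row per recipient: within each ctr_id group, the system variable _N is the group size, so it can be copied straight into volume.

```stata
* Sketch only: assumes one observation per transplant recipient
global repo https://github.com/jhustata/basic/raw/main/
use "${repo}transplants", clear

* Inside a by-group, _N is the number of observations in that group
bysort ctr_id: generate volume = _N
```

An egen call with count() and the by() option would be another way to get the same variable.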

  2. List ctr_id and volume to see how many patients each center has. Maybe let’s try this:

    list ctr_id volume

  3. This is not what we wanted: the list prints one row per patient, so every center is repeated. Generate a variable ctr_tag that “tags” one observation per center. See Week 3, Section 3.3 for more examples of the egen function tag().

  4. Now list ctr_id and volume, but just for one record per center.
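
The tagging and listing steps above could look roughly like this. It is a sketch, not the official solution; volume is rebuilt here with _N only so the block runs on its own.

```stata
* Sketch only: rebuild volume so this block is self-contained
global repo https://github.com/jhustata/basic/raw/main/
use "${repo}transplants", clear
bysort ctr_id: generate volume = _N

* tag() is 1 for exactly one observation per ctr_id and 0 for the rest
egen ctr_tag = tag(ctr_id)

* Keep only the tagged rows in the listing: one line per center
list ctr_id volume if ctr_tag, noobs
```

Because ctr_tag is 0 or 1, `if ctr_tag` is enough; there is no need to write `if ctr_tag == 1`.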

  5. Calculate the mean age of the patients at each center, and store in a new variable mean_age.
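
A minimal sketch for the center-level mean, assuming age is numeric. The egen function mean() ignores missing ages when it averages within each center.

```stata
* Sketch only: center-level mean of patient age
global repo https://github.com/jhustata/basic/raw/main/
use "${repo}transplants", clear

* Every patient in a center receives that center's mean age
egen mean_age = mean(age), by(ctr_id)
```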

  6. For each primary diagnosis subgroup (use variable dx), run a regression with age as the predictor and peak PRA (peak_pra) as the outcome.
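
One way to run the regression separately for each diagnosis group is to loop over the levels of dx. This sketch assumes dx is stored as a numeric variable (a string dx would need quotes around the macro reference); the local macro names dx_levels and d are illustrative choices, not part of the lab.

```stata
* Sketch only: assumes dx is numeric
global repo https://github.com/jhustata/basic/raw/main/
use "${repo}transplants", clear

* Collect the distinct values of dx, then fit one model per value
levelsof dx, local(dx_levels)
foreach d of local dx_levels {
    display as text "Diagnosis group dx == `d'"
    * outcome first, predictor second
    regress peak_pra age if dx == `d'
}
```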

  7. Now let’s make the output cleaner. Count the number of cases within each diagnosis group. If there are more than 500 cases, run the regression and display the output. If not, display “There are fewer than 500 cases.”
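
The count-then-decide logic could be sketched as follows, again assuming dx is numeric. Note that count here includes patients with missing age or peak_pra, so the regression sample can be smaller than the count; whether that matters for the exercise is a judgment call.

```stata
* Sketch only: assumes dx is numeric
global repo https://github.com/jhustata/basic/raw/main/
use "${repo}transplants", clear

levelsof dx, local(dx_levels)
foreach d of local dx_levels {
    * r(N) holds the number of cases in this diagnosis group
    quietly count if dx == `d'
    if r(N) > 500 {
        regress peak_pra age if dx == `d'
    }
    else {
        display "There are fewer than 500 cases."
    }
}
```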

  8. Define a program called reg_pra. This program will perform the same tasks as described in the previous question, but the regression will take one or more variables specified by the user as the predictors.
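
A rough sketch of such a program: it wraps the threshold logic from the preceding question and lets syntax parse the user's predictors into the local macro varlist. Restricting the predictors to numeric variables is an assumption made here, not something the lab requires.

```stata
* Sketch only: assumes dx is numeric and predictors are numeric variables
capture program drop reg_pra
program define reg_pra
    * One or more predictors typed after the command name
    syntax varlist(numeric)
    quietly levelsof dx, local(dx_levels)
    foreach d of local dx_levels {
        quietly count if dx == `d'
        if r(N) > 500 {
            regress peak_pra `varlist' if dx == `d'
        }
        else {
            display "There are fewer than 500 cases."
        }
    }
end

* Example call with a single predictor
global repo https://github.com/jhustata/basic/raw/main/
use "${repo}transplants", clear
reg_pra age
```

Additional predictors would simply be listed after age on the same line. The `capture program drop` line lets the do-file be rerun without a "program already defined" error.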

  9. You have all your commands in your do-file, right? Run the do-file from the beginning and make sure it reproduces exactly the same results.

Part II

Hopefully you spent some time mastering the notes from week 4. But because you didn’t have the incentive in the form of homework, here’s an additional opportunity for you.

  1. Start Stata, open your do-file editor, and load transplants.dta.

    global repo https://github.com/jhustata/basic/raw/main/
    use "${repo}transplants", clear

  1. Let’s merge this dataset with the donor dataset. First, merge with donors_recipients.dta, and then with donors.dta, without specifying any options.

    merge 1:1 fake_id using ${repo}donors_recipients
    
  2. What does Stata say? Interpret the output.

    . merge 1:1 fake_id using ${repo}donors_recipients
    
        Result                           # of obs.
        -----------------------------------------
        not matched                         4,000
            from master                         0  (_merge==1)
            from using                      4,000  (_merge==2)
    
        matched                             6,000  (_merge==3)
        -----------------------------------------
    

    Reading from the bottom: 6,000 observations were successfully matched. 4,000 observations from the using dataset (donors_recipients.dta) were NOT matched with the master dataset (transplants.dta) but were brought in anyway. 0 observations from the master dataset were unmatched (i.e., every observation from the master dataset found a match).

  3. transplants.dta is our study population. We don’t want to bring in extra observations by merging. Use the keep() option to make sure we don’t bring in extra observations from donors_recipients.dta.

    use ${repo}transplants, clear
    merge 1:1 fake_id using ${repo}donors_recipients, keep(master match) nogen
    
  4. Let’s move forward and merge with donors.dta.

    merge m:1 fake_don_id using ${repo}donors, keep(master match)
    
  5. Now we want to calculate the mean age and the number of patients at each center. Preserve the dataset and collapse it by ctr_id. Explore the collapsed dataset using list.

    preserve
    collapse (mean) age (count) n=fake_id, by(ctr_id)
    list
    
  6. Restore the dataset. The plan has changed. We want to calculate these statistics in ECD cases and non-ECD cases separately (use the variable don_ecd). Calculate the mean age and the number of ECD patients and non-ECD patients at each center.

    restore
    collapse (mean) age (count) n=fake_id, by(ctr_id don_ecd)
    
  7. After the collapse, each center has two observations: one for ECD cases and another for non-ECD cases. Reshape the dataset into a wide format (i.e., each center has only one observation).

    reshape wide age n, i(ctr_id) j(don_ecd)
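
To see what the wide layout looks like without loading the real data, here is a toy illustration. The four rows below are invented purely for demonstration and are not taken from the transplant data; only the variable names match the collapsed dataset.

```stata
* Toy data only: made-up numbers standing in for the collapsed dataset
clear
input ctr_id don_ecd age n
1 0 48.2 120
1 1 55.1  30
2 0 50.4  80
2 1 57.9  25
end

* Each value of don_ecd becomes a suffix: age0 n0 age1 n1
reshape wide age n, i(ctr_id) j(don_ecd)
list, noobs
```

After the reshape there is one row per center, with age0 and n0 describing non-ECD cases and age1 and n1 describing ECD cases. If don_ecd is missing for some rows in the real data, reshape wide may refuse to run, so it is worth checking for missing values first.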
    
  8. You have all your commands in your do-file, right? Run the do-file from the beginning and make sure it reproduces exactly the same results.