5. Graphs#
Visualizing data
capture log close _all
log using creturn_list.log, replace
creturn list
log close
When you search the above output for scheme
you’ll find the follow:
Graphics settings
----------------------------------------------------------------------------------------------------
c(graphics) = "on" (set graphics)
c(scheme) = "stcolor" (set scheme)
c(printcolor) = "asis" (set printcolor)
c(copycolor) = "asis" (set copycolor)
c(maxbezierpath) = 0 (set maxbezierpath)
c(min_graphsize) = 1 (region_options)
c(max_graphsize) = 100 (region_options)
----------------------------------------------------------------------------------------------------
My machine has c(version) == 18
and the default is stcolor
. The world-renowned, sui generis, Stata s2color
has finally been abandoned
But if I wish to reproduce that classic Stata graphical output I might change my color scheme:
webuse lifeexp, clear
hist lexp
webuse lifeexp, clear
hist lexp
graph export lexp_stcolor.png, replace
That’s my default. So lets see what it looks like on virtually all versions of Stata before 18:
. webuse lifeexp, clear
(Life expectancy, 1998)
. di c(scheme)
stcolor
. hist lexp, scheme(s2color)
(bin=8, start=54, width=3.125)
. di c(scheme)
stcolor
. graph export lexp_s2color.png, replace
file /Users/d/Dropbox (Personal)/1f.ἡἔρις,κ/1.ontology/summer/lexp_s2color.png saved as PNG format
.
I think the above output clarifies what the c
in c(scheme)
refers to: a constant default, even when I change the output parameter.
So lets reset my default:
set scheme s2color
webuse lifeexp
di c(scheme)
hist lexp
graph export lexp_setscheme.png, replace
. set scheme s2color
. webuse lifeexp
(Life expectancy, 1998)
. di c(scheme)
s2color
. hist lexp
(bin=8, start=54, width=3.125)
. graph export lexp_setscheme.png, replace
file /Users/d/Dropbox (Personal)/1f.ἡἔρις,κ/1.ontology/summer/lexp_setscheme.png saved as PNG
format
.
How may one restore the default c(scheme)
?
So lets see if that works:
set scheme default
di c(N)
. set scheme default
scheme default not found
r(111);
That did not work. And after a Google search (chatGPT hasn’t yet been trained on data from a post Stata18 world!), I found nothing.
So I’ll just invoke my prior knowledge:
set scheme stcolor
di c(scheme)
. set scheme stcolor
. di c(scheme)
stcolor
.
5.1 histogram#
Univariable: distribution
use transplants, clear
hist bmi
graph export bmi.png, replace
. use transplants, clear
. hist bmi
(bin=32, start=17, width=.71875)
. graph export bmi.png, replace
file /Users/d/Dropbox (Personal)/1f.ἡἔρις,κ/1.ontology/summer/bmi.png saved as PNG format
.
32 bars
First bar is BMI 17-17.71875
Each one represents 0.71875 BMI units
5.1.1 density#
hist bmi, width(2)
graph export bmi2.png, replace
. hist bmi, width(2)
(bin=12, start=17, width=2)
. graph export bmi2.png, replace
file /Users/d/Dropbox (Personal)/1f.ἡἔρις,κ/1.ontology/summer/bmi2.png saved as PNG format
.
12 bars
First bar is BMI 17-19
Each on repesents 2 BMI units
hist bmi, bin(500) start(0)
graph export bmi_bin500.png, replace
. hist bmi, bin(500) start(0)
(bin=500, start=0, width=.08)
. graph export bmi_bin500.png, replace
file /Users/d/Dropbox (Personal)/1f.ἡἔρις,κ/1.ontology/summer/bmi_bin500.png saved as PNG format
.
hist bmi, width(2) start(0)
graph export bmi3.png, replace
. hist bmi, width(2) start(0)
(bin=20, start=0, width=2)
. graph export bmi3.png, replace
file /Users/d/Dropbox (Personal)/1f.ἡἔρις,κ/1.ontology/summer/bmi3.png saved as PNG format
.
20 bars
First bar is BMI 0-2
Each one represents 2 units
use transplants, clear
hist bmi, bin(10)
graph export bmi_bin10.png, replace
. use transplants, clear
. hist bmi, bin(10)
(bin=10, start=17, width=2.3)
. graph export bmi_bin10.png, replace
file /Users/d/Dropbox (Personal)/1f.ἡἔρις,κ/1.ontology/summer/bmi_bin10.png saved as PNG format
.
end of do-file
.
10 bars
Four flavors of the histogram
command:
density (default)
fraction
percent
frequency
use transplants, clear
hist age, addplot(kdensity age)
graph export hist_kdensity.png, replace
. use transplants, clear
. hist age, addplot(kdensity age)
(bin=33, start=0, width=2.5757576)
. graph export hist_kdensity.png, replace
file /Users/d/Dropbox (Personal)/1f.ἡἔρις,κ/1.ontology/summer/hist_kdensity.png saved as PNG format
.
5.1.2 fraction#
hist rec_wgt_kkg, fraction
. hist rec_wgt_kg, fraction
(bin=32, start=9.67, width=4.5596875)
. graph export weight.png, replace
file /Users/d/Dropbox (Personal)/1f.ἡἔρις,κ/1.ontology/summer/weight.png saved as PNG format
.
5.1.3 percent#
hist rec_wgt_kg, percent
graph export weight2.png, replace
. hist rec_wgt_kg, percent
(bin=32, start=9.67, width=4.5596875)
. graph export weight2.png, replace
file /Users/d/Dropbox (Personal)/1f.ἡἔρις,κ/1.ontology/summer/weight2.png saved as PNG format
.
5.1.4 frequency#
hist rec_wgt_kg, freq
graph export weight3.png, replace
. hist rec_wgt_kg, freq
(bin=32, start=9.67, width=4.5596875)
. graph export weight3.png, replace
file /Users/d/Dropbox (Personal)/1f.ἡἔρις,κ/1.ontology/summer/weight3.png saved as PNG format
.
end
5.1.5 discrete#
hist dx
graph export discrete.png, replace
. hist dx
(bin=33, start=1, width=.24242424)
. graph export discrete.png, replace
file /Users/d/Dropbox (Personal)/1f.ἡἔρις,κ/1.ontology/summer/discrete.png saved as PNG format
0.242424 diagnoses?
meaningless
option to adapt output to discrete variable
hist dx, disc
graph export discrete2.png, replace
. hist dx, disc
(start=1, width=1)
. graph export discrete2.png, replace
file /Users/d/Dropbox (Personal)/1f.ἡἔρις,κ/1.ontology/summer/discrete2.png saved as PNG format
.
end
5.1.6 addplot#
hist height if gender==0, addplot(hist height if gender ==1)
graph export addplot.png, replace
hist rec_hgt_cm if gender==0, fcolor(midblue%50) ///
addplot(hist rec_hgt_cm if gender ==1, fcolor(orange%40))
graph export addplot2.png, replace
hist rec_hgt_cm if gender==0, ///
fcolor(midblue%50) ///
legend( ///
lab(1 "Male") ///
lab(2 "Female")) ///
addplot(hist rec_hgt_cm if gender ==1, fcolor(orange%40))
graph export addplot3.png, replace
5.1.7 scheme#
hist rec_hgt_cm, scheme(s2color)
graph export scheme.png, replace
5.1.8 normal#
hist rec_hgt_cm, normal
graph export overlay.png, replace
5.2 twoway#
Bivariable: correlation
5.2.1 scatter#
use donors, clear
graph twoway scatter don_wgt don_hgt
graph export twowway.png, replace
. use donors, clear
. graph twoway scatter don_wgt don_hgt
. graph export twowway.png, replace
file /Users/d/Dropbox (Personal)/1f.ἡἔρις,κ/1.ontology/summer/twowway.png saved as PNG format
.
The twoway
plot may be a simple descriptive visualization of data. But what is implied during this exploratory phase of analysis is the following regression:
\(Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_N X_N \pm \varepsilon_i \)
So if you don’t have any predictor, then you’re left with just \(Y = \beta_0 \pm \varepsilon\), which includes all one-way plots such as histogram, boxplot, or a twoway plot of that one variable against a “meaningless” x-axis:
use transplants, clear
g x=1
qui sum age
return list
g b0=r(mean)
g ub=r(mean)+r(sd)*1.96
g lb=r(mean)-r(sd)*1.96
twoway (scatter age x, ///
jitter(5) ///
xscale(off) ///
mcolor(lime%5) ///
) ///
(scatter b0 x, ///
msize(2) ///
mcolor(midblue%80) ///
) ///
(rcap ub lb x, ///
legend(off) ///
lc(orange%80) ///
yti("Age at Transplant, y", orientation(horizontal)) ///
note("Mean & 95%CI", size(3)) ///
)
graph export age_m_95ci.png, replace
5.2.2 line#
“ecd” = “extended criteria donor” = donor age \(\gt 60\), or donor age \(50-59\) with certain comorbidities
use transplants, clear
bys age: egen mean_ecd = mean(don_ecd)
egen age_tag = tag(age)
graph twoway line mean_ecd age if age_tag==1
graph export twoway_egen.png, replace
How has the number of standard criteria donor (SCD) and extended criteria donor (ECD) transplants changed over time?
SCD
= donor age < 50 or (age 50-59 and donor had at most one of: hypertension, death by cerebrovascular accident, terminal serum creatinine > 1.5 mg/dL)
ECD
= donor age >= 60 or (age 50-59 and donor had at least two of: hypertension, death by cerebrovascular accident, terminal serum creatinine > 1.5 mg/dL)
Let’s graph the number of SCD and ECD transplants over time
use tx_yr.dta, clear
desc
. use tx_yr.dta, clear
. desc
Contains data from tx_yr.dta
Observations: 10
Variables: 13 2 Jul 2021 08:07
Variable Storage Display Value
name type format label Variable label
yr int %8.0g Transplant Year
not_working double %9.0g Num. Unemployed Recipients
n double %8.0g (sum) n
hypertensive double %9.0g (sum) hypertensive
unknown_disease double %9.0g (sum) unknown_disease
diabetes double %9.0g (sum) diabetes
ecd double %4.0g (sum) ecd
female double %8.0g (sum) female
rec_hcv_antib~y double %9.0g Number HCV+ Recipients
over70 double %9.0g Number Recips. Over 70 Years Old
male int %8.0g
scd int %8.0g
total float %9.0g Total Num. of Recipients
Sorted by: yr
tx_yr.dta
has data on the frequency of specific transplant types per year. For example, the variable female is the number of female transplant recipients per year. The variable scd is the number of standard criteria donor transplants performed per year.
graph twoway line n yr
graph export line_n_yr.png, replace
graph twoway line ecd scd yr
graph export line_n_yr2.png, replace
5.2.3 connected#
graph twoway connected n yr
graph export connected.png, replace
5.2.4 area#
graph twoway area n yr
graph export area.png, replace
5.2.5 bar#
graph twoway bar n yr
graph export bar.png, replace
5.2.6 function#
graph twoway function y=x^2+2
graph export function.png
5.3 y#
Several y variables
graph twoway area ecd scd yr
graph export nyvar.png, replace
Change order
graph twoway area scd ecd yr
graph export nyvar2.png, replace
Order matters: the one listed first gets graphed first. Additional areas might overlie the first one.
graph twoway bar scd ecd yr
graph export nyvar3.png, replace
Interpretation is tricky. In 2010, were there about 130 SCD transplants (top of red to top of blue) or 170 (x- axis to top of blue)?
5.4 overlay#
twoway line n yr || connected male female yr
graph export line_connect.png, replace
regress n yr
twoway line n yr ///
|| function y=_b[_cons]+_b[yr]*x, range(yr)
graph export line_regress.png, replace
twoway line female yr ///
|| line male yr ///
|| line scd yr ///
|| line ecd yr ///
|| line n yr
graph export five_overlay.png, replace
5.5 axis#
The graph of # transplants per year exaggerates year- on-year change; the value for 2009 appears to be near zero, but is actually 180. How can we fix the axis?
5.5.1 scale()#
twoway line n yr, yscale(range(0))
graph export scale.png, replace
Make sure the y axis range includes the number zero
twoway line n yr, yscale(range(0 400))
graph export scale2.png, replace
Make sure the y axis range includes the numbers 0 and 400
tw li n yr, xscale(range(2014))
graph export range.png, replace
tw li ecd yr, xscale(off) yscale(off)
graph export scale_off.png, replace
5.6 program#
capture program drop figure1
qui program define figure1
syntax varlist, title(str) xtitle(str) ylt(str)
local y_axis_var: di word("`varlist'", 1)
local x_axis_var: di word("`varlist'", 2)
local xti: di "`xtitle'"
local yline_text: di "`ylt'"
qui sum `y_axis_var', d
g p50=r(p50)
g p25=r(p25)
g p75=r(p75)
local m_age=r(mean)
local m_age_text=r(mean) + 2
qui sum `x_axis_var'
local m_xaxis=r(mean)
#delimit ;
twoway (scatter p50 abo in 1/100)(rcap p25 p75 abo in 1/100,
ti("`title'")
xti("`xti'")
text(`m_age_text' `m_xaxis' "`yline_text'")
yline(`m_age',
lc(lime)
)
yti("`y_axis_var'",
orientation(horizontal)
)
ylab(10(10)80)
xlab(
1 "A"
2 "B"
3 "AB"
4 "O"
)
legend(off
lab(1 "Median")
lab(2 "1st Quartile")
lab(3 "4th Quartile")
order(3 1 2)
)
note("Median & IQR")
)
;
#delimit cr
end
use transplants, clear
figure1 age abo, title("Age at Transplant") xtitle("ABO Blood Group") ylt("Mean Age")
graph export figure1.png, replace
This is a modified program with twoway rcap
to draw the interquartile range. I’ve also suppressed the legend since the output is modified. Hope you can notice the difference from the one in the video. But you can still find a copy of the original program below.
The original program developed together in class:
capture program drop figure1
qui program define figure1
syntax varlist, title(str) xtitle(str) ylt(str)
local y_axis_var: di word("`varlist'", 1)
local x_axis_var: di word("`varlist'", 2)
local xti: di "`xtitle'"
local yline_text: di "`ylt'"
qui sum `y_axis_var', d
g p50=r(p50)
g p25=r(p25)
g p75=r(p75)
local m_age=r(mean) + 2
qui sum `m_axis_var'
local m_xaxis=r(mean)
#delimit ;
twoway (scatter p50 abo in 1/100)(rcap p25 p75 abo in 1/100,
ti("`title'")
xti("`xti'")
text(`m_xaxis' `m_age' "`yline_text'")
yline(`m_age',
lc(lime)
)
yti("`y_axis_var'",
orientation(horizontal)
)
ylab(10(10)80)
xlab(
1 "A"
2 "B"
3 "AB"
4 "O"
)
legend(off
lab(1 "Median")
lab(2 "1st Quartile")
lab(3 "4th Quartile")
order(3 1 2)
)
note("Median & IQR")
)
;
#delimit cr
end