Day 1: Overview, Commands, and Syntax#
Pre-Class Survey#
Before beginning our exploration of Stata, we conducted an entry survey to assess the class composition and individual needs. This survey helps instructors tailor the curriculum to match student backgrounds and technical requirements.
Technical Setup Assessment#
Students were asked to identify their computing environment:
1. Stata Access Method (June 19-23, 2023)
Locally on laptop
Remotely on desktop or terminal
2. Operating System
MacOSX
Unix
Windows
3. Statistical Software Experience Level (0-5 scale)
0 - No Experience: No prior experience with Stata or other statistical software (SAS, R, Python)
1 - Basic Knowledge: General understanding of basic commands but requires assistance
2 - Novice User: Familiar with basic commands, can import data and perform basic cleaning, needs guidance for complex analyses
3 - Competent User: Proficient in data exploration, descriptive statistics, basic inference (t-tests, chi-square), and regression
4 - Advanced User: Can perform multivariable regression and understands various statistical modeling techniques
5 - Expert User: Can write custom programs, macros, and ado-files in Stata, or is an expert in other statistical software with minimal Stata experience
1.1 Overview#
“Ceci n’est pas un crabe” - This is not a crab
Understanding Stata requires distinguishing between two fundamental perspectives:
System vs. User Framework#
System Perspective:
Native Components: Core Stata application, support files, and built-in .ado files
Example: which help returns /Applications/Stata/ado/base/h/help.ado
Third-party Components: External .ado files from the community
Example: which table1_afecdvi returns /Applications/Stata/ado/base/t/table1_afecdvi.ado
User-created Components: Your custom .ado files and programs
Uninstalled programs return: command table1_afecdv not found as either built-in or ado-file
User Perspective:
Known Users: Instructors, teaching assistants, students, collaborators
Unknown Users: Future users of your code, whose needs call for empathy, sharing, and care in code design
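The system perspective can be explored from the command line. The sketch below assumes a standard installation and uses only built-in commands: sysdir lists the directories Stata searches for ado-files (BASE holds native ado-files, while PLUS and PERSONAL hold community and user-written ones), and which reports whether a command is built in or backed by an .ado file. The paths printed will differ from machine to machine.

```stata
* List the directories Stata searches for ado-files
sysdir

* A native command implemented as an ado-file: prints the file path
which help

* A command compiled into Stata itself: reported as built-in
which display
```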
Key Principles for Code Development#
The system is simply “Stata” (not “STATA” - it’s not an acronym). When developing code, consider your dual role as both user and system contributor. This requires:
Empathy: Anticipating user needs
Sharing: Making code accessible (e.g., GitHub)
Care: Creating user-friendly, well-annotated code
Installation Environments#
Local Installation:
MacOSX
Unix
Windows
Remote Access:
Desktop (Windows)
Cluster (Unix/Terminal)
1.2 Commands#
Stata Interface Elements#
Menu System (for local installations):
file | edit | view | data | graphics | statistics | user | window | help
Window Shortcuts:
Command: ⌘1
Results: ⌘2
History: ⌘3
Variables: ⌘4
Properties: ⌘5
Graph: ⌘6
Viewer: ⌘7
Editor: ⌘8
Do-file: ⌘9
Manager: ⌘10
Understanding Commands#
A command is the first valid word in any Stata instruction. Commands have visual indicators:
Native commands: Rendered in blue (built-in Stata commands)
Third-party commands: Appear in white/black (tell collaborators to install these before running your code)
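One way to give collaborators that warning is to check for the dependency at the top of a shared do-file. This is a minimal sketch rather than a required pattern; it reuses table1_afecdvi from the overview as the example dependency, and the wording of the message is hypothetical.

```stata
* Check for a third-party command without halting the do-file.
* capture stores the return code in _rc: 0 if found, nonzero if not.
capture which table1_afecdvi
if _rc != 0 {
    display as error "table1_afecdvi is not installed; install it before running this do-file."
}
else {
    display as text "table1_afecdvi is available."
}
```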
Basic Data Import Example#
webuse lifeexp, clear
. webuse lifeexp, clear
(Life expectancy, 1998)
.
Exploring Data Structure#
display c(N)
display c(k)
describe
. display c(N)
68
. display c(k)
6
. describe
Contains data from https://www.stata-press.com/data/r18/lifeexp.dta
 Observations:            68                  Life expectancy, 1998
    Variables:             6                  26 Mar 2022 09:40
                                              (_dta has notes)
Variable      Storage   Display    Value
    name         type    format    label      Variable label
region          byte    %16.0g     region     Region
country         str28   %28s                  Country
popgrowth       float   %9.0g               * Avg. annual % growth
lexp            byte    %9.0g               * Life expectancy at birth
gnppc           float   %9.0g               * GNP per capita
safewater       byte    %9.0g               * Safe water
                                            * indicated variables have notes
Sorted by:
.
Understanding Display Command#
The display command shows strings and scalar expressions. Running help display reveals:
[P] display -- Display strings and values of scalar expressions
Output values have two key properties:
Name: e.g., c(N) and c(k)
Content: the number of observations and the number of variables
These are system-defined macros (c-class values in Stata's terminology). User-defined macros are covered tomorrow.
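To make the name/content distinction concrete, the sketch below mixes strings, c-class values, and a scalar expression in display. It assumes an internet connection for webuse; with the lifeexp data loaded, c(N) is 68 and c(k) is 6, as in the output above.

```stata
webuse lifeexp, clear

* A string followed by a system value
display "Observations: " c(N)
display "Variables: " c(k)

* A scalar expression built from the same values
display c(N) * c(k)

* The same value inserted into a string by macro substitution
display "This dataset has `c(N)' rows"

* List every system-defined c-class value
creturn list
```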
Data Visualization Example#
webuse lifeexp, clear
encode country, gen(Country)
twoway scatter lexp Country, xscale(off)
graph export lexp_bycountry.png, replace
. webuse lifeexp, clear
(Life expectancy, 1998)
. encode country, gen(Country)
. twoway scatter lexp Country, xscale(off)
. graph export lexp_bycountry.png, replace
file /Users/d/Desktop/lexp_bycountry.png saved as PNG format
.
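The encode step matters because scatter plots need numeric variables and country is a string. encode creates a new numeric variable whose integer codes carry the original text as value labels. The sketch below, which again assumes webuse can reach the internet, shows the same rows with and without labels so the underlying codes are visible.

```stata
webuse lifeexp, clear
encode country, generate(Country)

* country is a string; Country is numeric with a value label attached
describe country Country

* With labels, both columns look alike
list country Country in 1/3

* Without labels, Country shows its integer codes
list country Country in 1/3, nolabel
```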
1.3 Syntax#
Standalone vs. Context-Dependent Commands#
Some commands work independently:
chelp
pwd
Most commands require additional syntax for functionality.
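The contrast can be seen by building up one command piece by piece. This is an illustrative sketch: the variable names come from the lifeexp dataset used earlier, and the condition region == 1 is chosen arbitrarily.

```stata
* Standalone: the command alone is a complete instruction
pwd

* Command + dataset name + option
webuse lifeexp, clear

* Command + variable list
summarize lexp

* Command + variable list + if qualifier + option
summarize lexp if region == 1, detail
```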
Data Generation and Analysis Example#
clear
set obs 1000
generate bmi=rnormal(28,5)
histogram bmi, normal
graph export bmi.png, replace
. clear
. set obs 1000
Number of observations (_N) was 0, now 1,000.
. generate bmi=rnormal(28,5)
. histogram bmi, normal
(bin=29, start=13.457429, width=.99248317)
. graph export bmi.png, replace
file /Users/d/Desktop/bmi.png saved as PNG format
.
end of do-file
.
Syntax Components Breakdown#
In this example:
clear: Command (standalone)
set obs 1000: Command + syntax (creates an empty dataset with 1,000 observations)
generate bmi=rnormal(28,5): Command + syntax (creates a variable drawn from a normal distribution, μ=28, σ=5)
histogram bmi, normal: Command + syntax (creates a histogram with a normal density overlay)
graph export bmi.png, replace: Command + syntax (saves the figure as PNG)
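Because rnormal() draws random numbers, each run of the example produces a slightly different histogram. A possible variation, sketched below, fixes the random-number seed so results repeat, then uses summarize to confirm that the sample mean and standard deviation land near 28 and 5. The seed value 2023 is arbitrary and not part of the original example.

```stata
clear
* Arbitrary seed so the simulated values are reproducible
set seed 2023
set obs 1000
generate bmi = rnormal(28, 5)

* Mean should be close to 28 and Std. dev. close to 5
summarize bmi
```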
Command Recognition and Error Handling#
Valid commands appear colored (blue/purple). Unrecognized commands trigger errors:
myfirstprogram
. myfirstprogram
command myfirstprogram is unrecognized
r(199);
.
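In a do-file, an unrecognized command stops execution at that line. The sketch below shows one way to inspect the error instead: capture suppresses the message and stores the return code in _rc, which here matches the r(199) shown above.

```stata
* capture suppresses the error and records the return code in _rc
capture myfirstprogram
display _rc

* Branch on the return code instead of halting
if _rc == 199 {
    display as text "myfirstprogram is not a recognized command"
}
```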
Visual Cues for Command Types#
Blue/Purple: Native Stata commands
White/Black: Third-party programs or syntax elements
Error messages: Unrecognized commands
Input and Output Framework#
Input Types#
Menu-driven: GUI interactions (Stata writes the equivalent command to the Results window)
Do files: Stata scripts with command sequences
Ado files: Stata programs for specific tasks
Output Categories#
String Output:
Text: str49 = string of 49 characters including spaces
Example: “The median age in this population is 40 years old”
URL: https://www.stata-press.com/data/r8
Filepath: /users/d/desktop
Numeric Output (by range):
Integer types:
byte: -127 to 100
int: -32,767 to 32,740
long: ±2 billion (-2,147,483,647 to 2,147,483,620)
Decimal types:
float: approximately ±1.7×10³⁸
double: approximately ±9.0×10³⁰⁷
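These limits do not need to be memorized, since Stata stores them as c-class values. The sketch below displays them and then creates a str49 variable holding the example sentence to confirm its length; the variable name sentence is hypothetical.

```stata
* Limits of each numeric storage type
display c(minbyte) " to " c(maxbyte)
display c(minint) " to " c(maxint)
display c(minlong) " to " c(maxlong)
display c(maxfloat)
display c(maxdouble)

* A string variable just wide enough for the example sentence
clear
set obs 1
generate str49 sentence = "The median age in this population is 40 years old"
display strlen(sentence[1])
describe sentence
```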
Learning Objectives Summary#
By the end of today’s session, students should understand:
System-User Framework: Distinguishing between Stata components and user roles
Command Structure: Recognizing native vs. third-party commands
Basic Syntax: Understanding command-syntax relationships
Data Types: String and numeric output categories
Error Recognition: Identifying and troubleshooting unrecognized commands
Tomorrow’s Preview: User-defined macros and writing your first Stata program
Wednesday’s Preview: Adding syntax requirements and user-defined input