ARCHIVED: In Stata, how can I randomly select a certain number of observations from a data set?
In Stata, the sample command draws a random sample from the data set
in memory and drops the unselected observations from it. In the
examples below, the leading period is the Stata prompt, not part of
the command.
Suppose you want to randomly draw a sample of 100 observations from
the current data set. First, load a data set, and then run the
following command with the count option:
. sample 100, count
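Because sample drops the unselected observations from memory, it can help to set a seed and keep a copy of the full data before sampling. The following is a minimal do-file sketch, not a required workflow. The auto data set (shipped with Stata), the seed value, and the sample size of 20 are assumptions chosen only for illustration:

```stata
* Illustrative sketch: the data set, seed, and sample size are assumptions
sysuse auto, clear      // example data set shipped with Stata
set seed 12345          // fix the seed so the same sample is drawn on each run
preserve                // keep a copy of the full data set
sample 20, count        // keep 20 randomly chosen observations
count                   // display the number of observations now in memory
restore                 // bring back the full data set
```

The // comments work in a do-file; when typing commands interactively, leave them out.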
If you want to take a 20% sample of the current data set, omit the
count option, as follows:
. sample 20
If you want to take a sample that maintains the same proportion of
each group, use the by() option. The following command selects 20% of
the observations within each of the male (male=1) and female (male=0)
groups:
. sample 20, by(male)
If you want to take a sample that draws randomly from only one
specific group and keeps all observations in the other groups, use the
if qualifier. The following command selects 20% of the observations
within the male (male=1) group, while keeping all females (non-males)
in the data set:
. sample 20 if male == 1
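One way to confirm that only the targeted group was sampled is to tabulate the grouping variable before and after the command. The sketch below assumes the auto data set shipped with Stata and uses its foreign variable in place of male; the seed value is arbitrary:

```stata
* Illustrative sketch: auto.dta and the variable foreign stand in for your data
sysuse auto, clear
set seed 12345              // arbitrary seed, for reproducibility
tabulate foreign            // group sizes before sampling
sample 20 if foreign == 1   // sample 20% of foreign cars; all domestic cars are kept
tabulate foreign            // domestic count is unchanged, foreign count is reduced
```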
The sample command draws a sample without replacement. If you want
to sample with replacement, use the bsample command instead.
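As a rough illustration of sampling with replacement, the following sketch draws 50 observations with bsample. The auto data set, the seed, and the sample size are assumptions for the example only; check the bsample help file for any limits on the sample size that apply to your version of Stata:

```stata
* Illustrative sketch: the data set, seed, and sample size are assumptions
sysuse auto, clear
set seed 12345     // arbitrary seed, for reproducibility
bsample 50         // replace the data in memory with 50 observations drawn with replacement
count              // display the number of observations now in memory
```

Like sample, bsample replaces the data in memory, so the same preserve and restore precaution applies if the full data set is needed afterward.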
If you have questions about using statistical and mathematical software at Indiana University, contact the UITS Research Applications and Deep Learning team.
This is document awja in the Knowledge Base.
Last modified on 2023-05-09 14:45:14.