In Stata, how can I randomly select a certain number of observations from a data set?
In Stata, the sample command draws a random sample from the data set
in memory and drops the unselected observations from that data set.
Suppose you want to randomly draw a sample of exactly 100 observations
from the current data set. First, load a data set, and then run the
sample command with the count option: sample 100, count
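A minimal sketch of this step is shown below. The generated data set, the variable name id, and the seed value are illustrative assumptions, not part of the original example; any data set with at least 100 observations would work the same way.

```stata
* Illustrative data: 1,000 observations (stand-in for your own data set)
clear
set obs 1000
generate id = _n

* Fix the random-number seed so the same sample is drawn on every run
set seed 12345

* Keep exactly 100 randomly chosen observations; all others are dropped
sample 100, count

* Check how many observations remain in memory
count
```

Setting the seed is optional, but without it each run selects a different set of observations.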
If you want to take a 20% sample of the current data set instead, omit
the count option so that the number is read as a percentage: sample 20
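The percentage form might look like the sketch below. Because sample discards the unselected observations, the example wraps the draw in preserve and restore so the full data set can be recovered afterward; that wrapper, the generated data, and the seed value are assumptions added for illustration.

```stata
* Illustrative data: 1,000 observations (stand-in for your own data set)
clear
set obs 1000
generate id = _n
set seed 12345

* Save a temporary copy of the full data set before sampling
preserve

* Without the count option, the number is a percentage: keep 20%
sample 20
count

* Bring back the full data set once the sample is no longer needed
restore
count
```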
If you want to take a sample that maintains the same proportion of
each group, use the by() option. For example, sample 20, by(male)
draws a 20% sample separately within the male (male=1) and
female (male=0) groups.
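As a rough sketch of group-wise sampling, the example below builds a hypothetical data set with a 0/1 variable named male and then samples within each group. The data-generating lines and the seed value are assumptions made only so the example runs on its own.

```stata
* Illustrative data: 1,000 observations with a random 0/1 indicator
clear
set obs 1000
set seed 12345
generate male = runiform() < 0.5

* Group sizes before sampling
tabulate male

* Draw 20% of the observations within each value of male
sample 20, by(male)

* Group sizes after sampling: each group is about one-fifth its original size
tabulate male
```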
The sample command draws a sample without replacement. If you want
to allow replacement, use the bsample command instead.
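A short sketch of sampling with replacement follows. The generated data, the id variable, the seed, and the sample size of 100 are illustrative assumptions; the size given to bsample should not exceed the number of observations in memory.

```stata
* Illustrative data: 1,000 observations with a unique identifier
clear
set obs 1000
generate id = _n
set seed 12345

* Draw 100 observations with replacement; the data in memory are
* replaced by the drawn sample
bsample 100

* Because observations can be drawn more than once, some id values
* may now appear repeatedly
duplicates report id
```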
For more about statistical and mathematical software, email the UITS Stat/Math Center, visit the center's web page, or phone 812-855-4724 (IUB) or 317-278-4740 (IUPUI). The center is located in Bloomington at 410 N. Park Avenue, and is open for consultation by appointment Monday-Friday 9am-5pm.
Last modified on May 04, 2011.