ARCHIVED: In Stata, how can I randomly select a certain number of observations from a data set?

This content has been archived, and is no longer maintained by Indiana University. Information here may no longer be accurate, and links may no longer be available or reliable.

In Stata, the sample command draws a random sample from the data set in memory and drops the observations that were not selected. (In the examples below, the leading period is the Stata command prompt, not part of the command.)

Suppose you want to randomly draw a sample of 100 observations from the current data set. First, load a data set, and then run the following command with the count option:

  . sample 100, count
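
Because the draw is random, repeated runs give different samples unless the random-number seed is set first. The following is a minimal sketch, assuming Stata's bundled auto example data set (74 observations) and an arbitrary seed value, neither of which comes from this document:

  . * auto and the seed 12345 are illustrative assumptions
  . sysuse auto, clear
  . set seed 12345
  . sample 20, count
  . count

The final count should report 20 observations, and rerunning the same lines should select the same 20.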

If you want to take a 20% sample of the current data set, omit the count option; the number is then read as a percentage:

  . sample 20
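
Since sample drops the unselected observations from memory, it can help to wrap the draw in preserve and restore when the full data set is needed afterward. The following is a sketch, assuming the bundled auto example data set as a stand-in for your own data:

  . * auto is an illustrative assumption; any data set in memory works
  . sysuse auto, clear
  . preserve
  . sample 20
  . summarize price
  . restore

After restore, the original observations are back in memory.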

If you want to take a sample that maintains the same proportion of each group, use the by() option. The following command selects 20% of the observations within each of the male (male=1) and female (male=0) groups:

  . sample 20, by(male)
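
One way to check that the group proportions were preserved is to tabulate the grouping variable before and after sampling. The following sketch assumes the bundled auto example data set, with its foreign indicator standing in for male:

  . * auto and foreign are illustrative stand-ins for your data and male
  . sysuse auto, clear
  . tabulate foreign
  . sample 20, by(foreign)
  . tabulate foreign

The percentages in the two tables should be close to each other, differing only because group sizes are rounded to whole observations.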

If you want to take a sample that draws randomly from only one specific group and keeps all observations in other groups, use the if qualifier. The following command selects 20% of the observations within the male (male=1) group, while keeping all females (non-males) in the data set:

  . sample 20 if male == 1
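
To confirm that only the targeted group was thinned, you can count the other group before and after. The following sketch assumes the bundled auto example data set, with foreign standing in for male:

  . * auto and foreign are illustrative stand-ins for your data and male
  . sysuse auto, clear
  . count if foreign == 0
  . sample 20 if foreign == 1
  . count if foreign == 0
  . count if foreign == 1

The two counts for foreign == 0 should match, while the count for foreign == 1 should be about 20% of its original size.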

The sample command draws a sample without replacement. If you want to allow replacement, use the bsample command instead.
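
Unlike sample, the number given to bsample is a count of observations rather than a percentage, and it cannot exceed the number of observations in memory; with no number, bsample draws as many observations as the data set contains. The following is a sketch, assuming the bundled auto example data set (74 observations) and an arbitrary seed:

  . * auto and the seed 12345 are illustrative assumptions
  . sysuse auto, clear
  . set seed 12345
  . bsample 50
  . count

Because the draw is with replacement, the same original observation may appear more than once in the result.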

If you have questions about using statistical and mathematical software at Indiana University, contact the UITS Research Applications and Deep Learning team.

This is document awja in the Knowledge Base.
Last modified on 2023-05-09 14:45:14.