In SAS, how can I randomly select a certain number of observations from a dataset?
Since SAS version 8.0, you can use the SURVEYSELECT
procedure for random sampling. The procedure supports various methods
for selecting probability-based random samples from the existing data
set. The SURVEYSELECT procedure can draw samples by simple random
sampling without replacement (SRS), unrestricted random sampling with
replacement (URS), systematic (SYS), and sequential (SEQ) random
sampling. It also supports probability-proportional-to-size
(PPS) sampling.
Suppose you want to randomly draw 100 observations from the data set
pop with 7,000 observations. Consider the following SAS code:
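A minimal sketch of such a call is shown here. The data set names pop and sample come from the text; the SAMPSIZE= option for the sample size and the seed value 1234567 are assumptions chosen for illustration.

```sas
/* Draw a simple random sample of 100 observations from pop.      */
/* The seed value 1234567 is arbitrary (an assumption); any fixed */
/* positive integer makes the draw reproducible.                  */
proc surveyselect data=pop out=sample
                  method=srs
                  sampsize=100
                  seed=1234567;
run;
```

N= is accepted as an alias for SAMPSIZE=, so either spelling should give the same result.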
The METHOD=SRS option specifies the simple random
sampling method. The SEED option specifies the seed to be
used in the random number generation, allowing replication of the same
set of random numbers. The 100 observations drawn are stored in the
data set sample.
If you have to use SAS 6.12, which does not have the
SURVEYSELECT procedure, you must write SAS code to
randomly select observations. The following example generates random
numbers from a uniform probability distribution using the
UNIFORM() function:
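One way such a DATA step might look is sketched below. It is an illustration rather than the original listing: the seed 1234567, the variable names random and selected, and the rule of keeping a row when its random number is at most 100/7000 until 100 rows have been kept are all assumptions.

```sas
/* Illustrative sketch only: the seed, the variable names, and */
/* the selection rule are assumptions, not the original code.  */
data sample;
    set pop;
    retain selected 0;            /* number of rows kept so far         */
    random = uniform(1234567);    /* uniform(0,1) draw; the seed starts */
                                  /* the stream on the first call only  */
    /* Keep a row with probability 100/7000 until 100 rows are kept. */
    if selected < 100 and random <= 100/7000 then do;
        selected = selected + 1;
        output;
    end;
    drop selected random;
run;
```

With this rule, rows near the end of pop can no longer be chosen once 100 rows have been kept, and the step can finish with fewer than 100 rows.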
In the above SAS 6.12 code, the probability of an observation being selected is not the same across observations. The probability depends on the order of observations and the seed value. Hence, this approach is not recommended as a random sampling method in a strict statistical sense.
For more about statistical and mathematical software, email the UITS Stat/Math Center, visit the center's web page, or phone 812-855-4724 (IUB) or 317-278-4740 (IUPUI). The center is located in Bloomington at 410 N. Park Avenue, and is open for consultation by appointment Monday-Friday 9am-5pm.
Last modified on January 27, 2011.







