ARCHIVED: In SAS, how can I randomly assign half the cases to one group and the remaining half to another?

This content has been archived, and is no longer maintained by Indiana University. Information here may no longer be accurate, and links may no longer be available or reliable.

Suppose you have a SAS data set of 100 observations. You wish to randomly select 50 cases and run an analysis, and then run the same analysis on the other 50 cases to compare the two results. The following DATA step performs the split:

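  data random1(drop=k n) random2(drop=k n);
    /* k = cases still needed for random1; n = cases left, including the current one */
    retain k 50 n 100;
    set original;
    /* select the current case with probability k/n */
    if ranuni(76543) <= k/n then
      do;
        output random1;
        k = k - 1;
      end;
    else
      do;
        output random2;
      end;
    n = n - 1;
  run;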

In the above example, original is the name of your SAS data set.

The resulting SAS data sets, random1 and random2, are complementary, and each includes 50 observations.
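
The DATA step above hard-codes the counts 50 and 100. As a minimal sketch of one way to avoid that, the NOBS= option of the SET statement can supply the number of observations. This assumes original is a native SAS data set (NOBS= may not report a usable count for views or sequential engines); the variable name total is hypothetical.

```sas
data random1 random2;
  /* NOBS= is filled in at compile time; total is temporary and is not written out */
  set original nobs=total;
  retain k n;
  if _n_ = 1 then
    do;
      n = total;             /* cases left to process */
      k = floor(total / 2);  /* cases still needed for random1 */
    end;
  /* select the current case with probability k/n */
  if ranuni(76543) <= k/n then
    do;
      output random1;
      k = k - 1;
    end;
  else output random2;
  n = n - 1;
  drop k n;
run;
```

With an odd number of observations, this sketch places the smaller half in random1 and the extra case in random2.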

The number 76543 in the example above is the seed for the random number generator. You can use any integer less than 2^31 - 1 as a seed; a different seed yields a different random split of your data set. If the seed is less than or equal to 0, the time of day is used instead, so the split will differ from run to run.

The SURVEYSELECT procedure

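Alternatively, you can use the SURVEYSELECT procedure. The OUTALL option keeps every observation in the output data set and adds a variable named Selected that is 1 for sampled cases and 0 otherwise; the SEED= option makes the selection reproducible:

proc surveyselect data=original out=split method=srs samprate=.5 seed=76543 outall;
run;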

data random1 random2;
  set split;
  if selected = 1 then output random1;
  else output random2;
run;
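
To confirm that the split came out as intended, one option is to tabulate the Selected variable before separating the data. This is only a sketch of a sanity check, assuming the split data set was created by the PROC SURVEYSELECT step above:

```sas
/* Count how many cases were selected (1) and not selected (0) */
proc freq data=split;
  tables selected;
run;
```

For a data set of 100 observations and a sampling rate of .5, each value of Selected should appear 50 times.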

If you have questions about using statistical and mathematical software at Indiana University, contact the UITS Research Applications and Deep Learning team.

This is document aehl in the Knowledge Base.
Last modified on 2023-05-09 14:36:48.