ARCHIVED: In SAS, how do I update one data set with another?

This content has been archived, and is no longer maintained by Indiana University. Information here may no longer be accurate, and links may no longer be available or reliable.

In SAS, you can use either the MERGE statement or the UPDATE statement in a DATA step to update the values of observations in a master data set. Both statements should be followed by the BY statement, which specifies the primary key (variable) sorted by the SORT procedure.

The MERGE statement automatically replaces all existing values with new values, whether they are missing or not. In contrast, the UPDATE statement updates values only when corresponding data are not missing. The default option, UPDATEMODE=MISSINGCHECK, prevents missing values in a transaction data set from replacing values in a master data set. The UPDATEMODE=NOMISSINGCHECK option is similar to the MERGE statement in that all values in the master data set are overwritten by corresponding data in the transaction data set. Consequently, the UPDATE statement is more flexible than the MERGE statement.

Suppose you have two data sets, master and transact. The master data set has four variables, id, name, gender, and weight, as follows:

  01 Perry  M 165 
  02 Miller M 145 
  03 Davis  F 127

The transact data set has five variables: id, name, gender, weight, and height. Missing values are indicated by . (a period) in this data set:

  02 Miller . 160 5.9
  03 Bush   . 157 .
  05 Elliot F 125 5.2
  02 .      M 170 .

Now, sort the data by id:

  PROC SORT data=master; BY id; RUN;
  PROC SORT data=transact; BY id; RUN;

The following example uses the UPDATE statement:

  DATA new_master1;
  UPDATE master transact;
  BY id;
  RUN;

The output would appear as follows:

  01 Perry  M 165 .
  02 Miller M 170 5.9
  03 Bush   F 157 .
  05 Elliot F 125 5.2

The new variables (height) and observation (05 Elliot) in the transaction data set that do not exist in the master data set will be added to the output data set, new_master1. If a transaction data set has multiple observations with the same primary identification key, they will be sequentially updated.

The following example uses the MERGE statement:

  DATA new_master2;
  MERGE master transact;
  BY id;
  RUN;

The output is given below:

  01 Perry  M 165 .
  02 Miller . 160 5.9
  02 .      M 170 .
  03 Bush   . 157 .
  05 Elliot F 125 5.2

Note that Miller and Bush look different from those in the above example. The last observation of the transact data set was added, although it has the same id (02) as Miller.

If you have questions about using statistical and mathematical software at Indiana University, contact the UITS Research Applications and Deep Learning team.