ARCHIVED: In SAS, how do I update one data set with another?
In SAS, you can use either the MERGE
statement or the
UPDATE
statement in a DATA
step to update
the values of observations in a master data set. Both statements
should be followed by the BY
statement, which specifies
the primary key (variable) sorted by the SORT
procedure.
The MERGE
statement automatically replaces all existing
values with new values, whether they are missing or not. In contrast,
the UPDATE
statement updates values only when
corresponding data are not missing. The default option,
UPDATEMODE=MISSINGCHECK
, prevents missing values in a
transaction data set from replacing values in a master data set. The
UPDATEMODE=NOMISSINGCHECK
option is similar to the
MERGE
statement in that all values in the master data
set are overwritten by corresponding data in the transaction data
set. Consequently, the UPDATE
statement is more flexible
than the MERGE
statement.
Suppose you have two data sets, master
and
transact
. The master
data set has four
variables, id
, name
,
gender
, and weight
, as follows:
01 Perry M 165 02 Miller M 145 03 Davis F 127
The transact
data set has five variables:
id
, name
, gender
,
weight
, and height
. Missing
values are indicated by .
(a period) in this
data set:
02 Miller . 160 5.9 03 Bush . 157 . 05 Elliot F 125 5.2 02 . M 170 .
Now, sort the data by id
:
PROC SORT data=master; BY id; RUN; PROC SORT data=transact; BY id; RUN;
The following example uses the UPDATE
statement:
DATA new_master1; UPDATE master transact; BY id; RUN;
The output would appear as follows:
01 Perry M 165 . 02 Miller M 170 5.9 03 Bush F 157 . 05 Elliot F 125 5.2
The new variables (height
) and observation (05
Elliot
) in the transaction data set that do not exist in the
master data set will be added to the output data set,
new_master1
. If a transaction data set has multiple
observations with the same primary identification key, they will be
sequentially updated.
The following example uses the MERGE
statement:
DATA new_master2; MERGE master transact; BY id; RUN;
The output is given below:
01 Perry M 165 . 02 Miller . 160 5.9 02 . M 170 . 03 Bush . 157 . 05 Elliot F 125 5.2
Note that Miller
and Bush
look different
from those in the above example. The last observation of the
transact
data set was added, although it has the same id
(02
) as Miller
.
If you have questions about using statistical and mathematical software at Indiana University, contact the UITS Research Applications and Deep Learning team.
This is document aljg in the Knowledge Base.
Last modified on 2023-05-09 14:43:13.