Merge two data sets in Stata

For a one-to-many or many-to-one match merge, use merge 1:m or merge m:1; see Merge two data sets in the many-to-one relationship in Stata.

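To merge two data sets in Stata, use the merge command followed by the key variable(s) on which the merge is based and the name of the data set to merge in. Stata version 10 and older require each data set to be sorted on the key variables first. In version 11 and later, merge sorts the data as needed, so sorting beforehand is optional but harmless. In Stata version 11 and later: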

merge 1:1 varlist using filename [, options]

Note:

If you're using Stata version 10 or older, omit the 1:1 specification. In a one-to-one match merge, the key variables must uniquely identify the observations in each data set.
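Before merging, it can help to confirm that the key variables really are unique. The following is a minimal sketch that uses a small made-up data set (the values and the math_score variable are hypothetical, standing in for math.dta):

```stata
* Hypothetical toy data standing in for math.dta
clear
input id str10 name math_score
1 "Ann" 90
2 "Bob" 85
3 "Cho" 78
end

* isid stops with an error if id and name together
* do not uniquely identify the observations
isid id name

* duplicates report counts how many copies of each
* id-name combination exist
duplicates report id name
```

If isid reports an error, a 1:1 merge on those keys would fail as well, so resolve the duplicates first or consider a 1:m or m:1 merge instead.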

Suppose two data sets, stat and math, share the key variables id and name. The following code sorts and saves the stat data set, then loads and sorts the math data set. With the math data set still in memory as the master file, it merges in the stat data set on the key variables id and name:

use stat.dta, clear
sort id name
save stat.dta, replace

use math.dta, clear
sort id name

merge 1:1 id name using stat.dta
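After the merge, Stata adds a variable named _merge that records where each observation came from (1 = master only, 2 = using only, 3 = matched). The sketch below shows one way to inspect it, using made-up data and a temporary file in place of stat.dta and math.dta; the score variables and values are hypothetical:

```stata
* Hypothetical toy data standing in for stat.dta
clear
input id str10 name stat_score
1 "Ann" 88
2 "Bob" 92
4 "Dee" 70
end
tempfile stat
save `stat'

* Hypothetical toy data standing in for math.dta (the master file)
clear
input id str10 name math_score
1 "Ann" 90
2 "Bob" 85
3 "Cho" 78
end

merge 1:1 id name using `stat'

* Show how many observations matched and how many came
* from only one of the two files
tabulate _merge

* Optionally keep only the observations found in both files
keep if _merge == 3
drop _merge
```

Whether to keep unmatched observations depends on the analysis; dropping them here is only one possible choice.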

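If the two data sets share variables besides the key variables, merge keeps the values from the master file (in memory) by default. Use the update option to replace missing values in the master file with the corresponding non-missing values in the secondary (using) file. Use update replace to also replace non-missing values in the master file with the corresponding non-missing values in the secondary file. Options follow a comma at the end of the command, for example: merge 1:1 id name using stat.dta, update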
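The difference between the two options is easiest to see on a small example. This is a sketch only, with hypothetical data in which both files contain a variable named score:

```stata
* Hypothetical secondary (using) file with newer scores
clear
input id score
1 95
2 80
3 60
end
tempfile newer
save `newer'

* Hypothetical master file: id 1 is missing, id 2 disagrees, id 3 agrees
clear
input id score
1 .
2 75
3 60
end

* preserve/restore lets both variants start from the same master data
preserve
* update: only the missing score for id 1 is filled in; id 2 keeps 75
merge 1:1 id using `newer', update
list id score _merge
restore

* update replace: id 1 is filled in and id 2 is overwritten with 80
merge 1:1 id using `newer', update replace
list id score _merge
```

With either option, _merge takes two extra values: 4 marks observations where a missing value was updated, and 5 marks observations where the two files held conflicting non-missing values.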

To use the drop-down menu in Stata version 11 and later, select Data > Combine Datasets > Merge Two Datasets.

If you have questions about using statistical and mathematical software at Indiana University, contact the UITS Research Applications and Deep Learning team.