In Stata, how do I merge two data sets?

For a one-to-many or many-to-one match merge, use merge 1:m or merge m:1; see "In Stata, how do I merge two data sets in the many-to-one relationship?"

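To merge two data sets in Stata, first sort each data set on the key variables on which the merge will be based. (Sorting is required in Stata version 10 and older; in version 11 and later, merge sorts the data itself if necessary, so the explicit sort is harmless but optional.) Then, use the merge command followed by the merge type, the key variable(s), and the keyword using with the name of the second data set. In Stata version 11 and later: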

merge 1:1 varlist using filename [, options]

Note:
If you're using Stata version 10 or older, omit the 1:1 specification. In a one-to-one match merge, the key variables must uniquely identify each observation in both data sets.
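
Before merging, it can help to confirm that the key variables really are unique. The lines below are a minimal sketch, assuming the key variables are id and name and the file is math.dta, as in the example that follows; substitute your own names.

```stata
* Load the data set to check (file name assumed for illustration)
use math.dta, clear

* Report how many observations share each id-name combination
duplicates report id name

* Stop with an error if id and name do not uniquely identify observations
isid id name
```

Run the same check on the second data set. If isid stops with an error, a 1:1 merge on those keys will also fail.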

Suppose we have two key variables id and name in two data sets stat and math. The following code sorts and saves the stat data set and then sorts the math data set. Then, while the math data set is still in memory, it merges (using the stat data set) on the key variables id and name:

use stat.dta, clear
sort id name
save stat.dta, replace

use math.dta, clear
sort id name

merge 1:1 id name using stat.dta
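
After the merge, Stata adds a variable named _merge that records where each observation came from: 1 for the master file only, 2 for the using file only, and 3 for observations matched in both. As a minimal sketch, assuming you want to keep only the matched observations, you could inspect and filter the result like this:

```stata
* Tabulate how many observations matched and how many did not
tabulate _merge

* Keep only observations found in both math and stat (an illustrative choice)
keep if _merge == 3

* Drop the indicator so that a later merge can create a new _merge
drop _merge
```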

If the two data sets share variables besides the key variables, merge by default keeps the values from the master file (in memory). Use the update option to replace missing values in the master file with the corresponding non-missing values in the secondary (using) file. Use update replace to also replace non-missing values in the master file with the corresponding non-missing values in the secondary file.
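
The following sketch shows both forms, reusing the math and stat data sets from the example above; it assumes the two files share at least one variable other than id and name. With update, _merge can take two additional values: 4 for observations where a missing master value was updated, and 5 for observations where the non-missing values in the two files conflict.

```stata
* Fill in missing values in math with non-missing values from stat
use math.dta, clear
merge 1:1 id name using stat.dta, update
tabulate _merge

* Start again; this time also overwrite non-missing values in math
* with non-missing values from stat
use math.dta, clear
merge 1:1 id name using stat.dta, update replace
tabulate _merge
```

Neither form overwrites a non-missing master value with a missing value from the secondary file.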

To use the drop-down menu in Stata version 11 and later:

Data > Combine Datasets > Merge Two Datasets

If you have questions about using statistical and mathematical software at Indiana University, contact Research Analytics. Research Analytics is located on the IU Bloomington campus at Woodburn Hall 200; staff are available for consultation Monday-Friday 9am-noon and by appointment.

This is document azck in the Knowledge Base.
Last modified on 2015-06-26 00:00:00.
