Merge two data sets in a many-to-one relationship in Stata

Note:

The merge command supports 1:1, 1:m, m:1, and m:m match merges. For 1:1, see Merge two data sets in Stata.

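Suppose you have two data sets, A.dta and B.dta (see below), which share the same key variable, id. Values of id can repeat in A.dta (id 1 appears twice) but are unique in B.dta, so matching A to B is a many-to-one (m:1) merge:

Data set A.dta: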

+---------+
| id    y |
|---------|
|  1   85 |
|  1   47 |
|  2   95 |
|  4   36 |
|  5   83 |
+---------+

Data set B.dta:

+---------+
| id    x |
|---------|
|  1   53 |
|  2   23 |
|  3   45 |
+---------+
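To follow along, the two example data sets can be recreated from the listings above. This is a minimal sketch that assumes the C:\temp folder used in the rest of this document exists and is writable; adjust the paths to suit your system.

```stata
* Recreate A.dta: id repeats (id 1 appears twice)
clear
input id y
1 85
1 47
2 95
4 36
5 83
end
save "C:\temp\A.dta", replace

* Recreate B.dta: id is unique
clear
input id x
1 53
2 23
3 45
end
save "C:\temp\B.dta", replace
```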

To merge these two data sets, follow the appropriate instructions below.

Stata 11 and later versions

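Enter the merge command with the data set that has the "many" observations as the current (master) data set in memory and the data set in which the key is unique as the using data set (for m:1 merges). The example below sorts both data sets by the key variable(s) first; this is harmless, but the merge syntax introduced in Stata 11 sorts the data as needed, so the sort steps are optional. An example using the above data sets follows: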

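* Sort and save the data set in which id is unique (the using data set)
use "C:\temp\B.dta", clear
sort id
save "C:\temp\B.dta", replace

* Load the data set in which id repeats (the master data set)
use "C:\temp\A.dta", clear
sort id

* Match many observations in A.dta to one observation in B.dta
merge m:1 id using "C:\temp\B.dta"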
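As an optional safeguard, you might confirm that id is unique in B.dta before merging, and limit which observations the merge keeps. The following is a minimal sketch, assuming the same C:\temp paths as above. Both isid and the keep() option of merge are standard Stata, but whether to drop the using-only observations is a choice that depends on your analysis, not a requirement of an m:1 merge.

```stata
* Confirm that id uniquely identifies observations in B.dta;
* isid stops with an error if it does not
use "C:\temp\B.dta", clear
isid id

* Merge, keeping matched and master-only observations and
* dropping observations that appear only in B.dta (here, id 3)
use "C:\temp\A.dta", clear
merge m:1 id using "C:\temp\B.dta", keep(master match)
```

With the example data, this variant leaves the five observations from A.dta in memory and omits the row for id 3.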

Stata 10 and earlier versions

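Use the joinby command with the unmatched(both) option, which keeps unmatched observations from both data sets and creates the _merge variable: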

use "C:\temp\A.dta", clear
sort id

joinby id using "C:\temp\B.dta", unmatched(both)

Results

In both cases, Stata creates a variable, _merge, that indicates the result for each observation. A value of 3 means that the observation's id appears in both data sets and the observations were matched; 1 means that the id appears only in the master data set (A.dta), and 2 means that it appears only in the using data set (B.dta):

+--------------------------------+
| id    y    x            _merge |
|--------------------------------|
|  1   85   53       matched (3) |
|  1   47   53       matched (3) |
|  2   95   23       matched (3) |
|  4   36    .   master only (1) |
|  5   83    .   master only (1) |
|  3    .   45    using only (2) |
+--------------------------------+
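After merging, it is usually worth checking the distribution of _merge before continuing. The lines below are a sketch of one common follow-up, assuming the merged data from the example are still in memory; keeping only matched observations is an illustrative choice, not something the merge requires.

```stata
* Count observations by merge result (1 = master only, 2 = using only, 3 = matched)
tabulate _merge

* Keep only the matched observations, then drop the indicator
* so that a later merge can create a new _merge variable
keep if _merge == 3
drop _merge
```

With the example data, this leaves the three matched observations (two with id 1 and one with id 2).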

If you have questions about using statistical and mathematical software at Indiana University, contact the UITS Research Applications and Deep Learning team.