ARCHIVED: How do I merge two data files in R?
Suppose you have two data files, dataset1
and
dataset2
, that need to be merged into a single
data set. First, read both data files in R. Then, use the
merge()
function to join the two data sets based on a
unique id variable that is common to both data sets:
> merged.data <- merge(dataset1, dataset2, by="countryID")
merged.data
is an R object, which contains the two merged
data sets. The data files were joined based on the id variable
countryID
.
It is possible to merge data files by more than one id variable:
> merged.data <- merge(dataset1, dataset2, by=c("regionID", "countryID"))
It is also possible to merge the two files if the unique id variable
has a different name in each data set. For example, the id variable may
be called countryID
in dataset1
, but called
stateID
in dataset2
:
> merged.data <- merge(dataset1, dataset2, by.x="countryID", by.y="stateID")
In this case, by.x
calls the name of the id variable in dataset1
, and by.y
calls the name of the id variable in dataset2
.
Note: The default setting of the merge()
function drops all unmatched cases. If you want to keep all cases in
the new data set, include the option all=TRUE
in the
merge()
function:
> merged.data.all <- merge(dataset1, dataset2, by="countryID", all=TRUE)
To keep unmatched cases only from dataset1
, use the
all.x
option. Conversely, to keep unmatched cases only
from dataset2
, use the all.y
option:
> merged.data.all <- merge(dataset1, dataset2, by="countryID", all.x=TRUE) > merged.data.all <- merge(dataset1, dataset2, by="countryID", all.y=TRUE)
When all.x=TRUE
, an extra row will be added to the output
for each case in dataset1
that has no matching cases in
dataset2
. Cases that do not have values from
dataset2
will be labeled as missing. Conversely, when
all.y=TRUE
, an extra row will be added to the output for
each case in dataset2
that has no matching cases in
dataset1
. Cases that do not have values from
dataset1
will be labeled as missing.
If you have questions about using statistical and mathematical software at Indiana University, contact the UITS Research Applications and Deep Learning team.
This is document azux in the Knowledge Base.
Last modified on 2023-05-09 14:38:24.