Re: Match merge by multiple variables

bebaioun · Posted 05-06-2019 09:25 PM

Hi,

I am trying to match-merge two large data sets that share nine common variables (v1-v9); each data set has 11 variables in total (the values below are random examples):

--------------------------------------------------

Dataset1:

v1 v2 v3 v4 v5 v6 v7 v8 v9 v_a v_b
1  10 1  24 5  12 15 21 3  11  12
1  15 1  27 9  13 17 21 1  11  12

--------------------------------------------------

Dataset2:

v1 v2 v3 v4 v5 v6 v7 v8 v9 v_c v_d
a  10 1  24 5  12 15 21 3  11  12
a  15 1  27 9  13 17 21 1  11  12

--------------------------------------------------

After merging, I would like my output in the format below:

Merged_Dataset:

v1 v2 v3 v4 v5 v6 v7 v8 v9 v_a v_b v_c v_d

-------------------------------------------------------

Could you please help? The code I used is below:

data Merged_Dataset;
    merge Dataset1
          Dataset2 (in = in2);
    by v1 v2 v3 v4 v5 v6 v7 v8 v9;
    if in2;
run;

I would also like to check whether there are any duplicates, i.e. observations that share the same values for the nine key variables, but I am not sure how to do this after matching.
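One way to check for duplicates after the merge (an alternative sketch, not taken from the replies in this thread) is to count observations per v1-v9 combination with PROC SQL and keep only the combinations that occur more than once. The small Merged_Dataset below is hypothetical sample data with one repeated key, and Dup_Check and n_obs are illustrative names.

```sas
/* Hypothetical merged output: the first two rows share the same v1-v9 key */
data Merged_Dataset;
    input v1-v9 v_a v_b v_c v_d;
    datalines;
1 10 1 24 5 12 15 21 3 11 12 11 12
1 10 1 24 5 12 15 21 3 13 14 11 12
1 15 1 27 9 13 17 21 1 11 12 11 12
;
run;

/* List every v1-v9 combination that occurs more than once, with its count */
proc sql;
    create table Dup_Check as
    select v1, v2, v3, v4, v5, v6, v7, v8, v9, count(*) as n_obs
    from Merged_Dataset
    group by v1, v2, v3, v4, v5, v6, v7, v8, v9
    having count(*) > 1;
quit;
```

If Dup_Check comes back empty, every key combination in the merged data is unique.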

Any valuable feedback would be appreciated. Thanks!

ScottBass · Posted 05-06-2019 10:17 PM

Edit your original post with self-contained data steps. Don't make us do your work by forcing us to convert your post into usable code.

Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
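For the sample values in the question, "have" data steps might look like the sketch below. Values are copied as posted; because Dataset2 shows 'a' in v1, v1 has to be read as character there, while it is numeric in Dataset1. That type mismatch is one reason the merge as written would not run.

```sas
/* "Have" data built from the values in the question */
data Dataset1;
    input v1-v9 v_a v_b;          /* all numeric */
    datalines;
1 10 1 24 5 12 15 21 3 11 12
1 15 1 27 9 13 17 21 1 11 12
;
run;

data Dataset2;
    input v1 $ v2-v9 v_c v_d;     /* v1 is character here ('a') */
    datalines;
a 10 1 24 5 12 15 21 3 11 12
a 15 1 27 9 13 17 21 1 11 12
;
run;
```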

bebaioun · Posted 05-06-2019 11:32 PM

Oh, sorry, this is my first time posting (I am a new SAS user)!

Will work on the format.

PGStats · Posted 05-06-2019 11:27 PM

For the merge operation to work, you need:

- variables v1-v9 to be of matching types in both datasets, and
- both datasets to be sorted by v1-v9.
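A minimal sketch of meeting both conditions, assuming (hypothetically) that v1 in Dataset2 holds character digits such as '1'; the posted value 'a' would not convert to a number, so this only applies if the real data is numeric text. The temporary name v1_char is illustrative.

```sas
/* Hypothetical sample: rows deliberately out of order, v1 character in Dataset2 */
data Dataset1;
    input v1-v9 v_a v_b;
    datalines;
1 15 1 27 9 13 17 21 1 11 12
1 10 1 24 5 12 15 21 3 11 12
;
run;

data Dataset2;
    input v1 $ v2-v9 v_c v_d;
    datalines;
1 15 1 27 9 13 17 21 1 11 12
1 10 1 24 5 12 15 21 3 11 12
;
run;

/* Condition 1: convert v1 in Dataset2 to numeric so the types match */
data Dataset2;
    set Dataset2 (rename=(v1=v1_char));
    v1 = input(v1_char, best32.);
    drop v1_char;
run;

/* Condition 2: sort both data sets by the nine key variables */
proc sort data=Dataset1;
    by v1 v2 v3 v4 v5 v6 v7 v8 v9;
run;

proc sort data=Dataset2;
    by v1 v2 v3 v4 v5 v6 v7 v8 v9;
run;
```

PROC CONTENTS on each data set is a quick way to confirm the variable types before merging.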

Once you meet these conditions, you can detect key duplicates as you merge the datasets with:

data Merged_Dataset;
    merge Dataset1 Dataset2;
    by v1 v2 v3 v4 v5 v6 v7 v8 v9;
    /* first.v9 and last.v9 are both 1 only when a v1-v9 combination has exactly one observation */
    if not (first.v9 and last.v9) then put "Duplicate" _all_;
run;

(untested)

PG
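A variant of the same FIRST./LAST. idea, sketched under the same assumptions (matching types, both inputs sorted by v1-v9), keeps the original "if in2" filter and writes duplicate-key rows to a separate data set instead of the log. The sample data (with one key repeated in Dataset1) and the name Dup_Keys are hypothetical.

```sas
/* Hypothetical sample data, already sorted by v1-v9; */
/* the key 1 10 1 24 5 12 15 21 3 appears twice in Dataset1 */
data Dataset1;
    input v1-v9 v_a v_b;
    datalines;
1 10 1 24 5 12 15 21 3 11 12
1 10 1 24 5 12 15 21 3 13 14
1 15 1 27 9 13 17 21 1 11 12
;
run;

data Dataset2;
    input v1-v9 v_c v_d;
    datalines;
1 10 1 24 5 12 15 21 3 11 12
1 15 1 27 9 13 17 21 1 11 12
;
run;

data Merged_Dataset Dup_Keys;
    merge Dataset1 Dataset2 (in=in2);
    by v1 v2 v3 v4 v5 v6 v7 v8 v9;
    if in2;                                   /* keep keys present in Dataset2 */
    /* more than one observation shares this v1-v9 combination */
    if not (first.v9 and last.v9) then output Dup_Keys;
    output Merged_Dataset;
run;
```

With this sample, Merged_Dataset gets all three merged rows and Dup_Keys gets the two rows of the repeated key.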

bebaioun · Posted 05-06-2019 11:36 PM

Thanks so much! I will try it on the data.

Really appreciate your feedback!
