Hi,
I am trying to match merge two large data sets having nine common variables (v1-v9) with 11 variables in total (random values put in as example):
--------------------------------------------------
Dataset1:
v1 v2 v3 v4 v5 v6 v7 v8 v9 v_a v_b
1 10 1 24 5 12 15 21 3 11 12
1 15 1 27 9 13 17 21 1 11 12
--------------------------------------------------
Dataset2:
v1 v2 v3 v4 v5 v6 v7 v8 v9 v_c v_d
a 10 1 24 5 12 15 21 3 11 12
a 15 1 27 9 13 17 21 1 11 12
--------------------------------------------------
After merging, I would like my output in the format below:
Merged_Dataset:
v1 v2 v3 v4 v5 v6 v7 v8 v9 v_a v_b v_c v_d
-------------------------------------------------------
Could you please help?, the code I used is as below:
data Merged_Dataset; merge Dataset1 Dataset2 (in = in2); by v1 v2 v3 v4 v5 v6 v7 v8 v9 ; if in2; run;
I would like to see if there are any duplicates in the observations that have common values for the 9 variables that are of interest. But I am not sure how to do this after matching.
Any valuable feedback would be appreciated. Thanks!
Edit your original post with self-contained data steps. Don't make us do your work by forcing us to convert your post into usable code.
Oh sorry, it is my first time posting (as I am a new SAS user)!
Will work on the format.
For the merge operation to work, you need:
Once you meet these conditions, you can detect key duplicates as you merge the datasets with:
data Merged_Dataset;
merge Dataset1 Dataset2;
by v1 v2 v3 v4 v5 v6 v7 v8 v9;
if not (first.v9 and last.v9) then put "Duplicate" _all_;
run;
(untested)
Thanks so much! I will try it on the data.
Really appreciate your feedback!
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.