BookmarkSubscribeRSS Feed
bebaioun
Calcite | Level 5

Hi,

 

I am trying to match merge two large data sets having nine common variables (v1-v9) with 11 variables in total (random values put in as example):

--------------------------------------------------

Dataset1:

 

v1        v2        v3        v4        v5        v6        v7        v8        v9        v_a        v_b

 1        10         1         24         5         12        15        21        3        11           12

 1        15         1         27         9         13        17        21        1        11           12

--------------------------------------------------

Dataset2:

 

v1        v2        v3        v4        v5        v6        v7        v8        v9        v_c        v_d

 a        10         1         24         5         12        15        21        3        11           12

 a        15         1         27         9         13        17        21        1        11           12

--------------------------------------------------

After merging, I would like my output in the format below:

 

Merged_Dataset:

 

v1        v2        v3        v4        v5        v6        v7        v8        v9       v_a         v_b        v_c        v_d

 

-------------------------------------------------------

 

Could you please help?, the code I used is as below:

data Merged_Dataset;

merge Dataset1

      Dataset2 (in = in2);

      by v1 v2 v3 v4 v5 v6 v7 v8 v9 ;

      if in2;

run;

 

I would like to see if there are any duplicates in the observations that have common values for the 9 variables that are of interest. But I am not sure how to do this after matching.

 

Any valuable feedback would be appreciated. Thanks!

 

4 REPLIES 4
ScottBass
Rhodochrosite | Level 12

Edit your original post with self-contained data steps.  Don't make us do your work by forcing us to convert your post into usable code. 


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
bebaioun
Calcite | Level 5

Oh sorry,  it is my first time posting (as I am a new SAS user)! 

Will work on the format.

PGStats
Opal | Level 21

For the merge operation to work, you need:

 

  1. variables v1-v9 to be of matching types in both datasets and
  2. both datasets to be sorted by v1-v9.

Once you meet these conditions, you can detect key duplicates as you merge the datasets with:

 

data Merged_Dataset;

merge Dataset1 Dataset2;
by v1 v2 v3 v4 v5 v6 v7 v8 v9;

if not (first.v9 and last.v9) then put "Duplicate" _all_;
run;

(untested)

PG
bebaioun
Calcite | Level 5

Thanks so much! I will try it on the data.

Really appreciate your feedback!

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1135 views
  • 0 likes
  • 3 in conversation