BookmarkSubscribeRSS Feed
bebaioun
Calcite | Level 5

Hi,

 

I am trying to match merge two large data sets having nine common variables (v1-v9) with 11 variables in total (random values put in as example):

--------------------------------------------------

Dataset1:

 

v1        v2        v3        v4        v5        v6        v7        v8        v9        v_a        v_b

 1        10         1         24         5         12        15        21        3        11           12

 1        15         1         27         9         13        17        21        1        11           12

--------------------------------------------------

Dataset2:

 

v1        v2        v3        v4        v5        v6        v7        v8        v9        v_c        v_d

 a        10         1         24         5         12        15        21        3        11           12

 a        15         1         27         9         13        17        21        1        11           12

--------------------------------------------------

After merging, I would like my output in the format below:

 

Merged_Dataset:

 

v1        v2        v3        v4        v5        v6        v7        v8        v9       v_a         v_b        v_c        v_d

 

-------------------------------------------------------

 

Could you please help?, the code I used is as below:

data Merged_Dataset;

merge Dataset1

      Dataset2 (in = in2);

      by v1 v2 v3 v4 v5 v6 v7 v8 v9 ;

      if in2;

run;

 

I would like to see if there are any duplicates in the observations that have common values for the 9 variables that are of interest. But I am not sure how to do this after matching.

 

Any valuable feedback would be appreciated. Thanks!

 

4 REPLIES 4
ScottBass
Rhodochrosite | Level 12

Edit your original post with self-contained data steps.  Don't make us do your work by forcing us to convert your post into usable code. 


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
bebaioun
Calcite | Level 5

Oh sorry,  it is my first time posting (as I am a new SAS user)! 

Will work on the format.

PGStats
Opal | Level 21

For the merge operation to work, you need:

 

  1. variables v1-v9 to be of matching types in both datasets and
  2. both datasets to be sorted by v1-v9.

Once you meet these conditions, you can detect key duplicates as you merge the datasets with:

 

data Merged_Dataset;

merge Dataset1 Dataset2;
by v1 v2 v3 v4 v5 v6 v7 v8 v9;

if not (first.v9 and last.v9) then put "Duplicate" _all_;
run;

(untested)

PG
bebaioun
Calcite | Level 5

Thanks so much! I will try it on the data.

Really appreciate your feedback!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 547 views
  • 0 likes
  • 3 in conversation