BookmarkSubscribeRSS Feed
Baba9
Obsidian | Level 7

Hello ,

I would appreciate if someone can advise the difference in the 2 merging techniques below:

I recieve different counts using the data step merge vs the proc sql and i need to understand why?

 

Technique 1 

proc sql;
create table NACRSFY19_int_AT as
select distinct *
from NACRSFY19_int as a inner join nacrs19.Anaesthetic_technique as b
on a.am_care_key=b.am_care_key;
quit;
/*5,829,296*/



Technique 2
proc sort data= NACRSFY19_int;
by am_care_key;
run;


proc sort data= nacrs19.Anaesthetic_technique out=AT_sorted;
by am_care_key;
run;
/*16,943,476*/



data nacrs_AT_19;
merge NACRSFY19_int (in=a) AT_sorted (in=b);
by am_care_key;
if a and b;
run;
/*3,237,261*/
7 REPLIES 7
Amir
PROC Star

Hi @Baba9,

 

Have you checked the log to see if that gives any clues.

 

If there is anything in the log you need help to understand then please post its contents using the "</>" icon when posting.

 

 

Thanks & kind regards,

Amir.

Reeza
Super User
Is AM_CARE_KEY unique in each data set? I suspect not...
Baba9
Obsidian | Level 7
It’s not unique
Baba9
Obsidian | Level 7
 proc sql;
633   create table NACRSFY19_int_AT as
634   select distinct *
635   from NACRSFY19_int as a inner join nacrs19.Anaesthetic_technique as b
636   on a.am_care_key=b.am_care_key;
WARNING: Variable AM_CARE_KEY already exists on file WORK.NACRSFY19_INT_AT.
NOTE: SAS threaded sort was used.
NOTE: Table WORK.NACRSFY19_INT_AT created, with 5829296 rows and 10 columns.

637   quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           5.57 seconds
      user cpu time       4.96 seconds
      system cpu time     2.89 seconds
      memory              968258.75k
      OS Memory           999016.00k
      Timestamp           2021-07-05 12:33:11 PM
      Step Count                        102  Switch Count  0


638   data NACRSFY19_int2 ;
639   merge nacrs_AT_19 (in=a) Excl_combined (in=b);
640   by am_care_key;
641   if a and not b;
642   run;

INFO: The variable CCI_INTERV_CODE on data set WORK.NACRS_AT_19 will be overwritten by data set
      WORK.EXCL_COMBINED.
INFO: The variable INTERV_SEQ_ID on data set WORK.NACRS_AT_19 will be overwritten by data set
      WORK.EXCL_COMBINED.
INFO: The variable STATUS_ATTRIBUTE_CODE on data set WORK.NACRS_AT_19 will be overwritten by data
      set WORK.EXCL_COMBINED.
INFO: The variable INTERV_OOH_IND_CODE on data set WORK.NACRS_AT_19 will be overwritten by data
      set WORK.EXCL_COMBINED.
NOTE: There were 3237261 observations read from the data set WORK.NACRS_AT_19.
NOTE: There were 37200 observations read from the data set WORK.EXCL_COMBINED.
NOTE: The data set WORK.NACRSFY19_INT2 has 3205688 observations and 21 variables.
NOTE: DATA statement used (Total process time):
      real time           1.34 seconds
      user cpu time       0.92 seconds
      system cpu time     0.28 seconds
      memory              1080.90k
      OS Memory           33016.00k
      Timestamp           2021-07-05 12:36:14 PM
      Step Count                        103  Switch Count  0

Hi @ pasted the log 
Thankyou
SASKiwi
PROC Star

Your log confirms that you are doing a many-to-many merge which SQL can do but a DATA step merge cannot. Refer to @Reeza 's explanation.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1908 views
  • 3 likes
  • 5 in conversation