Hello ,
I would appreciate if someone can advise the difference in the 2 merging techniques below:
I recieve different counts using the data step merge vs the proc sql and i need to understand why?
Technique 1
proc sql;
create table NACRSFY19_int_AT as
select distinct *
from NACRSFY19_int as a inner join nacrs19.Anaesthetic_technique as b
on a.am_care_key=b.am_care_key;
quit;
/*5,829,296*/
Technique 2
proc sort data= NACRSFY19_int;
by am_care_key;
run;
proc sort data= nacrs19.Anaesthetic_technique out=AT_sorted;
by am_care_key;
run;
/*16,943,476*/
data nacrs_AT_19;
merge NACRSFY19_int (in=a) AT_sorted (in=b);
by am_care_key;
if a and b;
run;
/*3,237,261*/
Hi @Baba9,
Have you checked the log to see if that gives any clues.
If there is anything in the log you need help to understand then please post its contents using the "</>" icon when posting.
Thanks & kind regards,
Amir.
proc sql;
633 create table NACRSFY19_int_AT as
634 select distinct *
635 from NACRSFY19_int as a inner join nacrs19.Anaesthetic_technique as b
636 on a.am_care_key=b.am_care_key;
WARNING: Variable AM_CARE_KEY already exists on file WORK.NACRSFY19_INT_AT.
NOTE: SAS threaded sort was used.
NOTE: Table WORK.NACRSFY19_INT_AT created, with 5829296 rows and 10 columns.
637 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 5.57 seconds
user cpu time 4.96 seconds
system cpu time 2.89 seconds
memory 968258.75k
OS Memory 999016.00k
Timestamp 2021-07-05 12:33:11 PM
Step Count 102 Switch Count 0
638 data NACRSFY19_int2 ;
639 merge nacrs_AT_19 (in=a) Excl_combined (in=b);
640 by am_care_key;
641 if a and not b;
642 run;
INFO: The variable CCI_INTERV_CODE on data set WORK.NACRS_AT_19 will be overwritten by data set
WORK.EXCL_COMBINED.
INFO: The variable INTERV_SEQ_ID on data set WORK.NACRS_AT_19 will be overwritten by data set
WORK.EXCL_COMBINED.
INFO: The variable STATUS_ATTRIBUTE_CODE on data set WORK.NACRS_AT_19 will be overwritten by data
set WORK.EXCL_COMBINED.
INFO: The variable INTERV_OOH_IND_CODE on data set WORK.NACRS_AT_19 will be overwritten by data
set WORK.EXCL_COMBINED.
NOTE: There were 3237261 observations read from the data set WORK.NACRS_AT_19.
NOTE: There were 37200 observations read from the data set WORK.EXCL_COMBINED.
NOTE: The data set WORK.NACRSFY19_INT2 has 3205688 observations and 21 variables.
NOTE: DATA statement used (Total process time):
real time 1.34 seconds
user cpu time 0.92 seconds
system cpu time 0.28 seconds
memory 1080.90k
OS Memory 33016.00k
Timestamp 2021-07-05 12:36:14 PM
Step Count 103 Switch Count 0
Hi @ pasted the log
Thankyou
Your log confirms that you are doing a many-to-many merge which SQL can do but a DATA step merge cannot. Refer to @Reeza 's explanation.
These are two different joins.
NACRSFY19_int, nacrs19.Anaesthetic_technique
in the SQL,
nacrs_AT_19, Excl_combined
in the data step.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.