I am running the below code on SAS EG 5.1 .
***START preparing data for merging;
data _1_ACRONYM_HUBIDS_FOUND ;
set ACRONYM_HUBIDS_FOUND ;
customerno = LEFT(UPCASE(GROUP_NO_FOR_ACRONYM)) ; /*GROUP_NO_FOR_ACRONYM is copied as new variable customerno be the 'merge by' variable*/
run ;
data _1_cust_groups_FINAL ;
set cust_groups_FINAL ;
customerno = LEFT(UPCASE(customerno)) ;
run ;
proc sort data=_1_ACRONYM_HUBIDS_FOUND ; by customerno ; run;
proc sort data=_1_cust_groups_FINAL; by customerno; run ;
proc sql noprint ;
%let keeplist=;
select name into :keeplist separated by ' '
from dictionary.columns
where libname='WORK'
and memname= UPCASE('_1_ACRONYM_HUBIDS_FOUND') ;
;
quit;
OPTIONS MERGENOBY=WARN MSGLEVEL=I;
DATA ACRONYM_HUBIDS_with_custgroups (keep=&keeplist rank customergroup)
acronyms_not_in_top50 (keep=&keeplist rank customerno customergroup) /*dataset to identify RAY customers not in top 50 customergroups*/
EXCEPTION_1;
/*format statement to order variables in output datasets*/
format &keeplist
rank customergroup
;
MERGE _1_ACRONYM_HUBIDS_FOUND(IN=In1) _1_cust_groups_FINAL (IN=In2);
BY customerno ;
IF (In1=1 and In2=1) then output ACRONYM_HUBIDS_with_custgroups ;
IF (In1 = 1 and In2=0 ) then output acronyms_not_in_top50 ;
RUN;
My issue is with the results of the merge as reported where there seems to 1 record appearing in both output files of the data step merge when the output criteria (in=) are mutually exclusive.
Here's the extract from the log:
NOTE: There were 3894 observations read from the data set WORK._1_ACRONYM_HUBIDS_FOUND.
NOTE: There were 1374 observations read from the data set WORK._1_CUST_GROUPS_FINAL.
NOTE: The data set WORK.ACRONYM_HUBIDS_WITH_CUSTGROUPS has 3261 observations and 11 variables.
NOTE: The data set WORK.ACRONYMS_NOT_IN_TOP50 has 634 observations and 11 variables.
NOTE: The data set WORK.EXCEPTION_1 has 0 observations and 11 variables.
3261 + 634 = 3895 , but the "left join" left dataset WORK._1_ACRONYM_HUBIDS_FOUND had only 3894 observations.
PROC CONTENTS of both files used in the merge are provided as PDF attachment to this post.
Without see your data we can only guess. My guess would be that you have a duplicate record for one customer in your _1_cust_groups_FINAL dataset.
For example, the following code results in the unexpected number you encountered:
DATA _1_ACRONYM_HUBIDS_FOUND; input customerno; cards; 1 2 3 4 ; DATA _1_cust_groups_FINAL; input customerno; cards; 1 3 3 4 ; DATA ACRONYM_HUBIDS_with_custgroups acronyms_not_in_top50 EXCEPTION_1; MERGE _1_ACRONYM_HUBIDS_FOUND(IN=In1) _1_cust_groups_FINAL (IN=In2); BY customerno ; IF (In1=1 and In2=1) then output ACRONYM_HUBIDS_with_custgroups ; IF (In1 = 1 and In2=0 ) then output acronyms_not_in_top50 ; RUN;
Also, you create a file called EXCEPTION_1, but don't include a statement to output any records to it.
Art, CEO, AnalystFinder.com
If you provide test data which illustrates the issue, then it would be easier to help. I can suggest using a proc sql join to simplify the code:
proc sql; create table ACRONYM_HUBIDS_with_custgroups as select A.*, B..... from _1_ACRONYM_HUBIDS_FOUND A inner join _1_CUST_GROUPS_FINAL B on A.CUSTOMERNO=B.CUSTOMER_NO; create table ACRONYMS_NOT_IN_TOP50 as select ... from ... where CUSTOMERNO not in (select CUSTOMERNO from _1_CUST_GROUPS_FINAL); quit;
Without see your data we can only guess. My guess would be that you have a duplicate record for one customer in your _1_cust_groups_FINAL dataset.
For example, the following code results in the unexpected number you encountered:
DATA _1_ACRONYM_HUBIDS_FOUND; input customerno; cards; 1 2 3 4 ; DATA _1_cust_groups_FINAL; input customerno; cards; 1 3 3 4 ; DATA ACRONYM_HUBIDS_with_custgroups acronyms_not_in_top50 EXCEPTION_1; MERGE _1_ACRONYM_HUBIDS_FOUND(IN=In1) _1_cust_groups_FINAL (IN=In2); BY customerno ; IF (In1=1 and In2=1) then output ACRONYM_HUBIDS_with_custgroups ; IF (In1 = 1 and In2=0 ) then output acronyms_not_in_top50 ; RUN;
Also, you create a file called EXCEPTION_1, but don't include a statement to output any records to it.
Art, CEO, AnalystFinder.com
Art,
You guessed right! Indeed there are 2 records with same customerno in my dataset _1_cust_groups_FINAL !
However, I am curious to know why this did not get reported in the log with a message such as 'multiple values for BY variable" (Log provided as attachment).
You didn't get a note because multiple records for a by variable in one data set is quite normal in a merge. The note is only included when you have multiple records for a by variable in both datasets.
Art, CEO, AnalystFinder.com
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.