BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
constliv
Obsidian | Level 7

I am running the below code on SAS EG 5.1 .

***START preparing data for merging;
data _1_ACRONYM_HUBIDS_FOUND ;
	set ACRONYM_HUBIDS_FOUND ;
	customerno = LEFT(UPCASE(GROUP_NO_FOR_ACRONYM)) ; /*GROUP_NO_FOR_ACRONYM is copied as new variable customerno be the 'merge by' variable*/
run ;

data _1_cust_groups_FINAL ;
set cust_groups_FINAL ;
customerno = LEFT(UPCASE(customerno)) ;
run ;

proc sort data=_1_ACRONYM_HUBIDS_FOUND ; by customerno ; run;
proc sort data=_1_cust_groups_FINAL; by customerno; run ;

proc sql noprint ;
%let keeplist=;
select name into :keeplist separated by ' '
from dictionary.columns
where libname='WORK'
and memname= UPCASE('_1_ACRONYM_HUBIDS_FOUND') ;
;
quit;

OPTIONS MERGENOBY=WARN MSGLEVEL=I;

DATA ACRONYM_HUBIDS_with_custgroups (keep=&keeplist rank customergroup)
	acronyms_not_in_top50 (keep=&keeplist rank customerno customergroup) /*dataset to identify RAY customers not in top 50 customergroups*/
	EXCEPTION_1;

/*format statement to order variables in output datasets*/
format 	&keeplist
		rank customergroup
;

MERGE _1_ACRONYM_HUBIDS_FOUND(IN=In1) _1_cust_groups_FINAL (IN=In2);
BY customerno ;

IF (In1=1 and In2=1) then output ACRONYM_HUBIDS_with_custgroups ;
IF (In1 = 1 and  In2=0 ) then output acronyms_not_in_top50 ;
RUN;

My issue is with the results of the merge as reported where there seems to 1 record appearing in both output files of the data step merge when the output criteria (in=) are mutually exclusive.

 

Here's the extract from the log:

NOTE: There were 3894 observations read from the data set WORK._1_ACRONYM_HUBIDS_FOUND.

NOTE: There were 1374 observations read from the data set WORK._1_CUST_GROUPS_FINAL.

NOTE: The data set WORK.ACRONYM_HUBIDS_WITH_CUSTGROUPS has 3261 observations and 11 variables.

NOTE: The data set WORK.ACRONYMS_NOT_IN_TOP50 has 634 observations and 11 variables.

NOTE: The data set WORK.EXCEPTION_1 has 0 observations and 11 variables.

 

3261 + 634 = 3895 , but the "left join" left dataset WORK._1_ACRONYM_HUBIDS_FOUND had only 3894 observations.

 

PROC CONTENTS of both files used in the merge are provided as PDF attachment to this post.

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

Without see your data we can only guess. My guess would be that you have a duplicate record for one customer in your _1_cust_groups_FINAL dataset.

 

For example, the following code results in the unexpected number you encountered:

DATA _1_ACRONYM_HUBIDS_FOUND;
  input customerno;
  cards;
1
2
3
4
;

DATA _1_cust_groups_FINAL;
  input customerno;
  cards;
1
3
3
4
;

DATA ACRONYM_HUBIDS_with_custgroups
	acronyms_not_in_top50
	EXCEPTION_1;
  MERGE _1_ACRONYM_HUBIDS_FOUND(IN=In1) _1_cust_groups_FINAL (IN=In2);
  BY customerno ;

  IF (In1=1 and In2=1) then output ACRONYM_HUBIDS_with_custgroups ;
  IF (In1 = 1 and  In2=0 ) then output acronyms_not_in_top50 ;
RUN;

Also, you create a file called EXCEPTION_1, but don't include a statement to output any records to it.

 

Art, CEO, AnalystFinder.com

 

 

View solution in original post

4 REPLIES 4
RW9
Diamond | Level 26 RW9
Diamond | Level 26

If you provide test data which illustrates the issue, then it would be easier to help.  I can suggest using a proc sql join to simplify the code:

proc sql;
  create table ACRONYM_HUBIDS_with_custgroups as 
  select A.*,
         B.....
  from   _1_ACRONYM_HUBIDS_FOUND A
  inner join _1_CUST_GROUPS_FINAL B
  on     A.CUSTOMERNO=B.CUSTOMER_NO;
  create table ACRONYMS_NOT_IN_TOP50 as 
  select ...
  from   ...
  where  CUSTOMERNO not in (select CUSTOMERNO from _1_CUST_GROUPS_FINAL);
quit;
art297
Opal | Level 21

Without see your data we can only guess. My guess would be that you have a duplicate record for one customer in your _1_cust_groups_FINAL dataset.

 

For example, the following code results in the unexpected number you encountered:

DATA _1_ACRONYM_HUBIDS_FOUND;
  input customerno;
  cards;
1
2
3
4
;

DATA _1_cust_groups_FINAL;
  input customerno;
  cards;
1
3
3
4
;

DATA ACRONYM_HUBIDS_with_custgroups
	acronyms_not_in_top50
	EXCEPTION_1;
  MERGE _1_ACRONYM_HUBIDS_FOUND(IN=In1) _1_cust_groups_FINAL (IN=In2);
  BY customerno ;

  IF (In1=1 and In2=1) then output ACRONYM_HUBIDS_with_custgroups ;
  IF (In1 = 1 and  In2=0 ) then output acronyms_not_in_top50 ;
RUN;

Also, you create a file called EXCEPTION_1, but don't include a statement to output any records to it.

 

Art, CEO, AnalystFinder.com

 

 

constliv
Obsidian | Level 7

Art,

 

You guessed right! Indeed there are 2 records with same customerno in my dataset _1_cust_groups_FINAL !

 

However, I am curious to know why this did not get reported in the log with a message such as 'multiple values for BY variable" (Log provided as attachment).

 

 

 

art297
Opal | Level 21

You didn't get a note because multiple records for a by variable in one data set is quite normal in a merge. The note is only included when you have multiple records for a by variable in both datasets.

 

Art, CEO, AnalystFinder.com

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1875 views
  • 0 likes
  • 3 in conversation