BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
constliv
Obsidian | Level 7

I am running the below code on SAS EG 5.1 .

***START preparing data for merging;
data _1_ACRONYM_HUBIDS_FOUND ;
	set ACRONYM_HUBIDS_FOUND ;
	customerno = LEFT(UPCASE(GROUP_NO_FOR_ACRONYM)) ; /*GROUP_NO_FOR_ACRONYM is copied as new variable customerno be the 'merge by' variable*/
run ;

data _1_cust_groups_FINAL ;
set cust_groups_FINAL ;
customerno = LEFT(UPCASE(customerno)) ;
run ;

proc sort data=_1_ACRONYM_HUBIDS_FOUND ; by customerno ; run;
proc sort data=_1_cust_groups_FINAL; by customerno; run ;

proc sql noprint ;
%let keeplist=;
select name into :keeplist separated by ' '
from dictionary.columns
where libname='WORK'
and memname= UPCASE('_1_ACRONYM_HUBIDS_FOUND') ;
;
quit;

OPTIONS MERGENOBY=WARN MSGLEVEL=I;

DATA ACRONYM_HUBIDS_with_custgroups (keep=&keeplist rank customergroup)
	acronyms_not_in_top50 (keep=&keeplist rank customerno customergroup) /*dataset to identify RAY customers not in top 50 customergroups*/
	EXCEPTION_1;

/*format statement to order variables in output datasets*/
format 	&keeplist
		rank customergroup
;

MERGE _1_ACRONYM_HUBIDS_FOUND(IN=In1) _1_cust_groups_FINAL (IN=In2);
BY customerno ;

IF (In1=1 and In2=1) then output ACRONYM_HUBIDS_with_custgroups ;
IF (In1 = 1 and  In2=0 ) then output acronyms_not_in_top50 ;
RUN;

My issue is with the results of the merge as reported where there seems to 1 record appearing in both output files of the data step merge when the output criteria (in=) are mutually exclusive.

 

Here's the extract from the log:

NOTE: There were 3894 observations read from the data set WORK._1_ACRONYM_HUBIDS_FOUND.

NOTE: There were 1374 observations read from the data set WORK._1_CUST_GROUPS_FINAL.

NOTE: The data set WORK.ACRONYM_HUBIDS_WITH_CUSTGROUPS has 3261 observations and 11 variables.

NOTE: The data set WORK.ACRONYMS_NOT_IN_TOP50 has 634 observations and 11 variables.

NOTE: The data set WORK.EXCEPTION_1 has 0 observations and 11 variables.

 

3261 + 634 = 3895 , but the "left join" left dataset WORK._1_ACRONYM_HUBIDS_FOUND had only 3894 observations.

 

PROC CONTENTS of both files used in the merge are provided as PDF attachment to this post.

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

Without see your data we can only guess. My guess would be that you have a duplicate record for one customer in your _1_cust_groups_FINAL dataset.

 

For example, the following code results in the unexpected number you encountered:

DATA _1_ACRONYM_HUBIDS_FOUND;
  input customerno;
  cards;
1
2
3
4
;

DATA _1_cust_groups_FINAL;
  input customerno;
  cards;
1
3
3
4
;

DATA ACRONYM_HUBIDS_with_custgroups
	acronyms_not_in_top50
	EXCEPTION_1;
  MERGE _1_ACRONYM_HUBIDS_FOUND(IN=In1) _1_cust_groups_FINAL (IN=In2);
  BY customerno ;

  IF (In1=1 and In2=1) then output ACRONYM_HUBIDS_with_custgroups ;
  IF (In1 = 1 and  In2=0 ) then output acronyms_not_in_top50 ;
RUN;

Also, you create a file called EXCEPTION_1, but don't include a statement to output any records to it.

 

Art, CEO, AnalystFinder.com

 

 

View solution in original post

4 REPLIES 4
RW9
Diamond | Level 26 RW9
Diamond | Level 26

If you provide test data which illustrates the issue, then it would be easier to help.  I can suggest using a proc sql join to simplify the code:

proc sql;
  create table ACRONYM_HUBIDS_with_custgroups as 
  select A.*,
         B.....
  from   _1_ACRONYM_HUBIDS_FOUND A
  inner join _1_CUST_GROUPS_FINAL B
  on     A.CUSTOMERNO=B.CUSTOMER_NO;
  create table ACRONYMS_NOT_IN_TOP50 as 
  select ...
  from   ...
  where  CUSTOMERNO not in (select CUSTOMERNO from _1_CUST_GROUPS_FINAL);
quit;
art297
Opal | Level 21

Without see your data we can only guess. My guess would be that you have a duplicate record for one customer in your _1_cust_groups_FINAL dataset.

 

For example, the following code results in the unexpected number you encountered:

DATA _1_ACRONYM_HUBIDS_FOUND;
  input customerno;
  cards;
1
2
3
4
;

DATA _1_cust_groups_FINAL;
  input customerno;
  cards;
1
3
3
4
;

DATA ACRONYM_HUBIDS_with_custgroups
	acronyms_not_in_top50
	EXCEPTION_1;
  MERGE _1_ACRONYM_HUBIDS_FOUND(IN=In1) _1_cust_groups_FINAL (IN=In2);
  BY customerno ;

  IF (In1=1 and In2=1) then output ACRONYM_HUBIDS_with_custgroups ;
  IF (In1 = 1 and  In2=0 ) then output acronyms_not_in_top50 ;
RUN;

Also, you create a file called EXCEPTION_1, but don't include a statement to output any records to it.

 

Art, CEO, AnalystFinder.com

 

 

constliv
Obsidian | Level 7

Art,

 

You guessed right! Indeed there are 2 records with same customerno in my dataset _1_cust_groups_FINAL !

 

However, I am curious to know why this did not get reported in the log with a message such as 'multiple values for BY variable" (Log provided as attachment).

 

 

 

art297
Opal | Level 21

You didn't get a note because multiple records for a by variable in one data set is quite normal in a merge. The note is only included when you have multiple records for a by variable in both datasets.

 

Art, CEO, AnalystFinder.com

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1403 views
  • 0 likes
  • 3 in conversation