BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Toni2
Lapis Lazuli | Level 10

hi, 

I have two datasets (one and two) which contains only one column Borrower_ID2. I create dataset three and  merge ONLY when the ID from dataset one is exactly the same with those dataset two. 

 

The ID columns looks like as below : 

 

J70451589
J70604221
J70553258
J70202511
84F3G
THC2G
GUNES
AMHZN

 

i formatted the IDs to the same format and then used the MERGE and a=b (see below). However, only common IDs with J-numbers (eg. J70451589) merged in dataset three even thought there are many common in format number-character (eg. 84F3G).

 

This is dataset three : 

 

J70451589
J70604221
J70553258
J70202511

 

but i would expect something like :

 

J70451589
J70604221
J70553258
J70202511
84F3G
THC2G

 

 

/* My datasets */;

data one;
set one_raw;
keep Borrower_ID2;
format Borrower_ID2 $9.;
run;

data two;
set two_raw;
keep Borrower_ID2;
format Borrower_ID2 $9.;
run;

/*Use Merge*/;

proc sort data= one;
by Borrower_ID2;
run;

proc sort data= two;
by Borrower_ID2;
run;

DATA three;
MERGE one (in=a) two (in=b);
BY Borrower_ID2;
if a and b;
run;




I also run the proc contents :  

 

Toni2_0-1633540369699.png

Toni2_1-1633540502394.png

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

You may need to check if one, or possibly both, of the data sets have leading spaces. Many ways of displaying data, Proc Print for example, will left justify things so that the leading spaces don't display.

 

You don't indicate how big your data set is but running this (removing the match requirement)

DATA three;
MERGE one (in=a) two (in=b);
BY Borrower_ID2;
/*if a and b;*/
run;

If you find the suspect ID values on multiple lines not together that is an indicator of likely issues. Another possibility is differences in case, 84F3G vs 84f3g

 

I might suggest to remove possible spaces. If you see case differences then make all the values upper or lower case, your choice.

data one;
set one_raw;
keep Borrower_ID2;
Borrower_ID2= strip(Borrower_ID2);
format Borrower_ID2 $9.;
run;

data two;
set two_raw;
keep Borrower_ID2;
Borrower_ID2= strip(Borrower_ID2);
format Borrower_ID2 $9.;
run;

Note: The format assigned to a variable doesn't make any difference for this sort of comparison. Equals is equals.

Lengths might have an affect depending on the content of the values. Consider:

data example1;
   x='ABCDEF ';
   format x $25.;
run;

data example2;
   x='ABCDEF        ';
   format x $15.;
run;

data examplemerge;
   merge example1 example2;
   by x;
run;

Different defined lengths with the values for the first few characters the same, different formats, but the merge works as desired (in this limited and contrived case) though the log does show a warning about the different lengths because there is a possibility of unexpected results and which data set controls the resulting format.

 

View solution in original post

2 REPLIES 2
ballardw
Super User

You may need to check if one, or possibly both, of the data sets have leading spaces. Many ways of displaying data, Proc Print for example, will left justify things so that the leading spaces don't display.

 

You don't indicate how big your data set is but running this (removing the match requirement)

DATA three;
MERGE one (in=a) two (in=b);
BY Borrower_ID2;
/*if a and b;*/
run;

If you find the suspect ID values on multiple lines not together that is an indicator of likely issues. Another possibility is differences in case, 84F3G vs 84f3g

 

I might suggest to remove possible spaces. If you see case differences then make all the values upper or lower case, your choice.

data one;
set one_raw;
keep Borrower_ID2;
Borrower_ID2= strip(Borrower_ID2);
format Borrower_ID2 $9.;
run;

data two;
set two_raw;
keep Borrower_ID2;
Borrower_ID2= strip(Borrower_ID2);
format Borrower_ID2 $9.;
run;

Note: The format assigned to a variable doesn't make any difference for this sort of comparison. Equals is equals.

Lengths might have an affect depending on the content of the values. Consider:

data example1;
   x='ABCDEF ';
   format x $25.;
run;

data example2;
   x='ABCDEF        ';
   format x $15.;
run;

data examplemerge;
   merge example1 example2;
   by x;
run;

Different defined lengths with the values for the first few characters the same, different formats, but the merge works as desired (in this limited and contrived case) though the log does show a warning about the different lengths because there is a possibility of unexpected results and which data set controls the resulting format.

 

Toni2
Lapis Lazuli | Level 10

hi, thank you for your support i run it with the strip and works. It seems i had some spaces in my ID column. The strange think is that i had used compress for the same reason but did not work. 

 

Borrower_ID2= strip(Borrower_ID2);

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 1110 views
  • 0 likes
  • 2 in conversation