hi,
I have two datasets (one and two) which contains only one column Borrower_ID2. I create dataset three and merge ONLY when the ID from dataset one is exactly the same with those dataset two.
The ID columns looks like as below :
J70451589 |
J70604221 |
J70553258 |
J70202511 |
84F3G |
THC2G |
GUNES |
AMHZN |
i formatted the IDs to the same format and then used the MERGE and a=b (see below). However, only common IDs with J-numbers (eg. J70451589) merged in dataset three even thought there are many common in format number-character (eg. 84F3G).
This is dataset three :
J70451589 |
J70604221 |
J70553258 |
J70202511 |
but i would expect something like :
J70451589 |
J70604221 |
J70553258 |
J70202511 |
84F3G |
THC2G |
/* My datasets */; data one; set one_raw; keep Borrower_ID2; format Borrower_ID2 $9.; run; data two; set two_raw; keep Borrower_ID2; format Borrower_ID2 $9.; run; /*Use Merge*/; proc sort data= one; by Borrower_ID2; run; proc sort data= two; by Borrower_ID2; run; DATA three; MERGE one (in=a) two (in=b); BY Borrower_ID2; if a and b; run;
I also run the proc contents :
You may need to check if one, or possibly both, of the data sets have leading spaces. Many ways of displaying data, Proc Print for example, will left justify things so that the leading spaces don't display.
You don't indicate how big your data set is but running this (removing the match requirement)
DATA three; MERGE one (in=a) two (in=b); BY Borrower_ID2; /*if a and b;*/ run;
If you find the suspect ID values on multiple lines not together that is an indicator of likely issues. Another possibility is differences in case, 84F3G vs 84f3g
I might suggest to remove possible spaces. If you see case differences then make all the values upper or lower case, your choice.
data one; set one_raw; keep Borrower_ID2; Borrower_ID2= strip(Borrower_ID2); format Borrower_ID2 $9.; run; data two; set two_raw; keep Borrower_ID2; Borrower_ID2= strip(Borrower_ID2); format Borrower_ID2 $9.; run;
Note: The format assigned to a variable doesn't make any difference for this sort of comparison. Equals is equals.
Lengths might have an affect depending on the content of the values. Consider:
data example1; x='ABCDEF '; format x $25.; run; data example2; x='ABCDEF '; format x $15.; run; data examplemerge; merge example1 example2; by x; run;
Different defined lengths with the values for the first few characters the same, different formats, but the merge works as desired (in this limited and contrived case) though the log does show a warning about the different lengths because there is a possibility of unexpected results and which data set controls the resulting format.
You may need to check if one, or possibly both, of the data sets have leading spaces. Many ways of displaying data, Proc Print for example, will left justify things so that the leading spaces don't display.
You don't indicate how big your data set is but running this (removing the match requirement)
DATA three; MERGE one (in=a) two (in=b); BY Borrower_ID2; /*if a and b;*/ run;
If you find the suspect ID values on multiple lines not together that is an indicator of likely issues. Another possibility is differences in case, 84F3G vs 84f3g
I might suggest to remove possible spaces. If you see case differences then make all the values upper or lower case, your choice.
data one; set one_raw; keep Borrower_ID2; Borrower_ID2= strip(Borrower_ID2); format Borrower_ID2 $9.; run; data two; set two_raw; keep Borrower_ID2; Borrower_ID2= strip(Borrower_ID2); format Borrower_ID2 $9.; run;
Note: The format assigned to a variable doesn't make any difference for this sort of comparison. Equals is equals.
Lengths might have an affect depending on the content of the values. Consider:
data example1; x='ABCDEF '; format x $25.; run; data example2; x='ABCDEF '; format x $15.; run; data examplemerge; merge example1 example2; by x; run;
Different defined lengths with the values for the first few characters the same, different formats, but the merge works as desired (in this limited and contrived case) though the log does show a warning about the different lengths because there is a possibility of unexpected results and which data set controls the resulting format.
hi, thank you for your support i run it with the strip and works. It seems i had some spaces in my ID column. The strange think is that i had used compress for the same reason but did not work.
Borrower_ID2= strip(Borrower_ID2);
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.