Solved: Merge with number-character ID

Toni2 · Posted 10-06-2021 01:38 PM

hi,

I have two datasets (one and two) which contains only one column Borrower_ID2. I create dataset three and merge ONLY when the ID from dataset one is exactly the same with those dataset two.

The ID columns looks like as below :

J70451589

J70604221

J70553258

J70202511

84F3G

THC2G

GUNES

AMHZN

i formatted the IDs to the same format and then used the MERGE and a=b (see below). However, only common IDs with J-numbers (eg. J70451589) merged in dataset three even thought there are many common in format number-character (eg. 84F3G).

This is dataset three :

J70451589

J70604221

J70553258

J70202511

but i would expect something like :

J70451589

J70604221

J70553258

J70202511

84F3G

THC2G

/* My datasets */;

data one;
set one_raw;
keep Borrower_ID2;
format Borrower_ID2 $9.;
run;

data two;
set two_raw;
keep Borrower_ID2;
format Borrower_ID2 $9.;
run;

/*Use Merge*/;

proc sort data= one;
by Borrower_ID2;
run;

proc sort data= two;
by Borrower_ID2;
run;

DATA three;
MERGE one (in=a) two (in=b);
BY Borrower_ID2;
if a and b;
run;

I also run the proc contents :

ballardw · Posted 10-06-2021 02:07 PM

You may need to check if one, or possibly both, of the data sets have leading spaces. Many ways of displaying data, Proc Print for example, will left justify things so that the leading spaces don't display.

You don't indicate how big your data set is but running this (removing the match requirement)

DATA three;
MERGE one (in=a) two (in=b);
BY Borrower_ID2;
/*if a and b;*/
run;

If you find the suspect ID values on multiple lines not together that is an indicator of likely issues. Another possibility is differences in case, 84F3G vs 84f3g

I might suggest to remove possible spaces. If you see case differences then make all the values upper or lower case, your choice.

data one;
set one_raw;
keep Borrower_ID2;
Borrower_ID2= strip(Borrower_ID2);
format Borrower_ID2 $9.;
run;

data two;
set two_raw;
keep Borrower_ID2;
Borrower_ID2= strip(Borrower_ID2);
format Borrower_ID2 $9.;
run;

Note: The format assigned to a variable doesn't make any difference for this sort of comparison. Equals is equals.

Lengths might have an affect depending on the content of the values. Consider:

data example1;
   x='ABCDEF ';
   format x $25.;
run;

data example2;
   x='ABCDEF        ';
   format x $15.;
run;

data examplemerge;
   merge example1 example2;
   by x;
run;

Different defined lengths with the values for the first few characters the same, different formats, but the merge works as desired (in this limited and contrived case) though the log does show a warning about the different lengths because there is a possibility of unexpected results and which data set controls the resulting format.

View solution in original post

ballardw · Posted 10-06-2021 02:07 PM

You may need to check if one, or possibly both, of the data sets have leading spaces. Many ways of displaying data, Proc Print for example, will left justify things so that the leading spaces don't display.

You don't indicate how big your data set is but running this (removing the match requirement)

DATA three;
MERGE one (in=a) two (in=b);
BY Borrower_ID2;
/*if a and b;*/
run;

If you find the suspect ID values on multiple lines not together that is an indicator of likely issues. Another possibility is differences in case, 84F3G vs 84f3g

I might suggest to remove possible spaces. If you see case differences then make all the values upper or lower case, your choice.

data one;
set one_raw;
keep Borrower_ID2;
Borrower_ID2= strip(Borrower_ID2);
format Borrower_ID2 $9.;
run;

data two;
set two_raw;
keep Borrower_ID2;
Borrower_ID2= strip(Borrower_ID2);
format Borrower_ID2 $9.;
run;

Note: The format assigned to a variable doesn't make any difference for this sort of comparison. Equals is equals.

Lengths might have an affect depending on the content of the values. Consider:

data example1;
   x='ABCDEF ';
   format x $25.;
run;

data example2;
   x='ABCDEF        ';
   format x $15.;
run;

data examplemerge;
   merge example1 example2;
   by x;
run;

Different defined lengths with the values for the first few characters the same, different formats, but the merge works as desired (in this limited and contrived case) though the log does show a warning about the different lengths because there is a possibility of unexpected results and which data set controls the resulting format.

Toni2 · Posted 10-09-2021 04:06 AM

hi, thank you for your support i run it with the strip and works. It seems i had some spaces in my ID column. The strange think is that i had used compress for the same reason but did not work.

Borrower_ID2= strip(Borrower_ID2);

Merge with number-character ID

Re: Merge with number-character ID

Re: Merge with number-character ID

Re: Merge with number-character ID

Merge with number-character ID

Re: Merge with number-character ID

Re: Merge with number-character ID

Re: Merge with number-character ID

SAS Innovate 2025: Call for Content