Paige,
Forgive me if I overlooked your comments, I am aware of that the length of COL2 is less or equal to COL1, but does COL2 have to be inclusive to COL1? For instance, consider the following examples:
for COL1=AA999AAB,
COL2 of AA999 is a better match or AA999AB? notice the latter has the bigger n? if it is the latter, then what is your rule of dealing ties? eg. AA999AB or AA999AC?
Thanks,
Haikuo
Hai.kuo wrote:
for COL1=AA999AAB,
COL2 of AA999 is a better match or AA999AB?
For the case above, COL2 of AA999 is the match I want, as the requirement seems to have been incompletely stated. When you match COL1 and COL2, there cannot be any mismatches, positions 1 thru length(COL2) must match, and I want the longest COL2 which has this property. If you match COL1=AA999AAB with COL2=AA999AB, there is a mismatch in postion 7, so it is disqualified.
In that case, will my code suffice your need?
Haikuo
Yes , it seems to work as well (although of the 4 solutions that have now been provided, I understand three and I'm not really sure I see why your code works, but perhaps that just a matter of studying it further).
Anyway, thanks to everyone for helping me on this and providing some interesting learnings.
Lack of SQL skills would force me to take a totally different approach.
Step 1: Create a format that translates from the value of COL2 to the observation number holding that value.
Step 2: Loop through COL1 (all characters down to 1 character) until you find a match.
For example, hard-coding the format just for readability step 1 would look like this:
proc format;
value $recno
'AA999'='1'
'AA999A'='2'
'AA999B'='3'
'AA999AB'='4'
other='Not Found';
run;
In practice, I'm sure you would want to switch to a CNTLIN= data set.
Then step 2 uses the format:
data want;
set col1_source;
if COL1 > ' ' then do _n_=length(COL1) to 1 by -1 until (result ne 'Not Found');
result = put(substr(COL1, 1, _n_), $recno.);
end;
if result ne 'Not Found' then do;
recno = input(result, 8.);
set col2_dataset point=recno;
end;
run;
It's sophisticated, but not horribly complex.
Good luck.
Paige,
Did you try the code I had suggested? It gives the same results as your code but, additionally, excludes cases where there is no match.
Art
If you wanted to go for all possible solutions. I missed:
- hashing
- point= usage with two datasets going around independly
- ?
Hi Paige,
Sorry . I found a problem with compare() . it return the first different character between two variable,no matter it is a blank or other character. Check the following the example , it is the wrong result.
data a;
input COL1 : $20.;
cards;
AA999AAB
AA999ABB
AA999A
;
run;
data b;
input COL2 : $20.;
cards;
AA999
AA999AA
AA999B
;
run;
proc sql;
create table want as
select *,case when compare(col1,col2,'il') eq 0 then 999999
when compare(col1,col2,'il') gt 0 then compare(col1,col2,'il')
end as dis
from a,b group by col1 having calculated dis=max(calculated dis) ;
quit;
And I think HaiKuo get the right idea. Use him code or Use the following code, they are the almost same.
data a;
input COL1 : $20.;
cards;
AA999AAB
AA999ABB
AA999AA
;
run;
data b;
input COL2 : $20.;
cards;
AA999
AA999AA
AA999B
;
run;
proc sql;
create table want as
select *
from a,b
where compare(strip(col1),strip(col2),'il:')=0
group by col1
having length(col2)=max(length(col2)) ;
quit;
Xia Keshan
Message was edited by: xia keshan
Hi Paige,
I figured out another way to get it ,if your table is big.
data a; input COL1 : $20.; cards; AA999AAB AA999ABB AA999AA BA ; run; data b; input COL2 : $20.; cards; AA999 AA999AA AA999B ; run; data want; if _n_ eq 1 then do; if 0 then set b; declare hash h(dataset:'b'); h.definekey('col2'); h.definedone(); end; set a; do i=length(col1) to 1 by -1; col2=substr(col1,1,i); if h.check()=0 then leave; call missing(col2); end; run;
Xia Keshan
Message was edited by: xia keshan
very nice. but lurking SAS forum on a Sunday morning, you are sick.:smileysilly: Okay, guess l shouldn't say that, as Saturday night is no better in this regard, so both of us need to get a life. LOL.
It doesn't matter . I just overlooked the complication of problem . It is noon in China now. Anyway, you did a good job, HaiKuo.
Xia Keshan
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.