BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Haikuo
Onyx | Level 15

Paige,

Forgive me if I overlooked your comments, I am aware of that the length of COL2 is less or equal to COL1,  but does COL2 have to be inclusive to COL1? For instance, consider the following examples:

for COL1=AA999AAB,

COL2 of AA999 is a better match or AA999AB? notice the latter has the bigger n? if it is the latter, then what is your rule of dealing ties? eg.  AA999AB or  AA999AC?

Thanks,

Haikuo

PaigeMiller
Diamond | Level 26

Hai.kuo wrote:

for COL1=AA999AAB,

COL2 of AA999 is a better match or AA999AB?

For the case above, COL2 of AA999 is the match I want, as the requirement seems to have been incompletely stated. When you match COL1 and COL2, there cannot be any mismatches, positions 1 thru length(COL2) must match, and I want the longest COL2 which has this property. If you match COL1=AA999AAB with COL2=AA999AB, there is a mismatch in postion 7, so it is disqualified.

--
Paige Miller
Haikuo
Onyx | Level 15

In that case, will my code suffice your need?

Haikuo

PaigeMiller
Diamond | Level 26

Yes , it seems to work as well (although of the 4 solutions that have now been provided, I understand three and I'm not really sure I see why your code works, but perhaps that just a matter of studying it further).

Anyway, thanks to everyone for helping me on this and providing some interesting learnings.

--
Paige Miller
Astounding
PROC Star

Lack of SQL skills would force me to take a totally different approach.

Step 1:  Create a format that translates from the value of COL2 to the observation number holding that value.

Step 2:  Loop through COL1 (all characters down to 1 character) until you find a match.

For example, hard-coding the format just for readability step 1 would look like this:

proc format;
value $recno

'AA999'='1'

'AA999A'='2'

'AA999B'='3'

'AA999AB'='4'

other='Not Found';

run;

In practice, I'm sure you would want to switch to a CNTLIN= data set.

Then step 2 uses the format:

data want;

set col1_source;

if COL1 > ' ' then do _n_=length(COL1) to 1 by -1 until (result ne 'Not Found');

   result = put(substr(COL1, 1, _n_), $recno.);

end;

if result ne 'Not Found' then do;

   recno = input(result, 8.);

   set col2_dataset point=recno;

end;

run;

It's sophisticated, but not horribly complex.

Good luck.

PaigeMiller
Diamond | Level 26

Yes, . I originally had a loop solution, which does work, but in any event, the reason for my question was to see if simpler methods are available (and also hoping that I would learn something, which has happened!)

--
Paige Miller
art297
Opal | Level 21

Paige,

Did you try the code I had suggested? It gives the same results as your code but, additionally, excludes cases where there is no match.

Art

PaigeMiller
Diamond | Level 26

Seem to work, , thank you!

--
Paige Miller
jakarman
Barite | Level 11

If you wanted to go for all possible solutions. I missed:

- hashing

- point= usage with two datasets going around independly 

- ?

---->-- ja karman --<-----
Ksharp
Super User

Hi Paige,

Sorry . I found a problem with compare() . it return the first different character between two variable,no matter it is a blank or other character. Check the following the example , it is the wrong result.

data a;

input COL1  : $20.;

cards;  

AA999AAB

AA999ABB

AA999A

;

run;

                         

data b;

input COL2  : $20.;

cards;

AA999

AA999AA

AA999B

;

run;

proc sql;

create table want as

select *,case when compare(col1,col2,'il') eq 0 then 999999

              when compare(col1,col2,'il') gt 0 then   compare(col1,col2,'il')

                 end as dis

from a,b group by col1 having calculated dis=max(calculated dis) ;

quit; 

And I think HaiKuo get the right idea. Use him code or Use the following code, they  are the almost same.

data a;

input COL1  : $20.;

cards;   

AA999AAB 

AA999ABB

AA999AA

;

run;

                          

data b;

input COL2  : $20.;

cards; 

AA999

AA999AA

AA999B

;

run;

proc sql;

create table want as

select *

from a,b

where compare(strip(col1),strip(col2),'il:')=0

group by col1

having length(col2)=max(length(col2)) ;

quit;  

Xia Keshan

Message was edited by: xia keshan

Ksharp
Super User

Hi Paige,

I figured out another way to get it ,if your table is big.

data a;
input COL1  : $20.;
cards;   
AA999AAB 
AA999ABB
AA999AA
BA
;
run;
                          
data b;
input COL2  : $20.;
cards; 
AA999
AA999AA
AA999B
;
run;
data want;
 if _n_ eq 1 then do;
  if 0 then set b;
  declare hash h(dataset:'b');
  h.definekey('col2');
  h.definedone();
end;
set a;
do i=length(col1) to 1 by -1;
 col2=substr(col1,1,i);
 if h.check()=0 then leave;
 call missing(col2);
end;
run;

Xia Keshan

Message was edited by: xia keshan

Haikuo
Onyx | Level 15

very nice. but lurking SAS forum on a Sunday morning, you are sick.:smileysilly: Okay, guess l shouldn't say that, as Saturday night is no better in this regard, so both of us need to get a life. LOL.

Ksharp
Super User

It doesn't matter . I just overlooked the complication of problem . It is noon in China now. Anyway, you did a good job, HaiKuo.

Xia Keshan

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 27 replies
  • 4530 views
  • 3 likes
  • 8 in conversation