BookmarkSubscribeRSS Feed
gyambqt
Obsidian | Level 7

Hello Experts,

 

For unknown reason, my OP was deleted.

 

I will post my question here again.

 

if I have a table below:

 

ID                  text              term

1                  ABCAB             A

1                  ABCAB             B

 

 

I want to find the shortest distance between terms within same ID.

P.S same ID would have same text

 

For example: in the first record I have ID=1 text=ABC  term=A , in the second record I have ID=1 text=ABC and term=B

I want to compare A with A, and A with B, and B with B for ID=1

the position of first A is 1, the second A has a position of 4 so the distance between A with A is 4-1=3

the position of first A is 1, the first B has a position=2, the second B has a position=5

so the distance between first A with first B is 2-1=1 and the distance between first A with second B is 5-1=4 so shortest distance between A with B is 1

the position of second A is 4, the position of first B is 2 so the distance is 2-4=-2, 

the position of second A is 4, the position of second B is 5 so the distance is 5-4=1

etc.

 

so the final table should look like:

ID             term1   term2   shortest distance

1               A           A              1

1                A          B               1

1                B          B              3

 

13 REPLIES 13
gyambqt
Obsidian | Level 7

Hello Experts,

 

For unknown reason, my OP was deleted.

 

I will post my question here again.

 

if I have a table below:

 

ID                  text              term

1                  ABCAB             A

1                  ABCAB             B

 

 

I want to find the shortest distance between terms within same ID.

P.S same ID would have same text

 

For example: in the first record I have ID=1 text=ABC  term=A , in the second record I have ID=1 text=ABC and term=B

I want to compare A with A, and A with B, and B with B for ID=1

the position of first A is 1, the second A has a position of 4 so the distance between A with A is 4-1=3

the position of first A is 1, the first B has a position=2, the second B has a position=5

so the distance between first A with first B is 2-1=1 and the distance between first A with second B is 5-1=4 so shortest distance between A with B is 1

the position of second A is 4, the position of first B is 2 so the distance is 2-4=-2, 

the position of second A is 4, the position of second B is 5 so the distance is 5-4=1

etc.

 

so the final table should look like:

ID             term1   term2   shortest distance

1               A           A              1

1                A          B               1

1                B          B              1

 

 

Thanks

 

 

 

gyambqt
Obsidian | Level 7

Ah...I received a message saying that post was deleted...

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Right, well your post is quite confusing, and do remember to post test data in the form of a datastep in future.  This shold start you off:

data have;
  input id text $ term $;
datalines;
1 ABCAB A
1 ABCAB B
;
run;

data want;
  set have;
  do i=1 to lengthn(text);
    dist=999;
    do j=i to lengthn(text);
      if char(text,i)=char(text,j) and j-i < dist then dist=j-1;
    end;
  end;
run;

So you see you can loop over a string, which is just an array of characters.

gyambqt
Obsidian | Level 7

sorry for the confusion, I have modified the question in accordance to my other post (same question). I hope this helps to clarify.

gyambqt
Obsidian | Level 7

sorry it was typo.....it is fixed now!

ChrisNZ
Tourmaline | Level 20

Like this?

data HAVE;
input ID TEXT $ TERM $;
cards;
1 ABCAB A
1 ABCAB B
run;
data WANT;
  retain TERMS;
  length TERMS $80;
  set HAVE;
  by ID;
  if first.ID then TERMS='';
  if not findw(TERMS,trim(TERM)) then TERMS=catx(' ', TERMS, TERM); 
  if last.ID then do I=1 to countw(TERMS,' ');
    do J=1 to countw(TERMS,' ');
      TERM1 =scan(TERMS, I);
      TERM2 =scan(TERMS, J);
      DIST  =1e9;
      POS1  =0 ;
      do until(POS1=0|POS2=0);
        POS1=find(TEXT,trim(TERM1),POS1+1);  
        POS2=find(TEXT,trim(TERM2),POS1+1);  
        if POS1 & POS2 then DIST=min(DIST,POS2-POS1);
      end;
      output;
    end; 
  end; 
run;  
proc print noobs;
  var ID TERM1 TERM2 DIST;
run; 
     
ID TERM1 TERM2 DIST
1 A A 3
1 A B 1
1 B A 2
1 B B 3

 

gyambqt
Obsidian | Level 7

Hi thanks for your prompt reply.

 

the shorted distance between A and B or B and A should be same both =1

ChrisNZ
Tourmaline | Level 20

I think I got you close enough.

Just do a quick post-processing step if needed.

gyambqt
Obsidian | Level 7

 

HI I realise there is a small issue with the code and I modified below:

 

data want;

retain terms;

length terms $32767;

set troy;

by source_key_value1;

if first.source_key_value1 then

terms='';

if not findw(terms,trim(upcase(term))) then

terms=catx(' ', upcase(terms), upcase(term));

if last.source_key_value1 then

do i=1 to countw(upcase(upcase(terms)),' ');

do j=1 to countw(upcase(terms),' ');

term1 =scan(upcase(terms), i);

term2 =scan(upcase(terms), j);

dist =999999;

pos1 =0;

pos2 =0;

do until(pos1=0 or pos2=0);

pos1=find(compress(msg1),trim(upcase(term1)),pos1+1);

if upcase(term1)=upcase(term2) then

pos2=pos1;

pos2=find(compress(msg1),trim(upcase(term2)),pos2+1);

if pos1 & pos2 then

dist=min(dist,abs(pos2-pos1));

end;

output;

end;

end;

run;

 

 

ChrisNZ
Tourmaline | Level 20

There are no issues with the code and it did axactly what you requested.

If you have length and case to take into account, use the find() function's modifiers:  find(V1,V2,'it') -rather than upcase() and compress() and trim()- to make more legible code. 

 

Peter_C
Rhodochrosite | Level 12
Will there ever be more than two terms for an ID?
Is there a text length limit? Is it 5 characters or more?

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 891 views
  • 0 likes
  • 5 in conversation