turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- General Programming
- /
- find the shortest distance between terms within te...

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

07-17-2017 12:20 AM - edited 07-17-2017 09:27 PM

Hello Experts,

For unknown reason, my OP was deleted.

I will post my question here again.

if I have a table below:

ID text term

1 ABCAB A

1 ABCAB B

I want to find the shortest distance between terms within same ID.

P.S same ID would have same text

For example: in the first record I have ID=1 text=ABC term=A , in the second record I have ID=1 text=ABC and term=B

I want to compare A with A, and A with B, and B with B for ID=1

the position of first A is 1, the second A has a position of 4 so the distance between A with A is 4-1=3

the position of first A is 1, the first B has a position=2, the second B has a position=5

so the distance between first A with first B is 2-1=1 and the distance between first A with second B is 5-1=4 so shortest distance between A with B is 1

the position of second A is 4, the position of first B is 2 so the distance is 2-4=-2,

the position of second A is 4, the position of second B is 5 so the distance is 5-4=1

etc.

so the final table should look like:

ID term1 term2 shortest distance

1 A A 1

1 A B 1

1 B B 3

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to gyambqt

07-17-2017 02:27 AM

Hello Experts,

For unknown reason, my OP was deleted.

I will post my question here again.

if I have a table below:

ID text term

1 ABCAB A

1 ABCAB B

I want to find the shortest distance between terms within same ID.

P.S same ID would have same text

For example: in the first record I have ID=1 text=ABC term=A , in the second record I have ID=1 text=ABC and term=B

I want to compare A with A, and A with B, and B with B for ID=1

the position of first A is 1, the second A has a position of 4 so the distance between A with A is 4-1=3

the position of first A is 1, the first B has a position=2, the second B has a position=5

so the distance between first A with first B is 2-1=1 and the distance between first A with second B is 5-1=4 so shortest distance between A with B is 1

the position of second A is 4, the position of first B is 2 so the distance is 2-4=-2,

the position of second A is 4, the position of second B is 5 so the distance is 5-4=1

etc.

so the final table should look like:

ID term1 term2 shortest distance

1 A A 1

1 A B 1

1 B B 1

Thanks

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to gyambqt

07-17-2017 02:30 AM

Your original post is still there: https://communities.sas.com/t5/General-SAS-Programming/find-the-shortest-distance-between-terms-with...

---------------------------------------------------------------------------------------------

Maxims of Maximally Efficient SAS Programmers

Maxims of Maximally Efficient SAS Programmers

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to KurtBremser

07-17-2017 02:33 AM

Ah...I received a message saying that post was deleted...

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to gyambqt

07-17-2017 04:19 AM

Right, well your post is quite confusing, and do remember to post test data in the form of a datastep in future. This shold start you off:

data have; input id text $ term $; datalines; 1 ABCAB A 1 ABCAB B ; run; data want; set have; do i=1 to lengthn(text); dist=999; do j=i to lengthn(text); if char(text,i)=char(text,j) and j-i < dist then dist=j-1; end; end; run;

So you see you can loop over a string, which is just an array of characters.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

07-17-2017 07:41 PM

sorry for the confusion, I have modified the question in accordance to my other post (same question). I hope this helps to clarify.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to gyambqt

07-17-2017 09:24 PM

How is the B-B distance 1?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ChrisNZ

07-17-2017 09:28 PM

sorry it was typo.....it is fixed now!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to gyambqt

07-17-2017 09:58 PM

Like this?

```
data HAVE;
input ID TEXT $ TERM $;
cards;
1 ABCAB A
1 ABCAB B
run;
data WANT;
retain TERMS;
length TERMS $80;
set HAVE;
by ID;
if first.ID then TERMS='';
if not findw(TERMS,trim(TERM)) then TERMS=catx(' ', TERMS, TERM);
if last.ID then do I=1 to countw(TERMS,' ');
do J=1 to countw(TERMS,' ');
TERM1 =scan(TERMS, I);
TERM2 =scan(TERMS, J);
DIST =1e9;
POS1 =0 ;
do until(POS1=0|POS2=0);
POS1=find(TEXT,trim(TERM1),POS1+1);
POS2=find(TEXT,trim(TERM2),POS1+1);
if POS1 & POS2 then DIST=min(DIST,POS2-POS1);
end;
output;
end;
end;
run;
proc print noobs;
var ID TERM1 TERM2 DIST;
run;
```

ID | TERM1 | TERM2 | DIST |
---|---|---|---|

1 | A | A | 3 |

1 | A | B | 1 |

1 | B | A | 2 |

1 | B | B | 3 |

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ChrisNZ

07-17-2017 11:20 PM

Hi thanks for your prompt reply.

the shorted distance between A and B or B and A should be same both =1

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to gyambqt

07-17-2017 11:51 PM

I think I got you close enough.

Just do a quick post-processing step if needed.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ChrisNZ

07-18-2017 01:39 AM

HI I realise there is a small issue with the code and I modified below:

**data** want;

retain terms;

length terms $**32767**;

set troy;

by source_key_value1;

if first.source_key_value1 then

terms='';

if not findw(terms,trim(upcase(term))) then

terms=catx(' ', upcase(terms), upcase(term));

if last.source_key_value1 then

do i=**1** to countw(upcase(upcase(terms)),' ');

do j=**1** to countw(upcase(terms),' ');

term1 =scan(upcase(terms), i);

term2 =scan(upcase(terms), j);

dist =**999999**;

pos1 =**0**;

pos2 =**0**;

do until(pos1=**0** or pos2=**0**);

pos1=find(compress(msg1),trim(upcase(term1)),pos1+**1**);

if upcase(term1)=upcase(term2) then

pos2=pos1;

pos2=find(compress(msg1),trim(upcase(term2)),pos2+**1**);

if pos1 & pos2 then

dist=min(dist,abs(pos2-pos1));

end;

output;

end;

end;

**run**;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to gyambqt

07-18-2017 05:21 PM

There are no issues with the code and it did axactly what you requested.

If you have length and case to take into account, use the find() function's modifiers: find(V1,V2,'it') -rather than upcase() and compress() and trim()- to make more legible code.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to gyambqt

07-20-2017 04:41 AM

Will there ever be more than two terms for an ID?

Is there a text length limit? Is it 5 characters or more?

Is there a text length limit? Is it 5 characters or more?