turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- General Programming
- /
- find the shortest distance between terms within te...

Topic Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

07-17-2017 12:20 AM - edited 07-17-2017 09:27 PM

Hello Experts,

For unknown reason, my OP was deleted.

I will post my question here again.

if I have a table below:

ID text term

1 ABCAB A

1 ABCAB B

I want to find the shortest distance between terms within same ID.

P.S same ID would have same text

For example: in the first record I have ID=1 text=ABC term=A , in the second record I have ID=1 text=ABC and term=B

I want to compare A with A, and A with B, and B with B for ID=1

the position of first A is 1, the second A has a position of 4 so the distance between A with A is 4-1=3

the position of first A is 1, the first B has a position=2, the second B has a position=5

so the distance between first A with first B is 2-1=1 and the distance between first A with second B is 5-1=4 so shortest distance between A with B is 1

the position of second A is 4, the position of first B is 2 so the distance is 2-4=-2,

the position of second A is 4, the position of second B is 5 so the distance is 5-4=1

etc.

so the final table should look like:

ID term1 term2 shortest distance

1 A A 1

1 A B 1

1 B B 3

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to gyambqt

07-17-2017 02:27 AM

Hello Experts,

For unknown reason, my OP was deleted.

I will post my question here again.

if I have a table below:

ID text term

1 ABCAB A

1 ABCAB B

I want to find the shortest distance between terms within same ID.

P.S same ID would have same text

For example: in the first record I have ID=1 text=ABC term=A , in the second record I have ID=1 text=ABC and term=B

I want to compare A with A, and A with B, and B with B for ID=1

the position of first A is 1, the second A has a position of 4 so the distance between A with A is 4-1=3

the position of first A is 1, the first B has a position=2, the second B has a position=5

so the distance between first A with first B is 2-1=1 and the distance between first A with second B is 5-1=4 so shortest distance between A with B is 1

the position of second A is 4, the position of first B is 2 so the distance is 2-4=-2,

the position of second A is 4, the position of second B is 5 so the distance is 5-4=1

etc.

so the final table should look like:

ID term1 term2 shortest distance

1 A A 1

1 A B 1

1 B B 1

Thanks

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to gyambqt

07-17-2017 02:30 AM

Your original post is still there: https://communities.sas.com/t5/General-SAS-Programming/find-the-shortest-distance-between-terms-with...

---------------------------------------------------------------------------------------------

Maxims of Maximally Efficient SAS Programmers

How to convert datasets to data steps

How to post code

Maxims of Maximally Efficient SAS Programmers

How to convert datasets to data steps

How to post code

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to KurtBremser

07-17-2017 02:33 AM

Ah...I received a message saying that post was deleted...

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to gyambqt

07-17-2017 04:19 AM

Right, well your post is quite confusing, and do remember to post test data in the form of a datastep in future. This shold start you off:

data have; input id text $ term $; datalines; 1 ABCAB A 1 ABCAB B ; run; data want; set have; do i=1 to lengthn(text); dist=999; do j=i to lengthn(text); if char(text,i)=char(text,j) and j-i < dist then dist=j-1; end; end; run;

So you see you can loop over a string, which is just an array of characters.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

07-17-2017 07:41 PM

sorry for the confusion, I have modified the question in accordance to my other post (same question). I hope this helps to clarify.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to gyambqt

07-17-2017 09:24 PM

How is the B-B distance 1?

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ChrisNZ

07-17-2017 09:28 PM

sorry it was typo.....it is fixed now!

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to gyambqt

07-17-2017 09:58 PM

Like this?

```
data HAVE;
input ID TEXT $ TERM $;
cards;
1 ABCAB A
1 ABCAB B
run;
data WANT;
retain TERMS;
length TERMS $80;
set HAVE;
by ID;
if first.ID then TERMS='';
if not findw(TERMS,trim(TERM)) then TERMS=catx(' ', TERMS, TERM);
if last.ID then do I=1 to countw(TERMS,' ');
do J=1 to countw(TERMS,' ');
TERM1 =scan(TERMS, I);
TERM2 =scan(TERMS, J);
DIST =1e9;
POS1 =0 ;
do until(POS1=0|POS2=0);
POS1=find(TEXT,trim(TERM1),POS1+1);
POS2=find(TEXT,trim(TERM2),POS1+1);
if POS1 & POS2 then DIST=min(DIST,POS2-POS1);
end;
output;
end;
end;
run;
proc print noobs;
var ID TERM1 TERM2 DIST;
run;
```

ID | TERM1 | TERM2 | DIST |
---|---|---|---|

1 | A | A | 3 |

1 | A | B | 1 |

1 | B | A | 2 |

1 | B | B | 3 |

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ChrisNZ

07-17-2017 11:20 PM

Hi thanks for your prompt reply.

the shorted distance between A and B or B and A should be same both =1

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to gyambqt

07-17-2017 11:51 PM

I think I got you close enough.

Just do a quick post-processing step if needed.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ChrisNZ

07-18-2017 01:39 AM

HI I realise there is a small issue with the code and I modified below:

**data** want;

retain terms;

length terms $**32767**;

set troy;

by source_key_value1;

if first.source_key_value1 then

terms='';

if not findw(terms,trim(upcase(term))) then

terms=catx(' ', upcase(terms), upcase(term));

if last.source_key_value1 then

do i=**1** to countw(upcase(upcase(terms)),' ');

do j=**1** to countw(upcase(terms),' ');

term1 =scan(upcase(terms), i);

term2 =scan(upcase(terms), j);

dist =**999999**;

pos1 =**0**;

pos2 =**0**;

do until(pos1=**0** or pos2=**0**);

pos1=find(compress(msg1),trim(upcase(term1)),pos1+**1**);

if upcase(term1)=upcase(term2) then

pos2=pos1;

pos2=find(compress(msg1),trim(upcase(term2)),pos2+**1**);

if pos1 & pos2 then

dist=min(dist,abs(pos2-pos1));

end;

output;

end;

end;

**run**;

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to gyambqt

07-18-2017 05:21 PM

There are no issues with the code and it did axactly what you requested.

If you have length and case to take into account, use the find() function's modifiers: find(V1,V2,'it') -rather than upcase() and compress() and trim()- to make more legible code.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to gyambqt

07-20-2017 04:41 AM

Will there ever be more than two terms for an ID?

Is there a text length limit? Is it 5 characters or more?

Is there a text length limit? Is it 5 characters or more?