Hello Experts,
For unknown reason, my OP was deleted.
I will post my question here again.
if I have a table below:
ID text term
1 ABCAB A
1 ABCAB B
I want to find the shortest distance between terms within same ID.
P.S same ID would have same text
For example: in the first record I have ID=1 text=ABC term=A , in the second record I have ID=1 text=ABC and term=B
I want to compare A with A, and A with B, and B with B for ID=1
the position of first A is 1, the second A has a position of 4 so the distance between A with A is 4-1=3
the position of first A is 1, the first B has a position=2, the second B has a position=5
so the distance between first A with first B is 2-1=1 and the distance between first A with second B is 5-1=4 so shortest distance between A with B is 1
the position of second A is 4, the position of first B is 2 so the distance is 2-4=-2,
the position of second A is 4, the position of second B is 5 so the distance is 5-4=1
etc.
so the final table should look like:
ID term1 term2 shortest distance
1 A A 1
1 A B 1
1 B B 3
Hello Experts,
For unknown reason, my OP was deleted.
I will post my question here again.
if I have a table below:
ID text term
1 ABCAB A
1 ABCAB B
I want to find the shortest distance between terms within same ID.
P.S same ID would have same text
For example: in the first record I have ID=1 text=ABC term=A , in the second record I have ID=1 text=ABC and term=B
I want to compare A with A, and A with B, and B with B for ID=1
the position of first A is 1, the second A has a position of 4 so the distance between A with A is 4-1=3
the position of first A is 1, the first B has a position=2, the second B has a position=5
so the distance between first A with first B is 2-1=1 and the distance between first A with second B is 5-1=4 so shortest distance between A with B is 1
the position of second A is 4, the position of first B is 2 so the distance is 2-4=-2,
the position of second A is 4, the position of second B is 5 so the distance is 5-4=1
etc.
so the final table should look like:
ID term1 term2 shortest distance
1 A A 1
1 A B 1
1 B B 1
Thanks
Your original post is still there: https://communities.sas.com/t5/General-SAS-Programming/find-the-shortest-distance-between-terms-with...
Ah...I received a message saying that post was deleted...
Right, well your post is quite confusing, and do remember to post test data in the form of a datastep in future. This shold start you off:
data have; input id text $ term $; datalines; 1 ABCAB A 1 ABCAB B ; run; data want; set have; do i=1 to lengthn(text); dist=999; do j=i to lengthn(text); if char(text,i)=char(text,j) and j-i < dist then dist=j-1; end; end; run;
So you see you can loop over a string, which is just an array of characters.
sorry for the confusion, I have modified the question in accordance to my other post (same question). I hope this helps to clarify.
How is the B-B distance 1?
sorry it was typo.....it is fixed now!
Like this?
data HAVE;
input ID TEXT $ TERM $;
cards;
1 ABCAB A
1 ABCAB B
run;
data WANT;
retain TERMS;
length TERMS $80;
set HAVE;
by ID;
if first.ID then TERMS='';
if not findw(TERMS,trim(TERM)) then TERMS=catx(' ', TERMS, TERM);
if last.ID then do I=1 to countw(TERMS,' ');
do J=1 to countw(TERMS,' ');
TERM1 =scan(TERMS, I);
TERM2 =scan(TERMS, J);
DIST =1e9;
POS1 =0 ;
do until(POS1=0|POS2=0);
POS1=find(TEXT,trim(TERM1),POS1+1);
POS2=find(TEXT,trim(TERM2),POS1+1);
if POS1 & POS2 then DIST=min(DIST,POS2-POS1);
end;
output;
end;
end;
run;
proc print noobs;
var ID TERM1 TERM2 DIST;
run;
ID | TERM1 | TERM2 | DIST |
---|---|---|---|
1 | A | A | 3 |
1 | A | B | 1 |
1 | B | A | 2 |
1 | B | B | 3 |
Hi thanks for your prompt reply.
the shorted distance between A and B or B and A should be same both =1
I think I got you close enough.
Just do a quick post-processing step if needed.
HI I realise there is a small issue with the code and I modified below:
data want;
retain terms;
length terms $32767;
set troy;
by source_key_value1;
if first.source_key_value1 then
terms='';
if not findw(terms,trim(upcase(term))) then
terms=catx(' ', upcase(terms), upcase(term));
if last.source_key_value1 then
do i=1 to countw(upcase(upcase(terms)),' ');
do j=1 to countw(upcase(terms),' ');
term1 =scan(upcase(terms), i);
term2 =scan(upcase(terms), j);
dist =999999;
pos1 =0;
pos2 =0;
do until(pos1=0 or pos2=0);
pos1=find(compress(msg1),trim(upcase(term1)),pos1+1);
if upcase(term1)=upcase(term2) then
pos2=pos1;
pos2=find(compress(msg1),trim(upcase(term2)),pos2+1);
if pos1 & pos2 then
dist=min(dist,abs(pos2-pos1));
end;
output;
end;
end;
run;
There are no issues with the code and it did axactly what you requested.
If you have length and case to take into account, use the find() function's modifiers: find(V1,V2,'it') -rather than upcase() and compress() and trim()- to make more legible code.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.