Hi,
I have a problem with calculating distance between observations, which is the variable model. Each observation contains 5 coordinate points, {fxi,fyi}, i=1 to 5; The dataset as following, is there any macro or for loop can do the calculation? Thanks!
obs | model | fx1 | fy1 | fx2 | fy2 | fx3 | fy3 | fx4 | fy4 | fx5 | fy5 |
1 | a | 129 | 16.5 | 129 | 22.25 | 129 | 59.75 | 129 | 90.75 | 129 | 112.25 |
2 | b | 156.9375 | 18 | 156.9375 | 26.5 | 156.9375 | 69 | 156.9375 | 109.5 | 156.9375 | 137 |
3 | c | 140 | 26 | 140 | 31 | 140 | 76.5 | 140 | 116 | 140 | 142.75 |
4 | d | 109.7188 | 24 | 109.7188 | 29.5 | 109.7188 | 53.75 | 109.7188 | 87 | 109.7188 | 100.25 |
5 | e | 145.5 | 7 | 145.5 | 11.5 | 145.5 | 46.5 | 145.5 | 90 | 145.5 | 110.5 |
6 | f | 94.28125 | 15 | 94.28125 | 19.5 | 94.28125 | 47.5 | 94.28125 | 91.25 | 94.28125 | 111.25 |
7 | g | 150 | 28.5 | 150 | 34 | 150 | 83 | 150 | 126 | 150 | 172 |
8 | a | 147 | 9.5 | 147 | 17 | 147 | 62 | 147 | 102.5 | 147 | 141 |
9 | b | 91.4375 | 12.75 | 91.4375 | 17.75 | 91.4375 | 48.5 | 91.4375 | 82.75 | 91.4375 | 89.25 |
10 | c | 106.3438 | 2.25 | 106.3438 | 4.75 | 106.3438 | 34.5 | 106.3438 | 57.5 | 106.3438 | 78 |
11 | d | 130.9375 | 9.75 | 130.9375 | 14.75 | 130.9375 | 51.5 | 130.9375 | 79.5 | 130.9375 | 99.5 |
12 | e | 143.5625 | 21.5 | 143.5625 | 26.5 | 143.5625 | 74 | 143.5625 | 119.5 | 143.5625 | 142 |
What would your distance calculation look like?
Please include sample output.
Hi,
The function used to caluculate euclidean distance should be dis(distance between any two oberservation)= sum(squr((fyi1-fyi2)**2+(fxi1-fxi2)**2)),{fxi1,fyi1} from the first observation, the other from the second observation, i=1 to 5. Thanks
if it was Eduical distance,
Data step also can do that.
data have;
infile cards truncover expandtabs;
input obs model $ fx1 fy1 fx2 fy2 fx3 fy3 fx4 fy4 fx5 fy5;
cards;
1 a 129 16.5 129 22.25 129 59.75 129 90.75 129 112.25
2 b 156.9375 18 156.9375 26.5 156.9375 69 156.9375 109.5 156.9375 137
3 c 140 26 140 31 140 76.5 140 116 140 142.75
4 d 109.7188 24 109.7188 29.5 109.7188 53.75 109.7188 87 109.7188 100.25
5 e 145.5 7 145.5 11.5 145.5 46.5 145.5 90 145.5 110.5
6 f 94.28125 15 94.28125 19.5 94.28125 47.5 94.28125 91.25 94.28125 111.25
7 g 150 28.5 150 34 150 83 150 126 150 172
8 a 147 9.5 147 17 147 62 147 102.5 147 141
9 b 91.4375 12.75 91.4375 17.75 91.4375 48.5 91.4375 82.75 91.4375 89.25
10 c 106.3438 2.25 106.3438 4.75 106.3438 34.5 106.3438 57.5 106.3438 78
11 d 130.9375 9.75 130.9375 14.75 130.9375 51.5 130.9375 79.5 130.9375 99.5
12 e 143.5625 21.5 143.5625 26.5 143.5625 74 143.5625 119.5 143.5625 142
;
run;
proc iml;
var_x='fx1':'fx5';
var_y='fy1':'fy5';
use have;
read all var var_x into x;
read all var var_y into y;
close;
distance= sqrt((y-x)[,##]);
create distance var{distance};
append;
close;
data want;
merge have distance;
run;
Hi,
Thanks for help. This seems work but not get the right result, because the function used to caluculate euclidean distance is
dis(distance between any two oberservation)= sum(squr((fyi1-fyi2)**2+(fxi1-fxi2)**2)),{fxi1,fyi1} from the first observation, the other from the second observation, i=1 to 5. So should the sum equation be modified? Thanks.
Look at proc distance with method=Euclid.
data have;
infile cards truncover expandtabs;
input obs model $ fx1 fy1 fx2 fy2 fx3 fy3 fx4 fy4 fx5 fy5;
cards;
1 a 129 16.5 129 22.25 129 59.75 129 90.75 129 112.25
2 b 156.9375 18 156.9375 26.5 156.9375 69 156.9375 109.5 156.9375 137
3 c 140 26 140 31 140 76.5 140 116 140 142.75
4 d 109.7188 24 109.7188 29.5 109.7188 53.75 109.7188 87 109.7188 100.25
5 e 145.5 7 145.5 11.5 145.5 46.5 145.5 90 145.5 110.5
6 f 94.28125 15 94.28125 19.5 94.28125 47.5 94.28125 91.25 94.28125 111.25
7 g 150 28.5 150 34 150 83 150 126 150 172
8 a 147 9.5 147 17 147 62 147 102.5 147 141
9 b 91.4375 12.75 91.4375 17.75 91.4375 48.5 91.4375 82.75 91.4375 89.25
10 c 106.3438 2.25 106.3438 4.75 106.3438 34.5 106.3438 57.5 106.3438 78
11 d 130.9375 9.75 130.9375 14.75 130.9375 51.5 130.9375 79.5 130.9375 99.5
12 e 143.5625 21.5 143.5625 26.5 143.5625 74 143.5625 119.5 143.5625 142
;
run;
data want1;
set have;
array x{*} fx:;
array y{*} fy:;
sum=0;
do i=1 to dim(x);
sum+(y{i}-x{i})**2;
end;
distance=sqrt(sum);
drop sum i;
run;
@Ksharp I think the OP wants distance between different rows.
Correctly. But each row composed of 5 {fxi,fyi} points. If I used Proc distance, I won't get correct result. For proc distance with Euclid, the calculate function will be dist=sqrt[(fx11-fx12)**2+(fy11-fy12)**2+ (fx21-fx22)**2+ (fy21-fy22)**2+...+ (fx51-fx52)**2+ (fy51-fy52)**2]. However, what I want to get is: dist=sqrt[(fx11-fx12)**2+(fy11-fy12)**2]+ sqrt[(fx21-fx22)**2+(fy21-fy22)**2]+...+sqrt[(fx51-fx52)**2+ (fy51-fy52)**2].
Are you sure? Post the code you've used please.
It might be worth switching it to a long format then and grouping to be able to use the formula.
Otherwise a SQL cross join as long as you're willing to type out the formula.
proc sql;
select a.*, b.fx1 as fx12, b.fy1 as fy12 ...,
sqrt[(fx11-fx12)**2+(fy11-fy12)**2]+ sqrt[(fx21-fx22)**2+(fy21-fy22)**2]+...+sqrt[(fx51
from have as a
cross join have as b;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.