Hi,
I have a problem with calculating distance between observations, which is the variable model. Each observation contains 5 coordinate points, {fxi,fyi}, i=1 to 5; The dataset as following, is there any macro or for loop can do the calculation? Thanks!
obs | model | fx1 | fy1 | fx2 | fy2 | fx3 | fy3 | fx4 | fy4 | fx5 | fy5 |
1 | a | 129 | 16.5 | 129 | 22.25 | 129 | 59.75 | 129 | 90.75 | 129 | 112.25 |
2 | b | 156.9375 | 18 | 156.9375 | 26.5 | 156.9375 | 69 | 156.9375 | 109.5 | 156.9375 | 137 |
3 | c | 140 | 26 | 140 | 31 | 140 | 76.5 | 140 | 116 | 140 | 142.75 |
4 | d | 109.7188 | 24 | 109.7188 | 29.5 | 109.7188 | 53.75 | 109.7188 | 87 | 109.7188 | 100.25 |
5 | e | 145.5 | 7 | 145.5 | 11.5 | 145.5 | 46.5 | 145.5 | 90 | 145.5 | 110.5 |
6 | f | 94.28125 | 15 | 94.28125 | 19.5 | 94.28125 | 47.5 | 94.28125 | 91.25 | 94.28125 | 111.25 |
7 | g | 150 | 28.5 | 150 | 34 | 150 | 83 | 150 | 126 | 150 | 172 |
8 | a | 147 | 9.5 | 147 | 17 | 147 | 62 | 147 | 102.5 | 147 | 141 |
9 | b | 91.4375 | 12.75 | 91.4375 | 17.75 | 91.4375 | 48.5 | 91.4375 | 82.75 | 91.4375 | 89.25 |
10 | c | 106.3438 | 2.25 | 106.3438 | 4.75 | 106.3438 | 34.5 | 106.3438 | 57.5 | 106.3438 | 78 |
11 | d | 130.9375 | 9.75 | 130.9375 | 14.75 | 130.9375 | 51.5 | 130.9375 | 79.5 | 130.9375 | 99.5 |
12 | e | 143.5625 | 21.5 | 143.5625 | 26.5 | 143.5625 | 74 | 143.5625 | 119.5 | 143.5625 | 142 |
What would your distance calculation look like?
Please include sample output.
Hi,
The function used to caluculate euclidean distance should be dis(distance between any two oberservation)= sum(squr((fyi1-fyi2)**2+(fxi1-fxi2)**2)),{fxi1,fyi1} from the first observation, the other from the second observation, i=1 to 5. Thanks
if it was Eduical distance,
Data step also can do that.
data have;
infile cards truncover expandtabs;
input obs model $ fx1 fy1 fx2 fy2 fx3 fy3 fx4 fy4 fx5 fy5;
cards;
1 a 129 16.5 129 22.25 129 59.75 129 90.75 129 112.25
2 b 156.9375 18 156.9375 26.5 156.9375 69 156.9375 109.5 156.9375 137
3 c 140 26 140 31 140 76.5 140 116 140 142.75
4 d 109.7188 24 109.7188 29.5 109.7188 53.75 109.7188 87 109.7188 100.25
5 e 145.5 7 145.5 11.5 145.5 46.5 145.5 90 145.5 110.5
6 f 94.28125 15 94.28125 19.5 94.28125 47.5 94.28125 91.25 94.28125 111.25
7 g 150 28.5 150 34 150 83 150 126 150 172
8 a 147 9.5 147 17 147 62 147 102.5 147 141
9 b 91.4375 12.75 91.4375 17.75 91.4375 48.5 91.4375 82.75 91.4375 89.25
10 c 106.3438 2.25 106.3438 4.75 106.3438 34.5 106.3438 57.5 106.3438 78
11 d 130.9375 9.75 130.9375 14.75 130.9375 51.5 130.9375 79.5 130.9375 99.5
12 e 143.5625 21.5 143.5625 26.5 143.5625 74 143.5625 119.5 143.5625 142
;
run;
proc iml;
var_x='fx1':'fx5';
var_y='fy1':'fy5';
use have;
read all var var_x into x;
read all var var_y into y;
close;
distance= sqrt((y-x)[,##]);
create distance var{distance};
append;
close;
data want;
merge have distance;
run;
Hi,
Thanks for help. This seems work but not get the right result, because the function used to caluculate euclidean distance is
dis(distance between any two oberservation)= sum(squr((fyi1-fyi2)**2+(fxi1-fxi2)**2)),{fxi1,fyi1} from the first observation, the other from the second observation, i=1 to 5. So should the sum equation be modified? Thanks.
Look at proc distance with method=Euclid.
data have;
infile cards truncover expandtabs;
input obs model $ fx1 fy1 fx2 fy2 fx3 fy3 fx4 fy4 fx5 fy5;
cards;
1 a 129 16.5 129 22.25 129 59.75 129 90.75 129 112.25
2 b 156.9375 18 156.9375 26.5 156.9375 69 156.9375 109.5 156.9375 137
3 c 140 26 140 31 140 76.5 140 116 140 142.75
4 d 109.7188 24 109.7188 29.5 109.7188 53.75 109.7188 87 109.7188 100.25
5 e 145.5 7 145.5 11.5 145.5 46.5 145.5 90 145.5 110.5
6 f 94.28125 15 94.28125 19.5 94.28125 47.5 94.28125 91.25 94.28125 111.25
7 g 150 28.5 150 34 150 83 150 126 150 172
8 a 147 9.5 147 17 147 62 147 102.5 147 141
9 b 91.4375 12.75 91.4375 17.75 91.4375 48.5 91.4375 82.75 91.4375 89.25
10 c 106.3438 2.25 106.3438 4.75 106.3438 34.5 106.3438 57.5 106.3438 78
11 d 130.9375 9.75 130.9375 14.75 130.9375 51.5 130.9375 79.5 130.9375 99.5
12 e 143.5625 21.5 143.5625 26.5 143.5625 74 143.5625 119.5 143.5625 142
;
run;
data want1;
set have;
array x{*} fx:;
array y{*} fy:;
sum=0;
do i=1 to dim(x);
sum+(y{i}-x{i})**2;
end;
distance=sqrt(sum);
drop sum i;
run;
@Ksharp I think the OP wants distance between different rows.
Correctly. But each row composed of 5 {fxi,fyi} points. If I used Proc distance, I won't get correct result. For proc distance with Euclid, the calculate function will be dist=sqrt[(fx11-fx12)**2+(fy11-fy12)**2+ (fx21-fx22)**2+ (fy21-fy22)**2+...+ (fx51-fx52)**2+ (fy51-fy52)**2]. However, what I want to get is: dist=sqrt[(fx11-fx12)**2+(fy11-fy12)**2]+ sqrt[(fx21-fx22)**2+(fy21-fy22)**2]+...+sqrt[(fx51-fx52)**2+ (fy51-fy52)**2].
Are you sure? Post the code you've used please.
It might be worth switching it to a long format then and grouping to be able to use the formula.
Otherwise a SQL cross join as long as you're willing to type out the formula.
proc sql;
select a.*, b.fx1 as fx12, b.fy1 as fy12 ...,
sqrt[(fx11-fx12)**2+(fy11-fy12)**2]+ sqrt[(fx21-fx22)**2+(fy21-fy22)**2]+...+sqrt[(fx51
from have as a
cross join have as b;
quit;
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.