Hi,
I would like to calculate distances between all observations within groups.
The result should be a dataset in which for each observation there are as many records as are the observations within the group, along with the distances.
My best attempt to realize this is as follows:
ods select none;
proc modeclus data=have k=4 Neighbor ;
by group;
var x1 x2 x3 x4;
id id;
ods output Neighbor=Neighbor;
run;
data want(drop=id_old);
set Neighbor(rename=(id=id_old));
by group;
retain id;
if not missing(id_old) then id=id_old;
run;
But the main problem with this solution seems to be that I cannot choose a distance metric. Or can I?
On the other side with proc distance I can choose among many metrics, but the result is a matrix, as follows:
proc distance data=have out=dist method=euclid SHAPE=SQUARE;
by group;
copy id;
var interval(x1 x2 x3 x4);
run;
I know I can match the dataset with itself and then calculate any distance but I wonder if there are other solutions.
I would appreciate suggestions on the possibility to choose a distance metric from proc modeclus and/or how the resulting dataset of proc distance has to be manipulated to obtain the shape I need.
And/or any other solution.
Than you very much in advance.
Hello,
You can PROC COMPARE your output dataset
with "work.dist_trp_match" to check whether your program is right.
proc sort data=sashelp.class out=class; by sex name; run;
data class;
set class;
retain id2;
by sex name;
if first.sex then do; id2=0; end;
id2=id2+1;
run;
proc distance data=class out=dist method=euclid SHAPE=SQUARE;
by sex;
copy name id2;
var interval(age height weight);
run;
data dist_trp(drop=dist1-dist10);
set dist;
by sex;
array dist{10} dist1-dist10;
do Person_withinSameSex = 1 to dim(dist);
distance_euclid=dist(Person_withinSameSex);
output;
end;
run;
PROC SQL noprint;
create table dist_trp_match as
select aa.* , bb.name as MatchName
from dist_trp(where=(distance_euclid is not missing)) as aa
, dist(where=(Name is not missing) keep=sex name id2) as bb
where aa.sex = bb.sex AND aa.Person_withinSameSex=bb.id2 ;
QUIT;
/* end of program */
Koen
Here's one solution :
proc sort data=sashelp.class out=class; by sex name; run;
proc distance data=class out=dist method=euclid SHAPE=SQUARE;
by sex;
copy name;
var interval(age height weight);
run;
data dist_trp(drop=dist1-dist10);
set dist;
by sex;
array dist{10} dist1-dist10;
do Person_withinSameSex = 1 to dim(dist);
distance_euclid=dist(Person_withinSameSex);
output;
end;
run;
/* end of program */
Koen
On top of the previous reply just above ...
The MODECLUS procedure clusters observations in a SAS data set by using any of several algorithms based on nonparametric density estimates. No Euclidean distance metric is possible.
Here are the methods MODECLUS can use :
SAS/STAT 15.2 User's Guide
The MODECLUS Procedure
Clustering Methods
https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/statug/statug_modeclus_details02.htm
Koen
thank you very much.
the only thing missing is that I need to know which is mached to any id, but I can obtain that
with a little coding like this.
is it correct?
proc sort data=sashelp.class out=class; by sex name; run;
proc distance data=class out=dist method=euclid SHAPE=SQUARE;
by sex;
copy name;
var interval(age height weight);
run;
data dist;
set dist;
by sex;
retain id2;
if first.sex then id2=1;
else id2+1;
run;
data dist_trp(drop=dist1-dist10);
*data dist_trp;
set dist;
by sex;
array dist{10} dist1-dist10;
do Person_withinSameSex = 1 to dim(dist);
distance_euclid=dist(Person_withinSameSex);
output;
end;
run;
proc sql;
create table dist_trp as select a.*,b.name as name_matched
from dist_trp as a
left join
dist as b
on a.sex=b.sex and a.Person_withinSameSex=b.id2
order by sex,id2,Person_withinSameSex;
quit;
Hello,
You can PROC COMPARE your output dataset
with "work.dist_trp_match" to check whether your program is right.
proc sort data=sashelp.class out=class; by sex name; run;
data class;
set class;
retain id2;
by sex name;
if first.sex then do; id2=0; end;
id2=id2+1;
run;
proc distance data=class out=dist method=euclid SHAPE=SQUARE;
by sex;
copy name id2;
var interval(age height weight);
run;
data dist_trp(drop=dist1-dist10);
set dist;
by sex;
array dist{10} dist1-dist10;
do Person_withinSameSex = 1 to dim(dist);
distance_euclid=dist(Person_withinSameSex);
output;
end;
run;
PROC SQL noprint;
create table dist_trp_match as
select aa.* , bb.name as MatchName
from dist_trp(where=(distance_euclid is not missing)) as aa
, dist(where=(Name is not missing) keep=sex name id2) as bb
where aa.sex = bb.sex AND aa.Person_withinSameSex=bb.id2 ;
QUIT;
/* end of program */
Koen
Thank You Koen,
I marked my second message as solution by mistake, since I wanted to mark yours as solution. apologies.
I don't know how to undo the marking. I wrote to sas to ask the unmarking.
Let's see if they respond.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.