BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
ciro
Quartz | Level 8

Hi,

I would like to calculate distances between all observations within groups.

The result should be a dataset in which for each observation there are as many records as are the observations within the group, along with the distances.

My best attempt to realize this is as follows:

ods select none;
proc modeclus data=have k=4 Neighbor ;
by group;
var x1 x2 x3 x4;
id id;
ods output Neighbor=Neighbor;
run;

data want(drop=id_old);
set Neighbor(rename=(id=id_old));
by group;
retain id;
if not missing(id_old) then id=id_old;
run;

 

But the main problem with this solution seems to be that I cannot choose a distance metric. Or can I?

 

On the other side with proc distance  I can choose among many metrics, but the result is a matrix, as follows:

proc distance data=have out=dist method=euclid SHAPE=SQUARE;
by group;
copy id;
var interval(x1 x2 x3 x4);
run;

I know I can match the dataset with itself and then calculate any distance but I wonder if there are other solutions. 

I would appreciate suggestions on the possibility to choose a distance metric from proc modeclus and/or how the resulting dataset of proc distance has to be manipulated to obtain the shape I need.

And/or any other solution. 

Than you very much in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
sbxkoenk
SAS Super FREQ

Hello,

 

You can PROC COMPARE your output dataset

with "work.dist_trp_match" to check whether your program is right.

 

proc sort data=sashelp.class out=class; by sex name; run;

data class;
 set class;
 retain id2;
 by sex name; 
 if first.sex then do; id2=0; end;
 id2=id2+1;
run;

proc distance data=class out=dist method=euclid SHAPE=SQUARE;
 by sex;
 copy name id2;
 var interval(age height weight);
run;

data dist_trp(drop=dist1-dist10);
 set dist;
 by sex;
 array dist{10} dist1-dist10;
 do Person_withinSameSex = 1 to dim(dist);
  distance_euclid=dist(Person_withinSameSex);
  output;
 end;
run;

PROC SQL noprint;
 create table dist_trp_match as
 select aa.* , bb.name as MatchName
 from   dist_trp(where=(distance_euclid is not missing))    as aa
      , dist(where=(Name is not missing) keep=sex name id2) as bb
 where aa.sex = bb.sex AND aa.Person_withinSameSex=bb.id2 ;
QUIT;
/* end of program */

Koen

View solution in original post

5 REPLIES 5
sbxkoenk
SAS Super FREQ

Here's one solution :

proc sort data=sashelp.class out=class; by sex name; run;

proc distance data=class out=dist method=euclid SHAPE=SQUARE;
 by sex;
 copy name;
 var interval(age height weight);
run;

data dist_trp(drop=dist1-dist10);
 set dist;
 by sex;
 array dist{10} dist1-dist10;
 do Person_withinSameSex = 1 to dim(dist);
  distance_euclid=dist(Person_withinSameSex);
  output;
 end;
run;
/* end of program */

Koen

sbxkoenk
SAS Super FREQ

On top of the previous reply just above ...

 

The MODECLUS procedure clusters observations in a SAS data set by using any of several algorithms based on nonparametric density estimates. No Euclidean distance metric is possible.

 

Here are the methods MODECLUS can use :

SAS/STAT 15.2 User's Guide
The MODECLUS Procedure
Clustering Methods
https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/statug/statug_modeclus_details02.htm

 

Koen

 

ciro
Quartz | Level 8

thank you very much.

the only thing missing is that I need to know which  is mached to any id, but I can obtain that

with a little coding like this.

is it correct?

 

proc sort data=sashelp.class out=class; by sex name; run;

proc distance data=class out=dist method=euclid SHAPE=SQUARE;
by sex;
copy name;
var interval(age height weight);
run;

data dist;
set dist;
by sex;
retain id2;
if first.sex then id2=1;
else id2+1;
run;

data dist_trp(drop=dist1-dist10);
*data dist_trp;
set dist;
by sex;
array dist{10} dist1-dist10;
do Person_withinSameSex = 1 to dim(dist);
distance_euclid=dist(Person_withinSameSex);
output;
end;
run;

proc sql;
create table dist_trp as select a.*,b.name as name_matched
from dist_trp as a
left join
dist as b
on a.sex=b.sex and a.Person_withinSameSex=b.id2
order by sex,id2,Person_withinSameSex;
quit;

sbxkoenk
SAS Super FREQ

Hello,

 

You can PROC COMPARE your output dataset

with "work.dist_trp_match" to check whether your program is right.

 

proc sort data=sashelp.class out=class; by sex name; run;

data class;
 set class;
 retain id2;
 by sex name; 
 if first.sex then do; id2=0; end;
 id2=id2+1;
run;

proc distance data=class out=dist method=euclid SHAPE=SQUARE;
 by sex;
 copy name id2;
 var interval(age height weight);
run;

data dist_trp(drop=dist1-dist10);
 set dist;
 by sex;
 array dist{10} dist1-dist10;
 do Person_withinSameSex = 1 to dim(dist);
  distance_euclid=dist(Person_withinSameSex);
  output;
 end;
run;

PROC SQL noprint;
 create table dist_trp_match as
 select aa.* , bb.name as MatchName
 from   dist_trp(where=(distance_euclid is not missing))    as aa
      , dist(where=(Name is not missing) keep=sex name id2) as bb
 where aa.sex = bb.sex AND aa.Person_withinSameSex=bb.id2 ;
QUIT;
/* end of program */

Koen

ciro
Quartz | Level 8

Thank You Koen,

I marked my second message as solution by mistake, since I wanted to mark yours as solution. apologies. 

I don't know how to undo the marking. I wrote to sas to ask the unmarking.

Let's see if they respond.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 1002 views
  • 1 like
  • 2 in conversation