Creating a symmetrical matrix with character variable to show diagonal and off diagonal cells

Accepted Solution Solved
Reply
Contributor
Posts: 23
Accepted Solution

Creating a symmetrical matrix with character variable to show diagonal and off diagonal cells

[ Edited ]

Hi SAS Community,

 

I have a dataset with multiple rows per a unique id. Each row for a given id corresponds to a different level of a qualitative variable called Practice_Area. There are only four practice areas permitted, which are A, B, C, D. See below:

 

id        Practice_Area

123            A

123            B

123            C

456            A

456            B

456            C

456            D

789            C

789            D

901            B

901            C

901            D

 

I would like to create a 4x4 symmetrical matrix using SAS to show the number of observations in the off diagonal and diagonal cells. See below:

      A  B  C  D

A   2   2  2

B   2   3  3   2 

C   2   3  4   3

D        2  3   3

 

Can anyone post some SAS code to show how this can be done? Thanks !!


Accepted Solutions
Solution
‎10-21-2017 10:19 AM
Super User
Posts: 10,323

Re: Creating a symmetrical matrix with character variable to show diagonal and off diagonal cells

data have;
input id        Practice_Area $;
cards;
123            A
123            B
123            C
456            A
456            B
456            C
456            D
789            C
789            D
901            B
901            C
901            D
;
run;

data temp;
array x{9999} $ 40 _temporary_;
do i=1 by 1 until(last.id);
  set have;
  by id;
 x{i}=Practice_Area;
end;

do m=1 to i;
 do n=m to i;
   v1=x{m};
   v2=x{n};
   output;
   if n ne m then do;
     v2=x{m};
     v1=x{n};
     output;
   end;
 end;
end;
keep id v1 v2;
run;

proc freq data=temp noprint;
 table v1*v2/out=temp1 list nopercent nocum;
run;

proc transpose data=temp1 out=want(drop=_:);
by v1;
id v2;
var count;
run;



View solution in original post


All Replies
Super User
Posts: 21,546

Re: Creating a symmetrical matrix with character variable to show diagonal and off diagonal cells

1. Change your data structure via PROC TRANSPOSE to indicate what variables are present in each using 0/1 -> make it look like my test data below.

2. Use PROC CORR.

data test;
input id v1-v3;
cards;
11 1 0 1
21 0 2 1
31 1 2 0
41 1 1 0
;
ods output sscp=coocs;
proc corr data=test sscp;
var v1-v3;
run;
proc print data=coocs;
run;

 

Contributor
Posts: 23

Re: Creating a symmetrical matrix with character variable to show diagonal and off diagonal cells

Thanks, I will try this soon, and let you know if it works.

 

Regards,

Aaron

Regular Contributor
Posts: 185

Re: Creating a symmetrical matrix with character variable to show diagonal and off diagonal cells

Maybe it's just the time of day, 10 pm in some minutes, but i don't understand the rules to construct the rows. 

Do you want the result as dataset or report?

Contributor
Posts: 23

Re: Creating a symmetrical matrix with character variable to show diagonal and off diagonal cells

Posted in reply to error_prone

Thanks,

 

As a report. The cells indicate the number of observations.

 

Regards,

Aaron

Super User
Posts: 21,546

Re: Creating a symmetrical matrix with character variable to show diagonal and off diagonal cells


ADouglas wrote:

The cells indicate the number of observations.

 

 


 Of what? I'm assuming that means how many ID's have both, ie how many ID's have A, how many ID's have AB, how many ID's have AC etc... primarily all 2 way values. 

 

I'm guessing the question is something like how many people played on X and Y practice area. 

 

But I'm guessing...and I don't like to guess Smiley Happy

Contributor
Posts: 23

Re: Creating a symmetrical matrix with character variable to show diagonal and off diagonal cells

Yes. You are right.

 

How many in AA, how many in AB, how many in BB, etc.

 

Thanks,

Aaron

Super User
Posts: 6,005

Re: Creating a symmetrical matrix with character variable to show diagonal and off diagonal cells

[ Edited ]

I can't test this until Monday, but I think it gets you there:

 

proc sql noprint;

create table pairs as select a.Practice_Area as PA_1, b.Practice_Area as PA_2

from have a, have b

where a.id = b.id ;

quit;

 

 

proc freq data=pairs;

tables pa_1 * pa_2 / norow nocol nopercent;

run;

 

TESTING RESULTS:

 

This worked.  Got the same results as the solution program.

 

Other considerations ...

 

This program generates a report.  The solution program generates a data set.

 

This program is probably not scalable.  The time it would take to generate the Cartesian product on 60K observations might be prohibitive (only one way to find out).

 

The solution program turns Practice Area values into variable names.  Given the possibility that Practice Area names may be longer than 32 characters, there may be issues to consider (particularly if some Practice Areas are identical for the first 32 characters, but different after that point).

Contributor
Posts: 23

Re: Creating a symmetrical matrix with character variable to show diagonal and off diagonal cells

Posted in reply to Astounding

I like this idea. I would like to see if you can get it to work. I will try it too. I never used proc sql statements before. It will be my first. Thanks.

Contributor
Posts: 23

Re: Creating a symmetrical matrix with character variable to show diagonal and off diagonal cells

Posted in reply to Astounding

Very nice approach. I like both solutions a great deal.

 

Thanks,

Aaron

Super User
Posts: 10,323

Re: Creating a symmetrical matrix with character variable to show diagonal and off diagonal cells

How about this one. But you need SAS/IML.
Otherwise use data step would cost you a lot code.



data have;
input id        Practice_Area $;
cards;
123            A
123            B
123            C
456            A
456            B
456            C
456            D
789            C
789            D
901            B
901            C
901            D
;
run;

data temp;
array x{9999} $ 40 _temporary_;
do i=1 by 1 until(last.id);
  set have;
  by id;
 x{i}=Practice_Area;
end;

do m=1 to i;
 do n=m to i;
   v1=x{m};
   v2=x{n};
   output;
 end;
end;
keep id v1 v2;
run;

proc freq data=temp noprint;
 table v1*v2/out=temp1 list nopercent nocum;
run;

proc transpose data=temp1 out=want(drop=_:);
by v1;
id v2;
var count;
run;

proc iml;
use want;
read all var _num_ into x[c=vname];
read all var {v1};
close;

diag=diag(vecdiag(x));
want=coalesce(t(x),0)+coalesce(x,0)-diag;
print want[c=vname r=v1];
quit;

Solution
‎10-21-2017 10:19 AM
Super User
Posts: 10,323

Re: Creating a symmetrical matrix with character variable to show diagonal and off diagonal cells

data have;
input id        Practice_Area $;
cards;
123            A
123            B
123            C
456            A
456            B
456            C
456            D
789            C
789            D
901            B
901            C
901            D
;
run;

data temp;
array x{9999} $ 40 _temporary_;
do i=1 by 1 until(last.id);
  set have;
  by id;
 x{i}=Practice_Area;
end;

do m=1 to i;
 do n=m to i;
   v1=x{m};
   v2=x{n};
   output;
   if n ne m then do;
     v2=x{m};
     v1=x{n};
     output;
   end;
 end;
end;
keep id v1 v2;
run;

proc freq data=temp noprint;
 table v1*v2/out=temp1 list nopercent nocum;
run;

proc transpose data=temp1 out=want(drop=_:);
by v1;
id v2;
var count;
run;



Contributor
Posts: 23

Re: Creating a symmetrical matrix with character variable to show diagonal and off diagonal cells

This approach provides the requested solution, but the only drawback is the Percent of Total Frequency variable on the Temp1 dataset. That variable is not needed.

Thanks,

Aaron

Super User
Posts: 10,323

Re: Creating a symmetrical matrix with character variable to show diagonal and off diagonal cells

I think @Astounding get better code.
I don't realize it is a real Cartesian Product .


Occasional Contributor
Posts: 9

Re: Creating a symmetrical matrix with character variable to show diagonal and off diagonal cells

Hello,

 

I am also trying to create a 12x12 symmetrical matrix using 2 column character variables : ID and Disease (categorical, 12 categories)

Each ID may have more than 1 Disease observation

How can I adapt the code above to display (report) a 12x12 matrix (ie. cross-tabulate for co-occurance of the diseases)

 

I have tried the following code, but no matrix displays, just a table with 4 colomns (ID, Disease, count, percent) :

 

data temp;
array x{9999} $ 40 _temporary_;
do i=1 by 1 until(last.ID);
set sasuser.diseasecat;
by ID;
x{i}=Disease;
end;

do m=1 to i;
do n=m to i;
v1=x{m};
v2=x{n};
output;
if n ne m then do;
v2=x{m};
v1=x{n};
output;
end;
end;
end;
keep ID Disease;
run;

proc freq data=temp;
table IdParticipant*Group/out=temp1 list nopercent nocum;
run;

proc transpose data=temp1 out=want(drop=_Smiley Happy;
by ID;
id Disease;
var count;
run;

 

proc print data=temp1;
run;

 

 

Many thanks for your help.

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 28 replies
  • 247 views
  • 3 likes
  • 6 in conversation