Hi SAS Community,
I have a dataset with multiple rows per a unique id. Each row for a given id corresponds to a different level of a qualitative variable called Practice_Area. There are only four practice areas permitted, which are A, B, C, D. See below:
id Practice_Area
123 A
123 B
123 C
456 A
456 B
456 C
456 D
789 C
789 D
901 B
901 C
901 D
I would like to create a 4x4 symmetrical matrix using SAS to show the number of observations in the off diagonal and diagonal cells. See below:
A B C D
A 2 2 2
B 2 3 3 2
C 2 3 4 3
D 2 3 3
Can anyone post some SAS code to show how this can be done? Thanks !!
data have; input id Practice_Area $; cards; 123 A 123 B 123 C 456 A 456 B 456 C 456 D 789 C 789 D 901 B 901 C 901 D ; run; data temp; array x{9999} $ 40 _temporary_; do i=1 by 1 until(last.id); set have; by id; x{i}=Practice_Area; end; do m=1 to i; do n=m to i; v1=x{m}; v2=x{n}; output; if n ne m then do; v2=x{m}; v1=x{n}; output; end; end; end; keep id v1 v2; run; proc freq data=temp noprint; table v1*v2/out=temp1 list nopercent nocum; run; proc transpose data=temp1 out=want(drop=_:); by v1; id v2; var count; run;
1. Change your data structure via PROC TRANSPOSE to indicate what variables are present in each using 0/1 -> make it look like my test data below.
2. Use PROC CORR.
data test;
input id v1-v3;
cards;
11 1 0 1
21 0 2 1
31 1 2 0
41 1 1 0
;
ods output sscp=coocs;
proc corr data=test sscp;
var v1-v3;
run;
proc print data=coocs;
run;
Thanks, I will try this soon, and let you know if it works.
Regards,
Aaron
Maybe it's just the time of day, 10 pm in some minutes, but i don't understand the rules to construct the rows.
Do you want the result as dataset or report?
Thanks,
As a report. The cells indicate the number of observations.
Regards,
Aaron
@ADouglas wrote:
The cells indicate the number of observations.
Of what? I'm assuming that means how many ID's have both, ie how many ID's have A, how many ID's have AB, how many ID's have AC etc... primarily all 2 way values.
I'm guessing the question is something like how many people played on X and Y practice area.
But I'm guessing...and I don't like to guess 🙂
Yes. You are right.
How many in AA, how many in AB, how many in BB, etc.
Thanks,
Aaron
I can't test this until Monday, but I think it gets you there:
proc sql noprint;
create table pairs as select a.Practice_Area as PA_1, b.Practice_Area as PA_2
from have a, have b
where a.id = b.id ;
quit;
proc freq data=pairs;
tables pa_1 * pa_2 / norow nocol nopercent;
run;
TESTING RESULTS:
This worked. Got the same results as the solution program.
Other considerations ...
This program generates a report. The solution program generates a data set.
This program is probably not scalable. The time it would take to generate the Cartesian product on 60K observations might be prohibitive (only one way to find out).
The solution program turns Practice Area values into variable names. Given the possibility that Practice Area names may be longer than 32 characters, there may be issues to consider (particularly if some Practice Areas are identical for the first 32 characters, but different after that point).
I like this idea. I would like to see if you can get it to work. I will try it too. I never used proc sql statements before. It will be my first. Thanks.
Very nice approach. I like both solutions a great deal.
Thanks,
Aaron
How about this one. But you need SAS/IML. Otherwise use data step would cost you a lot code. data have; input id Practice_Area $; cards; 123 A 123 B 123 C 456 A 456 B 456 C 456 D 789 C 789 D 901 B 901 C 901 D ; run; data temp; array x{9999} $ 40 _temporary_; do i=1 by 1 until(last.id); set have; by id; x{i}=Practice_Area; end; do m=1 to i; do n=m to i; v1=x{m}; v2=x{n}; output; end; end; keep id v1 v2; run; proc freq data=temp noprint; table v1*v2/out=temp1 list nopercent nocum; run; proc transpose data=temp1 out=want(drop=_:); by v1; id v2; var count; run; proc iml; use want; read all var _num_ into x[c=vname]; read all var {v1}; close; diag=diag(vecdiag(x)); want=coalesce(t(x),0)+coalesce(x,0)-diag; print want[c=vname r=v1]; quit;
data have; input id Practice_Area $; cards; 123 A 123 B 123 C 456 A 456 B 456 C 456 D 789 C 789 D 901 B 901 C 901 D ; run; data temp; array x{9999} $ 40 _temporary_; do i=1 by 1 until(last.id); set have; by id; x{i}=Practice_Area; end; do m=1 to i; do n=m to i; v1=x{m}; v2=x{n}; output; if n ne m then do; v2=x{m}; v1=x{n}; output; end; end; end; keep id v1 v2; run; proc freq data=temp noprint; table v1*v2/out=temp1 list nopercent nocum; run; proc transpose data=temp1 out=want(drop=_:); by v1; id v2; var count; run;
This approach provides the requested solution, but the only drawback is the Percent of Total Frequency variable on the Temp1 dataset. That variable is not needed.
Thanks,
Aaron
I think @Astounding get better code. I don't realize it is a real Cartesian Product .
Hello,
I am also trying to create a 12x12 symmetrical matrix using 2 column character variables : ID and Disease (categorical, 12 categories)
Each ID may have more than 1 Disease observation
How can I adapt the code above to display (report) a 12x12 matrix (ie. cross-tabulate for co-occurance of the diseases)
I have tried the following code, but no matrix displays, just a table with 4 colomns (ID, Disease, count, percent) :
data temp;
array x{9999} $ 40 _temporary_;
do i=1 by 1 until(last.ID);
set sasuser.diseasecat;
by ID;
x{i}=Disease;
end;
do m=1 to i;
do n=m to i;
v1=x{m};
v2=x{n};
output;
if n ne m then do;
v2=x{m};
v1=x{n};
output;
end;
end;
end;
keep ID Disease;
run;
proc freq data=temp;
table IdParticipant*Group/out=temp1 list nopercent nocum;
run;
proc transpose data=temp1 out=want(drop=_:);
by ID;
id Disease;
var count;
run;
proc print data=temp1;
run;
Many thanks for your help.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.