BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ADouglas
Obsidian | Level 7

Hi SAS Community,

 

I have a dataset with multiple rows per a unique id. Each row for a given id corresponds to a different level of a qualitative variable called Practice_Area. There are only four practice areas permitted, which are A, B, C, D. See below:

 

id        Practice_Area

123            A

123            B

123            C

456            A

456            B

456            C

456            D

789            C

789            D

901            B

901            C

901            D

 

I would like to create a 4x4 symmetrical matrix using SAS to show the number of observations in the off diagonal and diagonal cells. See below:

      A  B  C  D

A   2   2  2

B   2   3  3   2 

C   2   3  4   3

D        2  3   3

 

Can anyone post some SAS code to show how this can be done? Thanks !!

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data have;
input id        Practice_Area $;
cards;
123            A
123            B
123            C
456            A
456            B
456            C
456            D
789            C
789            D
901            B
901            C
901            D
;
run;

data temp;
array x{9999} $ 40 _temporary_;
do i=1 by 1 until(last.id);
  set have;
  by id;
 x{i}=Practice_Area;
end;

do m=1 to i;
 do n=m to i;
   v1=x{m};
   v2=x{n};
   output;
   if n ne m then do;
     v2=x{m};
     v1=x{n};
     output;
   end;
 end;
end;
keep id v1 v2;
run;

proc freq data=temp noprint;
 table v1*v2/out=temp1 list nopercent nocum;
run;

proc transpose data=temp1 out=want(drop=_:);
by v1;
id v2;
var count;
run;



View solution in original post

28 REPLIES 28
Reeza
Super User

1. Change your data structure via PROC TRANSPOSE to indicate what variables are present in each using 0/1 -> make it look like my test data below.

2. Use PROC CORR.

data test;
input id v1-v3;
cards;
11 1 0 1
21 0 2 1
31 1 2 0
41 1 1 0
;
ods output sscp=coocs;
proc corr data=test sscp;
var v1-v3;
run;
proc print data=coocs;
run;

 

ADouglas
Obsidian | Level 7

Thanks, I will try this soon, and let you know if it works.

 

Regards,

Aaron

error_prone
Barite | Level 11

Maybe it's just the time of day, 10 pm in some minutes, but i don't understand the rules to construct the rows. 

Do you want the result as dataset or report?

ADouglas
Obsidian | Level 7

Thanks,

 

As a report. The cells indicate the number of observations.

 

Regards,

Aaron

Reeza
Super User

@ADouglas wrote:

The cells indicate the number of observations.

 

 


 Of what? I'm assuming that means how many ID's have both, ie how many ID's have A, how many ID's have AB, how many ID's have AC etc... primarily all 2 way values. 

 

I'm guessing the question is something like how many people played on X and Y practice area. 

 

But I'm guessing...and I don't like to guess 🙂

ADouglas
Obsidian | Level 7

Yes. You are right.

 

How many in AA, how many in AB, how many in BB, etc.

 

Thanks,

Aaron

Astounding
PROC Star

I can't test this until Monday, but I think it gets you there:

 

proc sql noprint;

create table pairs as select a.Practice_Area as PA_1, b.Practice_Area as PA_2

from have a, have b

where a.id = b.id ;

quit;

 

 

proc freq data=pairs;

tables pa_1 * pa_2 / norow nocol nopercent;

run;

 

TESTING RESULTS:

 

This worked.  Got the same results as the solution program.

 

Other considerations ...

 

This program generates a report.  The solution program generates a data set.

 

This program is probably not scalable.  The time it would take to generate the Cartesian product on 60K observations might be prohibitive (only one way to find out).

 

The solution program turns Practice Area values into variable names.  Given the possibility that Practice Area names may be longer than 32 characters, there may be issues to consider (particularly if some Practice Areas are identical for the first 32 characters, but different after that point).

ADouglas
Obsidian | Level 7

I like this idea. I would like to see if you can get it to work. I will try it too. I never used proc sql statements before. It will be my first. Thanks.

ADouglas
Obsidian | Level 7

Very nice approach. I like both solutions a great deal.

 

Thanks,

Aaron

Ksharp
Super User
How about this one. But you need SAS/IML.
Otherwise use data step would cost you a lot code.



data have;
input id        Practice_Area $;
cards;
123            A
123            B
123            C
456            A
456            B
456            C
456            D
789            C
789            D
901            B
901            C
901            D
;
run;

data temp;
array x{9999} $ 40 _temporary_;
do i=1 by 1 until(last.id);
  set have;
  by id;
 x{i}=Practice_Area;
end;

do m=1 to i;
 do n=m to i;
   v1=x{m};
   v2=x{n};
   output;
 end;
end;
keep id v1 v2;
run;

proc freq data=temp noprint;
 table v1*v2/out=temp1 list nopercent nocum;
run;

proc transpose data=temp1 out=want(drop=_:);
by v1;
id v2;
var count;
run;

proc iml;
use want;
read all var _num_ into x[c=vname];
read all var {v1};
close;

diag=diag(vecdiag(x));
want=coalesce(t(x),0)+coalesce(x,0)-diag;
print want[c=vname r=v1];
quit;

Ksharp
Super User
data have;
input id        Practice_Area $;
cards;
123            A
123            B
123            C
456            A
456            B
456            C
456            D
789            C
789            D
901            B
901            C
901            D
;
run;

data temp;
array x{9999} $ 40 _temporary_;
do i=1 by 1 until(last.id);
  set have;
  by id;
 x{i}=Practice_Area;
end;

do m=1 to i;
 do n=m to i;
   v1=x{m};
   v2=x{n};
   output;
   if n ne m then do;
     v2=x{m};
     v1=x{n};
     output;
   end;
 end;
end;
keep id v1 v2;
run;

proc freq data=temp noprint;
 table v1*v2/out=temp1 list nopercent nocum;
run;

proc transpose data=temp1 out=want(drop=_:);
by v1;
id v2;
var count;
run;



ADouglas
Obsidian | Level 7

This approach provides the requested solution, but the only drawback is the Percent of Total Frequency variable on the Temp1 dataset. That variable is not needed.

Thanks,

Aaron

Ksharp
Super User
I think @Astounding get better code.
I don't realize it is a real Cartesian Product .


dmarks
Calcite | Level 5

Hello,

 

I am also trying to create a 12x12 symmetrical matrix using 2 column character variables : ID and Disease (categorical, 12 categories)

Each ID may have more than 1 Disease observation

How can I adapt the code above to display (report) a 12x12 matrix (ie. cross-tabulate for co-occurance of the diseases)

 

I have tried the following code, but no matrix displays, just a table with 4 colomns (ID, Disease, count, percent) :

 

data temp;
array x{9999} $ 40 _temporary_;
do i=1 by 1 until(last.ID);
set sasuser.diseasecat;
by ID;
x{i}=Disease;
end;

do m=1 to i;
do n=m to i;
v1=x{m};
v2=x{n};
output;
if n ne m then do;
v2=x{m};
v1=x{n};
output;
end;
end;
end;
keep ID Disease;
run;

proc freq data=temp;
table IdParticipant*Group/out=temp1 list nopercent nocum;
run;

proc transpose data=temp1 out=want(drop=_:);
by ID;
id Disease;
var count;
run;

 

proc print data=temp1;
run;

 

 

Many thanks for your help.

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 28 replies
  • 1225 views
  • 3 likes
  • 6 in conversation