turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- General Programming
- /
- Creating a symmetrical matrix with character varia...

Topic Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

10-20-2017 03:36 PM - edited 10-20-2017 03:38 PM

Hi SAS Community,

I have a dataset with multiple rows per a unique id. Each row for a given id corresponds to a different level of a qualitative variable called Practice_Area. There are only four practice areas permitted, which are A, B, C, D. See below:

id Practice_Area

123 A

123 B

123 C

456 A

456 B

456 C

456 D

789 C

789 D

901 B

901 C

901 D

I would like to create a 4x4 symmetrical matrix using SAS to show the number of observations in the off diagonal and diagonal cells. See below:

A B C D

A 2 2 2

B 2 3 3 2

C 2 3 4 3

D 2 3 3

Can anyone post some SAS code to show how this can be done? Thanks !!

Accepted Solutions

Solution

10-21-2017
10:19 AM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ADouglas

10-21-2017 08:55 AM

All Replies

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ADouglas

10-20-2017 03:52 PM

1. Change your data structure via PROC TRANSPOSE to indicate what variables are present in each using 0/1 -> make it look like my test data below.

2. Use PROC CORR.

```
data test;
input id v1-v3;
cards;
11 1 0 1
21 0 2 1
31 1 2 0
41 1 1 0
;
ods output sscp=coocs;
proc corr data=test sscp;
var v1-v3;
run;
proc print data=coocs;
run;
```

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Reeza

10-20-2017 04:07 PM

Thanks, I will try this soon, and let you know if it works.

Regards,

Aaron

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ADouglas

10-20-2017 03:52 PM

Maybe it's just the time of day, 10 pm in some minutes, but i don't understand the rules to construct the rows.

Do you want the result as dataset or report?

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to error_prone

10-20-2017 03:56 PM

Thanks,

As a report. The cells indicate the number of observations.

Regards,

Aaron

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ADouglas

10-20-2017 04:03 PM

ADouglas wrote:

The cells indicate the number of observations.

Of what? I'm assuming that means how many ID's have both, ie how many ID's have A, how many ID's have AB, how many ID's have AC etc... primarily all 2 way values.

I'm guessing the question is something like how many people played on X and Y practice area.

But I'm guessing...and I don't like to guess

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Reeza

10-20-2017 04:07 PM

Yes. You are right.

How many in AA, how many in AB, how many in BB, etc.

Thanks,

Aaron

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ADouglas

10-20-2017 09:48 PM - edited 10-23-2017 11:18 AM

I can't test this until Monday, but I think it gets you there:

proc sql noprint;

create table pairs as select a.Practice_Area as PA_1, b.Practice_Area as PA_2

from have a, have b

where a.id = b.id ;

quit;

proc freq data=pairs;

tables pa_1 * pa_2 / norow nocol nopercent;

run;

TESTING RESULTS:

This worked. Got the same results as the solution program.

Other considerations ...

This program generates a report. The solution program generates a data set.

This program is probably not scalable. The time it would take to generate the Cartesian product on 60K observations might be prohibitive (only one way to find out).

The solution program turns Practice Area values into variable names. Given the possibility that Practice Area names may be longer than 32 characters, there may be issues to consider (particularly if some Practice Areas are identical for the first 32 characters, but different after that point).

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Astounding

10-21-2017 10:22 AM

I like this idea. I would like to see if you can get it to work. I will try it too. I never used proc sql statements before. It will be my first. Thanks.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Astounding

10-23-2017 12:13 PM

Very nice approach. I like both solutions a great deal.

Thanks,

Aaron

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ADouglas

10-21-2017 08:48 AM

How about this one. But you need SAS/IML. Otherwise use data step would cost you a lot code. data have; input id Practice_Area $; cards; 123 A 123 B 123 C 456 A 456 B 456 C 456 D 789 C 789 D 901 B 901 C 901 D ; run; data temp; array x{9999} $ 40 _temporary_; do i=1 by 1 until(last.id); set have; by id; x{i}=Practice_Area; end; do m=1 to i; do n=m to i; v1=x{m}; v2=x{n}; output; end; end; keep id v1 v2; run; proc freq data=temp noprint; table v1*v2/out=temp1 list nopercent nocum; run; proc transpose data=temp1 out=want(drop=_:); by v1; id v2; var count; run; proc iml; use want; read all var _num_ into x[c=vname]; read all var {v1}; close; diag=diag(vecdiag(x)); want=coalesce(t(x),0)+coalesce(x,0)-diag; print want[c=vname r=v1]; quit;

Solution

10-21-2017
10:19 AM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ADouglas

10-21-2017 08:55 AM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Ksharp

10-23-2017 12:23 PM

This approach provides the requested solution, but the only drawback is the Percent of Total Frequency variable on the Temp1 dataset. That variable is not needed.

Thanks,

Aaron

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ADouglas

10-24-2017 08:39 AM

I think @Astounding get better code. I don't realize it is a real Cartesian Product .

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Ksharp

10-24-2017 11:13 AM

Hello,

I am also trying to create a 12x12 symmetrical matrix using 2 column character variables : ID and Disease (categorical, 12 categories)

Each ID may have more than 1 Disease observation

How can I adapt the code above to display (report) a 12x12 matrix (ie. cross-tabulate for co-occurance of the diseases)

I have tried the following code, but no matrix displays, just a table with 4 colomns (ID, Disease, count, percent) :

data temp;

array x{9999} $ 40 _temporary_;

do i=1 by 1 until(last.**ID**);

set sasuser.diseasecat;

by ID;

x{i}=**Disease**;

end;

do m=1 to i;

do n=m to i;

v1=x{m};

v2=x{n};

output;

if n ne m then do;

v2=x{m};

v1=x{n};

output;

end;

end;

end;

keep **ID Disease**;

run;

proc freq data=temp;

table IdParticipant*Group/out=temp1 list nopercent nocum;

run;

proc transpose data=temp1 out=want(drop=_;

by **ID**;

id **Disease**;

var count;

run;

proc print data=temp1;

run;

Many thanks for your help.