Fluorite | Level 6

## Create proc freq crosstab for all column combinations in a dataset

Hi fellow SAS users,

I am trying to create 2x2 tables with Cramer's V to assess correlation of categorical variables. It is a very large table with many columns, but for this example, lets say:

I have a table consisting of 5 columns called A, B, C, D, and E. I want to output a proc freq with Cramer's V for each combination. I know I can do something like:

``proc freq data=dataset; table (A--E)*(A--E) / chisq; run;``

But I don't want duplicates (i.e., I do not need both the A*B and B*A crosstabs). I do not have much experience writing arrays, but I think it should be possible to accomplish this task using one. The real table has 30+ columns, so typing out all the combinations to avoid duplications isn't practical.

Thanks in advance for any help the community can provide! 😁

1 ACCEPTED SOLUTION

Accepted Solutions
Diamond | Level 26

## Re: Create proc freq crosstab for all column combinations in a dataset

I would re-arrange the data such that you can use a BY statement in PROC FREQ. (Alternative is to use a macro)

``````data have;
input a b c d e;
cards;
7 10 11 12 14
7 11 10 12 13
8 10 12 13 14
;

data re_arrange;
set have;
array v a--e;
do i=1 to dim(v)-1;
do j=i+1 to dim(v);
value1=v(i);
value2=v(j);
type = cats(vname(v(i)),'*',vname(v(j)));
output;
end;
end;
drop i j;
run;
proc sort data=re_arrange;
by type;
run;

proc freq data=re_arrange;
by type;
table value1*value2/chisq;
run;
``````

Sending this to a text or html output would produce a very long report where you need just a tiny fraction of the results to get Cramer's V. Better would be sending the statistics to a data set, called CHISQ and then filtering the row so you get only the cases where STATISTIC="Cramer's V". Replace the PROC FREQ with this:

``````ods select none;
ods output chisq=chisq(where=(statistic=:'Cram') drop=table);
proc freq data=re_arrange;
by type;
table value1*value2/chisq;
run;``````

If all the variables have labels (which I would strongly recommend), you could then use the VLABEL() function instread of the VNAME() function.

--
Paige Miller
6 REPLIES 6
Diamond | Level 26

## Re: Create proc freq crosstab for all column combinations in a dataset

I would re-arrange the data such that you can use a BY statement in PROC FREQ. (Alternative is to use a macro)

``````data have;
input a b c d e;
cards;
7 10 11 12 14
7 11 10 12 13
8 10 12 13 14
;

data re_arrange;
set have;
array v a--e;
do i=1 to dim(v)-1;
do j=i+1 to dim(v);
value1=v(i);
value2=v(j);
type = cats(vname(v(i)),'*',vname(v(j)));
output;
end;
end;
drop i j;
run;
proc sort data=re_arrange;
by type;
run;

proc freq data=re_arrange;
by type;
table value1*value2/chisq;
run;
``````

Sending this to a text or html output would produce a very long report where you need just a tiny fraction of the results to get Cramer's V. Better would be sending the statistics to a data set, called CHISQ and then filtering the row so you get only the cases where STATISTIC="Cramer's V". Replace the PROC FREQ with this:

``````ods select none;
ods output chisq=chisq(where=(statistic=:'Cram') drop=table);
proc freq data=re_arrange;
by type;
table value1*value2/chisq;
run;``````

If all the variables have labels (which I would strongly recommend), you could then use the VLABEL() function instread of the VNAME() function.

--
Paige Miller
Super User

## Re: Create proc freq crosstab for all column combinations in a dataset

@irvinery wrote:

Hi fellow SAS users,

I am trying to create 2x2 tables with Cramer's V to assess correlation of categorical variables. It is a very large table with many columns, but for this example, lets say:

I have a table consisting of 5 columns called A, B, C, D, and E. I want to output a proc freq with Cramer's V for each combination. I know I can do something like:

``proc freq data=dataset; table (A--E)*(A--E) / chisq; run;``

But I don't want duplicates (i.e., I do not need both the A*B and B*A crosstabs). I do not have much experience writing arrays, but I think it should be possible to accomplish this task using one. The real table has 30+ columns, so typing out all the combinations to avoid duplications isn't practical.

Thanks in advance for any help the community can provide! 😁

Since Arrays only exist for the duration of a data step or Proc Iml I am not sure how you are thinking of using this.

Unless you are thinking of placing the the names of the variables into an array and using one of the combinatorial functions to write the code.

Diamond | Level 26

## Re: Create proc freq crosstab for all column combinations in a dataset

@irvinery it seems to me it is wrong to mark that as the correct answer, as I have already shown how to use arrays in this problem.

--
Paige Miller
Fluorite | Level 6

## Re: Create proc freq crosstab for all column combinations in a dataset

Hi Paige,
I didn't select that as the right answer, so I"m not sure how that happened.

## Re: Create proc freq crosstab for all column combinations in a dataset

You can change the selection easily: Select a different post as the solution after clicking "Not the Solution" in the option menu (see icon below) of the current solution.

Fluorite | Level 6

## Re: Create proc freq crosstab for all column combinations in a dataset

Done! Thanks again for the help.
Discussion stats
• 6 replies
• 523 views
• 1 like
• 4 in conversation