Hi fellow SAS users,
I am trying to create 2x2 tables with Cramer's V to assess correlation of categorical variables. It is a very large table with many columns, but for this example, lets say:
I have a table consisting of 5 columns called A, B, C, D, and E. I want to output a proc freq with Cramer's V for each combination. I know I can do something like:
proc freq data=dataset; table (A--E)*(A--E) / chisq; run;But I don't want duplicates (i.e., I do not need both the A*B and B*A crosstabs). I do not have much experience writing arrays, but I think it should be possible to accomplish this task using one. The real table has 30+ columns, so typing out all the combinations to avoid duplications isn't practical.
Thanks in advance for any help the community can provide! 😁
I would re-arrange the data such that you can use a BY statement in PROC FREQ. (Alternative is to use a macro)
data have;
    input a b c d e;
    cards;
7 10 11 12 14
7 11 10 12 13
8 10 12 13 14
;
data re_arrange;
    set have;
    array v a--e;
    do i=1 to dim(v)-1;
        do j=i+1 to dim(v);
            value1=v(i);
            value2=v(j);
            type = cats(vname(v(i)),'*',vname(v(j)));
            output;
        end;
    end;
    drop i j;
run;
proc sort data=re_arrange;
    by type;
run;
proc freq data=re_arrange;
    by type;
    table value1*value2/chisq;
run;
Sending this to a text or html output would produce a very long report where you need just a tiny fraction of the results to get Cramer's V. Better would be sending the statistics to a data set, called CHISQ and then filtering the row so you get only the cases where STATISTIC="Cramer's V". Replace the PROC FREQ with this:
ods select none;
ods output chisq=chisq(where=(statistic=:'Cram') drop=table);
proc freq data=re_arrange;
    by type;
    table value1*value2/chisq;
run;
If all the variables have labels (which I would strongly recommend), you could then use the VLABEL() function instread of the VNAME() function.
I would re-arrange the data such that you can use a BY statement in PROC FREQ. (Alternative is to use a macro)
data have;
    input a b c d e;
    cards;
7 10 11 12 14
7 11 10 12 13
8 10 12 13 14
;
data re_arrange;
    set have;
    array v a--e;
    do i=1 to dim(v)-1;
        do j=i+1 to dim(v);
            value1=v(i);
            value2=v(j);
            type = cats(vname(v(i)),'*',vname(v(j)));
            output;
        end;
    end;
    drop i j;
run;
proc sort data=re_arrange;
    by type;
run;
proc freq data=re_arrange;
    by type;
    table value1*value2/chisq;
run;
Sending this to a text or html output would produce a very long report where you need just a tiny fraction of the results to get Cramer's V. Better would be sending the statistics to a data set, called CHISQ and then filtering the row so you get only the cases where STATISTIC="Cramer's V". Replace the PROC FREQ with this:
ods select none;
ods output chisq=chisq(where=(statistic=:'Cram') drop=table);
proc freq data=re_arrange;
    by type;
    table value1*value2/chisq;
run;
If all the variables have labels (which I would strongly recommend), you could then use the VLABEL() function instread of the VNAME() function.
@irvinery wrote:
Hi fellow SAS users,
I am trying to create 2x2 tables with Cramer's V to assess correlation of categorical variables. It is a very large table with many columns, but for this example, lets say:
I have a table consisting of 5 columns called A, B, C, D, and E. I want to output a proc freq with Cramer's V for each combination. I know I can do something like:
proc freq data=dataset; table (A--E)*(A--E) / chisq; run;But I don't want duplicates (i.e., I do not need both the A*B and B*A crosstabs). I do not have much experience writing arrays, but I think it should be possible to accomplish this task using one. The real table has 30+ columns, so typing out all the combinations to avoid duplications isn't practical.
Thanks in advance for any help the community can provide! 😁
Since Arrays only exist for the duration of a data step or Proc Iml I am not sure how you are thinking of using this.
Unless you are thinking of placing the the names of the variables into an array and using one of the combinatorial functions to write the code.
@irvinery it seems to me it is wrong to mark that as the correct answer, as I have already shown how to use arrays in this problem.
You can change the selection easily: Select a different post as the solution after clicking "Not the Solution" in the option menu (see icon below) of the current solution.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
