BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
irvinery
Fluorite | Level 6

Hi fellow SAS users,

 

I am trying to create 2x2 tables with Cramer's V to assess correlation of categorical variables. It is a very large table with many columns, but for this example, lets say:

I have a table consisting of 5 columns called A, B, C, D, and E. I want to output a proc freq with Cramer's V for each combination. I know I can do something like:

proc freq data=dataset; table (A--E)*(A--E) / chisq; run;

But I don't want duplicates (i.e., I do not need both the A*B and B*A crosstabs). I do not have much experience writing arrays, but I think it should be possible to accomplish this task using one. The real table has 30+ columns, so typing out all the combinations to avoid duplications isn't practical. 

 

Thanks in advance for any help the community can provide! 😁

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

I would re-arrange the data such that you can use a BY statement in PROC FREQ. (Alternative is to use a macro)

 

data have;
    input a b c d e;
    cards;
7 10 11 12 14
7 11 10 12 13
8 10 12 13 14
;

data re_arrange;
    set have;
    array v a--e;
    do i=1 to dim(v)-1;
        do j=i+1 to dim(v);
            value1=v(i);
            value2=v(j);
            type = cats(vname(v(i)),'*',vname(v(j)));
            output;
        end;
    end;
    drop i j;
run;
proc sort data=re_arrange;
    by type;
run;

proc freq data=re_arrange;
    by type;
    table value1*value2/chisq;
run;

 

 

Sending this to a text or html output would produce a very long report where you need just a tiny fraction of the results to get Cramer's V. Better would be sending the statistics to a data set, called CHISQ and then filtering the row so you get only the cases where STATISTIC="Cramer's V". Replace the PROC FREQ with this:

 

ods select none;
ods output chisq=chisq(where=(statistic=:'Cram') drop=table);
proc freq data=re_arrange;
    by type;
    table value1*value2/chisq;
run;

 

 

If all the variables have labels (which I would strongly recommend), you could then use the VLABEL() function instread of the VNAME() function.

--
Paige Miller

View solution in original post

6 REPLIES 6
PaigeMiller
Diamond | Level 26

I would re-arrange the data such that you can use a BY statement in PROC FREQ. (Alternative is to use a macro)

 

data have;
    input a b c d e;
    cards;
7 10 11 12 14
7 11 10 12 13
8 10 12 13 14
;

data re_arrange;
    set have;
    array v a--e;
    do i=1 to dim(v)-1;
        do j=i+1 to dim(v);
            value1=v(i);
            value2=v(j);
            type = cats(vname(v(i)),'*',vname(v(j)));
            output;
        end;
    end;
    drop i j;
run;
proc sort data=re_arrange;
    by type;
run;

proc freq data=re_arrange;
    by type;
    table value1*value2/chisq;
run;

 

 

Sending this to a text or html output would produce a very long report where you need just a tiny fraction of the results to get Cramer's V. Better would be sending the statistics to a data set, called CHISQ and then filtering the row so you get only the cases where STATISTIC="Cramer's V". Replace the PROC FREQ with this:

 

ods select none;
ods output chisq=chisq(where=(statistic=:'Cram') drop=table);
proc freq data=re_arrange;
    by type;
    table value1*value2/chisq;
run;

 

 

If all the variables have labels (which I would strongly recommend), you could then use the VLABEL() function instread of the VNAME() function.

--
Paige Miller
ballardw
Super User

@irvinery wrote:

Hi fellow SAS users,

 

I am trying to create 2x2 tables with Cramer's V to assess correlation of categorical variables. It is a very large table with many columns, but for this example, lets say:

I have a table consisting of 5 columns called A, B, C, D, and E. I want to output a proc freq with Cramer's V for each combination. I know I can do something like:

proc freq data=dataset; table (A--E)*(A--E) / chisq; run;

But I don't want duplicates (i.e., I do not need both the A*B and B*A crosstabs). I do not have much experience writing arrays, but I think it should be possible to accomplish this task using one. The real table has 30+ columns, so typing out all the combinations to avoid duplications isn't practical. 

 

Thanks in advance for any help the community can provide! 😁


Since Arrays only exist for the duration of a data step or Proc Iml I am not sure how you are thinking of using this.

Unless you are thinking of placing the the names of the variables into an array and using one of the combinatorial functions to write the code.

PaigeMiller
Diamond | Level 26

@irvinery it seems to me it is wrong to mark that as the correct answer, as I have already shown how to use arrays in this problem.

--
Paige Miller
irvinery
Fluorite | Level 6
Hi Paige,
I didn't select that as the right answer, so I"m not sure how that happened.
FreelanceReinh
Jade | Level 19

You can change the selection easily: Select a different post as the solution after clicking "Not the Solution" in the option menu (see icon below) of the current solution.
show_option_menu.png

irvinery
Fluorite | Level 6
Done! Thanks again for the help.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1077 views
  • 1 like
  • 4 in conversation