Help with chi-square proc freq data shaping and table output

Solved
Occasional Contributor
Posts: 17

Help with chi-square proc freq data shaping and table output

Hello,

Statistics novice here.  I have data structured as above.  I would like to perform a Pearson chi-square test comparing generic to brand drug fills by year for each group.  I have done this in the past but it involved some manual reshaping in Excell.

I was hoping someone could help me with the the correct way to use the tables and weight statements for a proc freq.

For each group, I would like a chi-square test comparing generic to brand by year.  I would also like to create a data set with the  p value for each group comparison.

I figured it would be worth checking if there is an easy solution to this.  Any help is appreciated.

Thank you.

Accepted Solutions
Solution
‎01-28-2016 05:22 PM
Super User
Posts: 10,770

Re: Help with chi-square proc freq data shaping and table output

It seems you are building FOUR dimensions of contingency table . Not sure whether the following is what you need.

And you need shape your table like this firstly:

group year  level        weight

A       2014  generic   58228

A      2014   brand     7718

...........

proc freq data=have;

table  group*year*level /cmh chisq;

weight weight;

run;

All Replies
Solution
‎01-28-2016 05:22 PM
Super User
Posts: 10,770

Re: Help with chi-square proc freq data shaping and table output

It seems you are building FOUR dimensions of contingency table . Not sure whether the following is what you need.

And you need shape your table like this firstly:

group year  level        weight

A       2014  generic   58228

A      2014   brand     7718

...........

proc freq data=have;

table  group*year*level /cmh chisq;

weight weight;

run;

Super Contributor
Posts: 625

Re: Help with chi-square proc freq data shaping and table output

Could you please tell me the role of weight statement in following example?

``````proc freq data=have;

table  group*year*level /cmh chisq;

weight weight;

run;
``````
Super User
Posts: 23,700

Re: Help with chi-square proc freq data shaping and table output

@Babloo Please consider looking these up yourself. In this case, find the documentation for PROC FREQ. Find the WEIGHT statement under the syntax.

First statement:

The WEIGHT statement names a numeric variable that provides a weight for each observation in the input data set. The WEIGHT statement is most commonly used to input cell count data. See the section Inputting Frequency Counts for more information. If you use a WEIGHT statement, PROC FREQ assumes that an observation represents n observations, where n is the value of variable. The value of the WEIGHT variable is not required to be an integer.

SAS Super FREQ
Posts: 4,240

Re: Help with chi-square proc freq data shaping and table output

For practical purposes, the WEIGHT statement in PROC FREQ is equivalent to the FREQ statement in other SAS procedures. It enables you to specify cell values by using pre-summarized counts.

Occasional Contributor
Posts: 17

Re: Help with chi-square proc freq data shaping and table output

Thank you Xia, this worked great.  Would you be able to give me any tips on getting the chi sq p values and row percents to a data set?

I am having trouble with the output statement.  I appreciate the help!

Super User
Posts: 10,770

Re: Help with chi-square proc freq data shaping and table output

OK. Here is an example .

``````proc freq data=sashelp.class;
table sex*age/out=want1 chisq outpct ;
output out=want2 chisq cmh;
run;``````
Super User
Posts: 23,700

Re: Help with chi-square proc freq data shaping and table output

Can you please include your data as text, not a picture? SAS can't process pictures as data yet

To reshape your data you can use SET.

``````data reshaped;
set have (drop=Generic in=a)
have (drop=Brand in=b);

if a then type='Generic';
if b then type='Brand';

run;

``````

You can then pass that on to proc freq. It will give you results but not a table that summarizes all of it together with the p-values. That will take a bit of data processing.

Occasional Contributor
Posts: 17

Re: Help with chi-square proc freq data shaping and table output

Group year Generic Brand Formulary Nonformulary
A 2014 58228 7718 67268 1086
A 2015 102774 12742 117288 2002
B 2014 179285 22679 202957 4305
B 2015 390531 50278 445166 8366
C 2014 125901 12782 138908 2970
C 2015 271305 32094 308040 5487
D 2014 25330 3118 28480 632
D 2015 38558 5000 43546 958
E 2014 86972 13268 101822 1619
E 2015 149321 20535 173858 2497
F 2014 174173 23246 199510 4311
F 2015 268654 34664 308150 6730
G 2014 9590 694 10388 150
G 2015 29546 2814 32722 372
H 2014 93515 12184 108641 2046
H 2015 144530 18106 168982 2300
I 2014 255844 29236 289018 4176
I 2015 465108 53090 525446 7672
J 2014 53327 5758 59626 1123
J 2015 119858 13810 135540 2544
K 2014 321911 44123 374962 6459
K 2015 488206 63418 574304 6911
L 2014 570029 77791 656689 15829
L 2015 905612 114412 1040012 22253
M 2014 19362 1940 21366 526
M 2015 31838 3156 34942 942
N 2014 222396 31044 257076 6665
N 2015 347872 43048 400573 7373

Hello Reeza, thank you for the response.  Here is the data as plain text (hope this is the correct way to share).

I am having trouble with the code you gave me in reshaping the data.  It did not combine the fill counts into one column.

I am also going to run the chi-square test comparing formulary and nonformulary fills.  Thank you for your help.

Super User
Posts: 23,700

Re: Help with chi-square proc freq data shaping and table output

``````data reshaped;
set have (drop=Generic in=a rename=Brand=Value)
have (drop=Brand in=b rename=Generic=Value);

if a then type='Generic';
if b then type='Brand';

run;``````

There was a mistake in the code, need to name the values the same thing. See if the above works.

Occasional Contributor
Posts: 17

Re: Help with chi-square proc freq data shaping and table output

still could not get that to work...it cant find the dropped variable.

Super User
Posts: 23,700

Re: Help with chi-square proc freq data shaping and table output

Then your variable names don't match what you've been showing - most likely its a label instead of the actual variable name. Make sure the variable name matches your data set variables.

It works on your sample data, ignoring the formulary/nonformulary. If you want those two you can expand the example or just use a proc transpose. See new code at the bottom. Then you can run your proc freqs.

``````
data have;
input Group \$ year Generic Brand Formulary Nonformulary;
cards;
A 2014 58228 7718 67268 1086
A 2015 102774 12742 117288 2002
B 2014 179285 22679 202957 4305
B 2015 390531 50278 445166 8366
C 2014 125901 12782 138908 2970
C 2015 271305 32094 308040 5487
D 2014 25330 3118 28480 632
D 2015 38558 5000 43546 958
E 2014 86972 13268 101822 1619
E 2015 149321 20535 173858 2497
F 2014 174173 23246 199510 4311
F 2015 268654 34664 308150 6730
G 2014 9590 694 10388 150
G 2015 29546 2814 32722 372
H 2014 93515 12184 108641 2046
H 2015 144530 18106 168982 2300
I 2014 255844 29236 289018 4176
I 2015 465108 53090 525446 7672
J 2014 53327 5758 59626 1123
J 2015 119858 13810 135540 2544
K 2014 321911 44123 374962 6459
K 2015 488206 63418 574304 6911
L 2014 570029 77791 656689 15829
L 2015 905612 114412 1040012 22253
M 2014 19362 1940 21366 526
M 2015 31838 3156 34942 942
N 2014 222396 31044 257076 6665
N 2015 347872 43048 400573 7373
;
run;

data reshaped;
set have (drop=Generic in=a rename=Brand=Value)
have (drop=Brand in=b rename=Generic=Value);

if a then type='Generic';
if b then type='Brand';

run;
``````
``````proc transpose data=have out=reshaped2(rename=(col1=Value _name_= Type));
by group year;
var generic brand formulary Nonformulary;
run;``````

This is the PROC TRANSPOSE way to reshape your data.

🔒 This topic is solved and locked.