BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Jse
Fluorite | Level 6 Jse
Fluorite | Level 6

Chi square data.PNG

 

Hello,

 

Statistics novice here.  I have data structured as above.  I would like to perform a Pearson chi-square test comparing generic to brand drug fills by year for each group.  I have done this in the past but it involved some manual reshaping in Excell.  

 

I was hoping someone could help me with the the correct way to use the tables and weight statements for a proc freq.

 

For each group, I would like a chi-square test comparing generic to brand by year.  I would also like to create a data set with the  p value for each group comparison.

 

I figured it would be worth checking if there is an easy solution to this.  Any help is appreciated.

 

Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

It seems you are building FOUR dimensions of contingency table . Not sure whether the following is what you need.

And you need shape your table like this firstly:

 

group year  level        weight

A       2014  generic   58228

A      2014   brand     7718

...........

 

 

 

proc freq data=have;

table  group*year*level /cmh chisq;

weight weight;

run;

View solution in original post

11 REPLIES 11
Ksharp
Super User

It seems you are building FOUR dimensions of contingency table . Not sure whether the following is what you need.

And you need shape your table like this firstly:

 

group year  level        weight

A       2014  generic   58228

A      2014   brand     7718

...........

 

 

 

proc freq data=have;

table  group*year*level /cmh chisq;

weight weight;

run;

Babloo
Rhodochrosite | Level 12

Could you please tell me the role of weight statement in following example?

 

proc freq data=have;

table  group*year*level /cmh chisq;

weight weight;

run;
Reeza
Super User

@Babloo Please consider looking these up yourself. In this case, find the documentation for PROC FREQ. Find the WEIGHT statement under the syntax. 

 

First statement:

 

The WEIGHT statement names a numeric variable that provides a weight for each observation in the input data set. The WEIGHT statement is most commonly used to input cell count data. See the section Inputting Frequency Counts for more information. If you use a WEIGHT statement, PROC FREQ assumes that an observation represents n observations, where n is the value of variable. The value of the WEIGHT variable is not required to be an integer.

 

 

Rick_SAS
SAS Super FREQ

For practical purposes, the WEIGHT statement in PROC FREQ is equivalent to the FREQ statement in other SAS procedures. It enables you to specify cell values by using pre-summarized counts.

Jse
Fluorite | Level 6 Jse
Fluorite | Level 6

Thank you Xia, this worked great.  Would you be able to give me any tips on getting the chi sq p values and row percents to a data set?

 

I am having trouble with the output statement.  I appreciate the help!

 

 

 

Ksharp
Super User

OK. Here is an example .

 

proc freq data=sashelp.class;
table sex*age/out=want1 chisq outpct ;
output out=want2 chisq cmh;
run;
Reeza
Super User

Can you please include your data as text, not a picture? SAS can't process pictures as data yet 😉

 

To reshape your data you can use SET.

 

data reshaped;
set have (drop=Generic in=a)
      have (drop=Brand in=b);

if a then type='Generic';
if b then type='Brand';

run;

You can then pass that on to proc freq. It will give you results but not a table that summarizes all of it together with the p-values. That will take a bit of data processing.

Jse
Fluorite | Level 6 Jse
Fluorite | Level 6

Group year Generic Brand Formulary Nonformulary
A 2014 58228 7718 67268 1086
A 2015 102774 12742 117288 2002
B 2014 179285 22679 202957 4305
B 2015 390531 50278 445166 8366
C 2014 125901 12782 138908 2970
C 2015 271305 32094 308040 5487
D 2014 25330 3118 28480 632
D 2015 38558 5000 43546 958
E 2014 86972 13268 101822 1619
E 2015 149321 20535 173858 2497
F 2014 174173 23246 199510 4311
F 2015 268654 34664 308150 6730
G 2014 9590 694 10388 150
G 2015 29546 2814 32722 372
H 2014 93515 12184 108641 2046
H 2015 144530 18106 168982 2300
I 2014 255844 29236 289018 4176
I 2015 465108 53090 525446 7672
J 2014 53327 5758 59626 1123
J 2015 119858 13810 135540 2544
K 2014 321911 44123 374962 6459
K 2015 488206 63418 574304 6911
L 2014 570029 77791 656689 15829
L 2015 905612 114412 1040012 22253
M 2014 19362 1940 21366 526
M 2015 31838 3156 34942 942
N 2014 222396 31044 257076 6665
N 2015 347872 43048 400573 7373

 

Hello Reeza, thank you for the response.  Here is the data as plain text (hope this is the correct way to share).

 

I am having trouble with the code you gave me in reshaping the data.  It did not combine the fill counts into one column.   

 

I am also going to run the chi-square test comparing formulary and nonformulary fills.  Thank you for your help.

 

 

Reeza
Super User
data reshaped;
set have (drop=Generic in=a rename=Brand=Value)
      have (drop=Brand in=b rename=Generic=Value);

if a then type='Generic';
if b then type='Brand';

run;

 

There was a mistake in the code, need to name the values the same thing. See if the above works. 

Jse
Fluorite | Level 6 Jse
Fluorite | Level 6

still could not get that to work...it cant find the dropped variable.

Reeza
Super User

Then your variable names don't match what you've been showing - most likely its a label instead of the actual variable name. Make sure the variable name matches your data set variables.

 

It works on your sample data, ignoring the formulary/nonformulary. If you want those two you can expand the example or just use a proc transpose. See new code at the bottom. Then you can run your proc freqs.

 

 


data have;
input Group $ year Generic Brand Formulary Nonformulary;
cards;
A 2014 58228 7718 67268 1086
A 2015 102774 12742 117288 2002
B 2014 179285 22679 202957 4305
B 2015 390531 50278 445166 8366
C 2014 125901 12782 138908 2970
C 2015 271305 32094 308040 5487
D 2014 25330 3118 28480 632
D 2015 38558 5000 43546 958
E 2014 86972 13268 101822 1619
E 2015 149321 20535 173858 2497
F 2014 174173 23246 199510 4311
F 2015 268654 34664 308150 6730
G 2014 9590 694 10388 150
G 2015 29546 2814 32722 372
H 2014 93515 12184 108641 2046
H 2015 144530 18106 168982 2300
I 2014 255844 29236 289018 4176
I 2015 465108 53090 525446 7672
J 2014 53327 5758 59626 1123
J 2015 119858 13810 135540 2544
K 2014 321911 44123 374962 6459
K 2015 488206 63418 574304 6911
L 2014 570029 77791 656689 15829
L 2015 905612 114412 1040012 22253
M 2014 19362 1940 21366 526
M 2015 31838 3156 34942 942
N 2014 222396 31044 257076 6665
N 2015 347872 43048 400573 7373
;
run;

data reshaped;
set have (drop=Generic in=a rename=Brand=Value)
      have (drop=Brand in=b rename=Generic=Value);

if a then type='Generic';
if b then type='Brand';

run;
proc transpose data=have out=reshaped2(rename=(col1=Value _name_= Type));
by group year;
var generic brand formulary Nonformulary;
run;

This is the PROC TRANSPOSE way to reshape your data.

 

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1327 views
  • 0 likes
  • 5 in conversation