🔒 This topic is solved and locked.
akimme
Obsidian | Level 7

Hi everyone, I'm trying to:

  1. run a chi-square test on the proportions of people who did and did not have a cardiovascular (CVD) event, comparing those who did and did not take aspirin
  2. determine whether a positive result on one test is associated with a positive result on another

I've tried a few ways to arrange the rows and columns, but none of them produce any kind of useful output from PROC FREQ (see spoiler). I only have the totals: I don't have a file with all 19934 lines of data, just the information that 477 of them had a CVD. Is there any way to use SAS for this calculation, short of inputting 40,000 lines of data?

Spoiler [screenshots: useless.PNG, useless2.PNG]
DATA ASPIRIN; 
   INPUT Treat $ Total CVD Divided; 
   DATALINES; 
   1=Aspirin 19934 477 .0239
   2=Placebo 19942 522 .0262
;
run;

PROC FREQ DATA=ASPIRIN;
TABLES Total*CVD /CHISQ RELRISK ;
TITLE 'Relationship between treatment and CVD';
RUN;

DATA TEST; 
   INPUT Q $ Yes; 
   *I initially also tried putting in 4 variables (Q1andQ2, etc)
    with 1 value each but this also seemed not to work?
    What am I missing?;
   DATALINES;
Q1andQ2 172
NEITHER 15
OnlyQ1 7
OnlyQ2 6
;
run;

PROC freq data = test; run;

Thank you!!

Accepted Solution

Tom
Super User

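You need N*M observations to represent the data when you have two factors with N and M possible values: one observation per cell of the table, with a variable holding that cell's count.

So you need 2*2=4 observations.

You can generate them from your source text by writing two observations for each input line.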

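DATA ASPIRIN; 
   /* :$20. gives Treat a length of 20, so "1=Aspirin" is not truncated */
   INPUT Treat :$20. Total CVD Divided;
   /* First observation: the people in this group who had a CVD event */
   HAS_CVD=1;
   COUNT=CVD;
   output;
   /* Second observation: the people in this group who did not */
   HAS_CVD=0;
   count = total-cvd;
   output;
   keep treat has_cvd count; 
DATALINES; 
1=Aspirin 19934 477 .0239
2=Placebo 19942 522 .0262
;
run;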

proc print;
run;

proc freq ;
  tables treat*has_cvd /chisq;
  weight count;
run;


The FREQ Procedure

Statistics for Table of Treat by HAS_CVD

Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     1      2.0606    0.1511
Likelihood Ratio Chi-Square    1      2.0613    0.1511
Continuity Adj. Chi-Square     1      1.9697    0.1605
Mantel-Haenszel Chi-Square     1      2.0606    0.1512
Phi Coefficient                       0.0072
Contingency Coefficient               0.0072
Cramer's V                            0.0072


       Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)     19457
Left-sided Pr <= F          0.9289
Right-sided Pr >= F         0.0802

Table Probability (P)       0.0091
Two-sided Pr <= P           0.1586

Sample Size = 39876
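The Chi-Square value above can be checked by hand from the four cell counts. This is a sketch of the standard Pearson calculation: each expected count is (row total × column total) / N, where the CVD column total is 477 + 522 = 999 and N = 39876. In a 2x2 table every cell differs from its expected count by the same amount (about 22.40 here), so the sum collapses to one factor:

```latex
\begin{aligned}
E_{ij} &= \frac{R_i \, C_j}{N}, \qquad \chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \\
E_{\text{Aspirin, CVD}} &= \frac{19934 \times 999}{39876} \approx 499.40, \qquad O_{\text{Aspirin, CVD}} = 477 \\
\chi^2 &= 22.40^2 \left( \frac{1}{499.40} + \frac{1}{499.60} + \frac{1}{19434.60} + \frac{1}{19442.40} \right) \approx 2.06
\end{aligned}
```

The square root of 2.06 / 39876 is about 0.0072, which matches the Phi Coefficient row.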
 

 


3 Replies
Reeza
Super User
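DATA ASPIRIN; 
   /* Treat gets the default length of 8, so "1=Aspirin" is stored as */
   /* "1=Aspiri"; only the number before "=" is used below.           */
   INPUT Treat $ Total CVD Divided; 
   DATALINES; 
   1=Aspirin 19934 477 .0239
   2=Placebo 19942 522 .0262
;
run;

/* Two observations per input line: one for CVD, one for non-CVD */
data aspirin_long;
set aspirin;
treatment = input(scan(treat, 1, "="), 8.);
Disease=1;
N=CVD;
output;
Disease=0;
N=Total-CVD;
output;
run;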

proc format;
value treat_fmt
1 = 'Aspirin'
2 = 'Placebo'
;
value disease_fmt
1 = 'CVD'
0 = 'Non-CVD';
run;

proc freq data=aspirin_long;
table treatment*disease / chisq;
weight N;
format treatment treat_fmt. disease disease_fmt.;
run;

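You can use the WEIGHT statement to analyze aggregate data, but you do need to structure the data a bit differently: one observation per combination of treatment and disease status, with a variable (N here) holding the count for that cell.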

Hopefully this works for you.
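If you already have the four cell counts, a minimal sketch is to type them in directly instead of computing them in a DATA step. The names ASPIRIN_CELLS, Has_CVD and Count are only illustrative. The no-CVD counts are Total - CVD (19934 - 477 = 19457 and 19942 - 522 = 19420). ORDER=DATA keeps rows and columns in the typed order, so column 1 is the CVD column and RELRISK's "Relative Risk (Column 1)" is the CVD risk for Aspirin relative to Placebo.

```sas
/* Sketch: type the four cells of the 2x2 table directly.          */
/* ASPIRIN_CELLS, Has_CVD and Count are illustrative names.        */
data aspirin_cells;
   input Treat :$10. Has_CVD Count;
   datalines;
Aspirin 1 477
Aspirin 0 19457
Placebo 1 522
Placebo 0 19420
;
run;

/* ORDER=DATA keeps rows and columns in the order typed above,     */
/* so column 1 is Has_CVD=1 (the CVD column).                      */
proc freq data=aspirin_cells order=data;
   tables Treat*Has_CVD / chisq relrisk;
   weight Count;   /* each observation stands for Count people */
run;
```

The chi-square statistic does not depend on row or column order, so it should match the 2.0606 in Tom's output; the column 1 relative risk should be close to (477/19934) / (522/19942), about 0.914.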

 




 

akimme
Obsidian | Level 7

Okay, it looks like the COUNT= step was the big thing I was missing. That worked, thank you so much!
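For the second question, the same WEIGHT approach works once the four counts from the TEST data step are laid out as a 2x2 table. This is a sketch, assuming each count is a number of subjects who had a result on both tests (paired data); the variable names Q1 and Q2 and the Yes/No values are hypothetical labels for Q1andQ2, OnlyQ1, OnlyQ2 and NEITHER. Which statistic to read depends on the question: CHISQ (which also gives Fisher's exact test for a 2x2 table) tests whether the two results are associated, while McNemar's test from AGREE tests whether the proportion positive differs between the two tests.

```sas
/* Sketch for question 2, assuming paired data: every subject has  */
/* one Q1 result and one Q2 result. Q1, Q2 and Yes/No are           */
/* hypothetical labels for the four counts in the TEST data step.   */
data test_cells;
   input Q1 $ Q2 $ Count;
   datalines;
Yes Yes 172
Yes No 7
No Yes 6
No No 15
;
run;

proc freq data=test_cells order=data;
   /* CHISQ: association between Q1 and Q2 (plus Fisher's exact test  */
   /* for a 2x2 table). AGREE: McNemar's test and the kappa statistic. */
   tables Q1*Q2 / chisq agree;
   weight Count;
run;
```

McNemar's statistic uses only the discordant cells, (7 - 6)**2 / (7 + 6), about 0.077 here. Some expected counts are small (for example 21 × 22 / 200, about 2.3, for the No/No cell), so Fisher's exact test is probably the safer one to report for the association.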
