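Hi everyone, I'm trying to:
- run a chi-square test on the proportions of people who did and did not have a cardiovascular (CVD) event, comparing those who did and did not take aspirin
- and determine whether a positive result on one test is associated with a positive result on another
I've tried a few ways to arrange the rows and columns, but none of them produce any kind of useful output from PROC FREQ (see the code below). I only have the totals: I don't have a file with all 19934 lines of data, just the information that 477 of them had a CVD event. Is there any way to use SAS for this calculation, short of inputting 40,000 lines of data?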
DATA ASPIRIN;
INPUT Treat $ Total CVD Divided;
DATALINES;
1=Aspirin 19934 477 .0239
2=Placebo 19942 522 .0262
;
run;
PROC FREQ DATA=ASPIRIN;
TABLES Total*CVD /CHISQ RELRISK ;
TITLE 'Relationship between treatment and CVD';
RUN;
DATA TEST;
INPUT Q $ Yes;
*I initially also tried
putting in 4 variables (Q1andQ2, etc)
with 1 value each but this also seemed not to work?
What am I missing?;
DATALINES;
Q1andQ2 172
NEITHER 15
OnlyQ1 7
OnlyQ2 6
;
run;
PROC freq data = test;
run;
Thank you!!
You need N*M observations to represent the data when you have two factors with N and M possible values.
So you need 2*2=4 observations.
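For illustration, the four observations could also be typed in directly, one per treatment/outcome cell. This is only a sketch: the data set name ASPIRIN_CELLS is hypothetical, the variable names follow the code in this reply, and the non-CVD counts are Total minus CVD (19934 - 477 = 19457 and 19942 - 522 = 19420).

```sas
/* Hypothetical data set name; one row per cell of the 2x2 table */
data aspirin_cells;
  input Treat :$20. HAS_CVD COUNT;
datalines;
1=Aspirin 1 477
1=Aspirin 0 19457
2=Placebo 1 522
2=Placebo 0 19420
;
run;
```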
You can generate them from your source data by writing two observations for each input line.
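DATA ASPIRIN;
  INPUT Treat :$20. Total CVD Divided;
  * First output row: patients with a CVD event;
  HAS_CVD = 1;
  COUNT = CVD;
  OUTPUT;
  * Second output row: patients without a CVD event;
  HAS_CVD = 0;
  COUNT = Total - CVD;
  OUTPUT;
  KEEP Treat HAS_CVD COUNT;
DATALINES;
1=Aspirin 19934 477 .0239
2=Placebo 19942 522 .0262
;
proc print data=ASPIRIN;
run;
proc freq data=ASPIRIN;
  tables Treat*HAS_CVD / chisq;
  * COUNT holds the number of patients that each row represents;
  weight COUNT;
run;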
The FREQ Procedure

Statistics for Table of Treat by HAS_CVD

Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     1      2.0606    0.1511
Likelihood Ratio Chi-Square    1      2.0613    0.1511
Continuity Adj. Chi-Square     1      1.9697    0.1605
Mantel-Haenszel Chi-Square     1      2.0606    0.1512
Phi Coefficient                       0.0072
Contingency Coefficient               0.0072
Cramer's V                            0.0072

Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)     19457
Left-sided Pr <= F          0.9289
Right-sided Pr >= F         0.0802

Table Probability (P)       0.0091
Two-sided Pr <= P           0.1586

Sample Size = 39876
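As a hand check on the Chi-Square value in this output, the Pearson statistic for a 2x2 table can be computed directly from the four cell counts. The symbols are my own labels, not part of the SAS output: a and b are the aspirin patients with and without a CVD event, c and d are the same for placebo, and n is the sample size.

```latex
\[
\chi^2 = \frac{n\,(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)},
\qquad a = 477,\; b = 19457,\; c = 522,\; d = 19420,\; n = 39876
\]
\[
\chi^2 = \frac{39876 \times (477 \times 19420 - 19457 \times 522)^2}
              {19934 \times 19942 \times 999 \times 38877}
\approx 2.06
\]
```

This agrees with the 2.0606 that PROC FREQ reports, which suggests the weighted table was built as intended.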
DATA ASPIRIN;
INPUT Treat $ Total CVD Divided;
DATALINES;
1=Aspirin 19934 477 .0239
2=Placebo 19942 522 .0262
;
run;
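data aspirin_long;
  set aspirin;
  * Numeric treatment code taken from the text before the equals sign;
  treatment = input(scan(treat, 1, "="), 8.);
  * First output row: patients with a CVD event;
  Disease=1;
  N=CVD;
  output;
  * Second output row: patients without a CVD event;
  Disease=0;
  N=Total-CVD;
  output;
run;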
proc format;
value treat_fmt
1 = 'Aspirin'
2 = 'Placebo'
;
value disease_fmt
1 = 'CVD'
0 = 'Non-CVD';
run;
proc freq data=aspirin_long;
table treatment*disease;
weight N;
format treatment treat_fmt. disease disease_fmt.;
run;
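The original post asked for CHISQ and RELRISK, which the TABLES statement above does not request, so it prints only the crosstabulation. A minimal sketch of adding them, reusing the ASPIRIN_LONG data set and the formats defined above:

```sas
proc freq data=aspirin_long;
  /* CHISQ adds the chi-square tests; RELRISK adds the odds ratio and relative risks */
  tables treatment*disease / chisq relrisk;
  weight N;
  format treatment treat_fmt. disease disease_fmt.;
run;
```

Assuming the default ordering by internal values, Disease=0 (Non-CVD) is column 1 and Disease=1 (CVD) is column 2, so the relative risk of CVD for aspirin versus placebo should be the column 2 estimate. By hand it is (477/19934)/(522/19942), roughly 0.914.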
You can use the WEIGHT statement to analyze aggregate data, but the data need to be structured a bit differently: one row per combination of treatment and outcome, with the cell count in its own variable.
Hopefully this works for you.
@akimme wrote:
Hi everyone, I'm trying to:
- run a chi square on proportions of people who did and did not have a cardiovascular (CVD) event, comparing those who did and did not take aspirin
- and determine if a positive result on one test vs another are associated
I've tried a few ways to arrange the rows and columns, but none of them produce any kind of useful output from PROC FREQ (see spoiler). I only have the totals: I don't have a file with all 19934 lines of data, just the information that 477 of them had a CVD. Is there any way to use SAS for this calculation, short of inputting 40,000 lines of data?
Okay, it looks like the COUNT= step was the big thing I was missing. That worked, thank you so much!
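The same weighted layout should also cover the second question about Q1 and Q2, which the replies did not address. The sketch below assumes the four counts in the TEST data set are the cells of a 2x2 table (both positive, neither, only Q1, only Q2). The names TEST_CELLS, Q1, Q2 and COUNT are hypothetical.

```sas
/* Hypothetical names; 1 = positive, 0 = negative on each test */
data test_cells;
  input Q1 Q2 COUNT;
datalines;
1 1 172
0 0 15
1 0 7
0 1 6
;
run;

proc freq data=test_cells;
  tables Q1*Q2 / chisq;
  weight COUNT;
run;
```

The expected count in the "neither" cell is only about 2.3 (21 * 22 / 200), so the Fisher's exact test that CHISQ prints for 2x2 tables is probably a safer result to read than the chi-square p-value. If the real question is whether the two tests agree on the same subjects, the AGREE option on the TABLES statement (McNemar's test and kappa) may be worth a look as well.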