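Hi everyone, I'm trying to:
- run a chi-square test on the proportions of people who did and did not have a cardiovascular (CVD) event, comparing those who did and did not take aspirin
- and determine whether positive results on one test and on another are associated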
I've tried a few ways to arrange the rows and columns, but none of them produce any kind of useful output from PROC FREQ (see spoiler). I only have the totals: I don't have a file with all 19934 lines of data, just the information that 477 of them had a CVD. Is there any way to use SAS for this calculation, short of inputting 40,000 lines of data?
DATA ASPIRIN;
INPUT Treat $ Total CVD Divided;
DATALINES;
1=Aspirin 19934 477 .0239
2=Placebo 19942 522 .0262
;
run;
PROC FREQ DATA=ASPIRIN;
TABLES Total*CVD /CHISQ RELRISK ;
TITLE 'Relationship between treatment and CVD';
RUN;
DATA TEST;
INPUT Q $ Yes;
*I initially also tried
putting in 4 variables (Q1andQ2, etc)
with 1 value each but this also seemed not to work?
What am I missing?;
DATALINES;
Q1andQ2 172
NEITHER 15
OnlyQ1 7
OnlyQ2 6
;
run;
PROC freq data = test;
run;
Thank you!!
You need N*M observations to represent the data when you have two factors with N and M possible values.
So here you need 2*2=4 observations.
You can derive them from your source text.
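DATA ASPIRIN;
* Read Treat with the colon modifier and $20. informat so the label is not truncated to 8 characters;
INPUT Treat :$20. Total CVD Divided;
* First observation per treatment: subjects who had a CVD event;
HAS_CVD=1;
COUNT=CVD;
output;
* Second observation per treatment: subjects without a CVD event;
HAS_CVD=0;
count = total-cvd;
output;
keep treat has_cvd count;
DATALINES;
1=Aspirin 19934 477 .0239
2=Placebo 19942 522 .0262
;
proc print;
run;
* WEIGHT tells PROC FREQ that each observation stands for COUNT subjects;
proc freq ;
tables treat*has_cvd /chisq;
weight count;
run;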
The FREQ Procedure

Statistics for Table of Treat by HAS_CVD

Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     1      2.0606    0.1511
Likelihood Ratio Chi-Square    1      2.0613    0.1511
Continuity Adj. Chi-Square     1      1.9697    0.1605
Mantel-Haenszel Chi-Square     1      2.0606    0.1512
Phi Coefficient                       0.0072
Contingency Coefficient               0.0072
Cramer's V                            0.0072

       Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)     19457
Left-sided Pr <= F          0.9289
Right-sided Pr >= F         0.0802

Table Probability (P)       0.0091
Two-sided Pr <= P           0.1586

Sample Size = 39876
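The original attempt also asked for RELRISK. Here is a minimal sketch of adding it to the weighted table, assuming the same 0/1 coding of HAS_CVD. ORDER=DATA is an addition that is not in the reply: it keeps the levels in the order they first appear in the data set, so Aspirin is row 1 and HAS_CVD=1 is column 1, and the column 1 risk estimates then refer to having a CVD event.

```sas
/* Same four-row layout: one observation per treatment-by-outcome cell */
data aspirin;
    input Treat :$20. Total CVD Divided;
    HAS_CVD = 1; Count = CVD;         output;  /* subjects with a CVD event    */
    HAS_CVD = 0; Count = Total - CVD; output;  /* subjects without a CVD event */
    keep Treat HAS_CVD Count;
    datalines;
1=Aspirin 19934 477 .0239
2=Placebo 19942 522 .0262
;
run;

/* ORDER=DATA (an assumption, not part of the original reply) keeps levels in
   data-set order, so HAS_CVD=1 becomes column 1 and Aspirin becomes row 1 */
proc freq data=aspirin order=data;
    tables Treat*HAS_CVD / chisq relrisk;
    weight Count;
    title 'Relationship between treatment and CVD';
run;
title;
```

With this ordering, the column 1 relative risk compares the CVD proportion in the aspirin row to the one in the placebo row. From the posted Divided values that is roughly .0239/.0262, or about 0.91.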
DATA ASPIRIN;
INPUT Treat $ Total CVD Divided;
DATALINES;
1=Aspirin 19934 477 .0239
2=Placebo 19942 522 .0262
;
run;
data aspirin_long;
set aspirin;
treatment = input(scan(treat, 1, "="), 8.);
Disease=1;
N=CVD;
output;
Disease=0;
N=Total-CVD;
output;
run;
proc format;
value treat_fmt
1 = 'Aspirin'
2 = 'Placebo'
;
value disease_fmt
1 = 'CVD'
0 = 'Non-CVD';
run;
proc freq data=aspirin_long;
table treatment*disease;
weight N;
format treatment treat_fmt. disease disease_fmt.;
run;
You can use the WEIGHT statement to analyze aggregate data, but the data do need to be structured a bit differently: one observation per treatment-by-outcome combination, with the cell count in its own variable.
Hopefully this works for you.
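The same layout may also cover the second question (whether positive results on the two tests are associated), which the replies do not address directly. This is a minimal sketch. It assumes the four counts in the TEST data set are mutually exclusive groups of subjects (positive on both, on neither, on Q1 only, on Q2 only). The names test_long, Q1_pos, Q2_pos and Count are hypothetical.

```sas
/* One observation per cell of the 2x2 table, with the cell count
   carried in its own variable (assumed reading of the posted totals) */
data test_long;
    input Q1_pos Q2_pos Count;
    datalines;
1 1 172
1 0 7
0 1 6
0 0 15
;
run;

/* WEIGHT makes each observation stand for Count subjects */
proc freq data=test_long;
    tables Q1_pos*Q2_pos / chisq;
    weight Count;
run;
```

Under that reading, the smallest expected count is 21*22/200, about 2.3, which is below 5. The Fisher's exact test that PROC FREQ prints for a 2x2 table with CHISQ is therefore probably the safer result to read.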
@akimme wrote:
Hi everyone, I'm trying to:
- run a chi square on proportions of people who did and did not have a cardiovascular (CVD) event, comparing those who did and did not take aspirin
- and determine if a positive result on one test vs another are associated
Okay, it looks like the COUNT= step was the big thing I was missing. That worked, thank you so much!