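Hi everyone, I'm trying to:
- run a chi-square test on the proportions of people who did and did not have a cardiovascular (CVD) event, comparing those who did and did not take aspirin
- determine whether a positive result on one test is associated with a positive result on another
I've tried a few ways to arrange the rows and columns, but none of them produces any useful output from PROC FREQ (see the code below). I only have the totals: I don't have a file with all 19934 lines of data, just the information that 477 of them had a CVD event. Is there any way to use SAS for this calculation, short of inputting 40,000 lines of data?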
DATA ASPIRIN;
INPUT Treat $ Total CVD Divided;
DATALINES;
1=Aspirin 19934 477 .0239
2=Placebo 19942 522 .0262
;
run;
PROC FREQ DATA=ASPIRIN;
TABLES Total*CVD /CHISQ RELRISK ;
TITLE 'Relationship between treatment and CVD';
RUN;
DATA TEST;
INPUT Q $ Yes;
*I initially also tried
putting in 4 variables (Q1andQ2, etc)
with 1 value each but this also seemed not to work?
What am I missing?;
DATALINES;
Q1andQ2 172
NEITHER 15
OnlyQ1 7
OnlyQ2 6
;
run;
PROC freq data = test;
run;
Thank you!!
You need N*M observations to represent the data when you have two factors with N and M possible values: one observation per cell, carrying that cell's count.
So here you need 2*2=4 observations.
You can derive them from your source data.
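DATA ASPIRIN;
/* Read one summary row per treatment group */
INPUT Treat :$20. Total CVD Divided;
/* Write two observations per group, one per outcome, each with its cell count */
HAS_CVD=1;
COUNT=CVD;
output;
HAS_CVD=0;
count = total-cvd;
output;
keep treat has_cvd count;
DATALINES;
1=Aspirin 19934 477 .0239
2=Placebo 19942 522 .0262
;
/* Check the four cell rows */
proc print;
run;
/* WEIGHT makes each row stand for COUNT subjects */
proc freq ;
tables treat*has_cvd /chisq;
weight count;
run;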
The FREQ Procedure

Statistics for Table of Treat by HAS_CVD

Statistic                      DF      Value     Prob
------------------------------------------------------
Chi-Square                      1     2.0606   0.1511
Likelihood Ratio Chi-Square     1     2.0613   0.1511
Continuity Adj. Chi-Square      1     1.9697   0.1605
Mantel-Haenszel Chi-Square      1     2.0606   0.1512
Phi Coefficient                       0.0072
Contingency Coefficient               0.0072
Cramer's V                            0.0072

Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)     19457
Left-sided Pr <= F          0.9289
Right-sided Pr >= F         0.0802
Table Probability (P)       0.0091
Two-sided Pr <= P           0.1586

Sample Size = 39876
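The original attempt also asked for RELRISK, which the TABLES statement above leaves out. A minimal sketch of adding it back on the same weighted layout follows; the dataset name aspirin_cells is made up for illustration, and the Divided column is left out because the analysis does not use it.

```sas
/* Hypothetical dataset name; same one-row-per-cell layout as above */
data aspirin_cells;
  input Treat :$20. Total CVD;
  has_cvd = 1; count = CVD;         output;  /* subjects with a CVD event */
  has_cvd = 0; count = Total - CVD; output;  /* subjects without one */
  keep Treat has_cvd count;
  datalines;
1=Aspirin 19934 477
2=Placebo 19942 522
;
run;

proc freq data=aspirin_cells;
  /* CHISQ and RELRISK are the two options from the original attempt */
  tables Treat*has_cvd / chisq relrisk;
  weight count;  /* each row stands for COUNT subjects */
  title 'Relationship between treatment and CVD';
run;
```

With the 0/1 coding, has_cvd=1 should sort into the second column, so the column-2 relative risk is the one that compares CVD risk for aspirin against placebo. From the proportions in the post (.0239 and .0262) that ratio should come out at roughly 0.91.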
DATA ASPIRIN;
INPUT Treat $ Total CVD Divided;
DATALINES;
1=Aspirin 19934 477 .0239
2=Placebo 19942 522 .0262
;
run;
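data aspirin_long;
set aspirin;
/* Numeric treatment code: the text before "=" in Treat */
treatment = input(scan(treat, 1, "="), 8.);
/* One output row per outcome, with N holding the cell count */
Disease=1;
N=CVD;
output;
Disease=0;
N=Total-CVD;
output;
run;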
proc format;
value treat_fmt
1 = 'Aspirin'
2 = 'Placebo'
;
value disease_fmt
1 = 'CVD'
0 = 'Non-CVD';
run;
proc freq data=aspirin_long;
table treatment*disease;
weight N;
format treatment treat_fmt. disease disease_fmt.;
run;
You can use the WEIGHT statement to analyze aggregate data, but the data need to be structured a bit differently: one row per cell of the table, with the cell count in its own variable.
Hopefully this works for you.
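The same idea should carry over to the second question in the original post, the Q1/Q2 table. The sketch below assumes that "Q1andQ2" means positive on both tests, "OnlyQ1" means positive on Q1 and negative on Q2, and so on; the dataset name test_cells and the Yes/No coding are made up for illustration. The AGREE option is an optional extra that was not part of the original attempt.

```sas
/* Hypothetical layout: one row per Q1-by-Q2 cell, counts from the post */
data test_cells;
  input Q1 $ Q2 $ Count;
  datalines;
Yes Yes 172
Yes No 7
No Yes 6
No No 15
;
run;

proc freq data=test_cells;
  /* CHISQ tests association; AGREE adds McNemar's test and kappa */
  tables Q1*Q2 / chisq agree;
  weight Count;  /* each row stands for COUNT subjects */
run;
```

One of the expected cell counts here is small (21*22/200, about 2.3, for the No/No cell), so PROC FREQ may warn that the chi-square test is not reliable. The Fisher's exact test that CHISQ prints for 2x2 tables is probably the safer result to read in that case.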
@akimme wrote:
Hi everyone, I'm trying to:
- run a chi square on proportions of people who did and did not have a cardiovascular (CVD) event, comparing those who did and did not take aspirin
- and determine if a positive result on one test vs another are associated
I've tried a few ways to arrange the rows and columns, but none of them produce any kind of useful output from PROC FREQ (see spoiler). I only have the totals: I don't have a file with all 19934 lines of data, just the information that 477 of them had a CVD. Is there any way to use SAS for this calculation, short of inputting 40,000 lines of data?
Okay, it looks like the COUNT= step was the big thing I was missing. That worked, thank you so much!