Solved: How to arrange data in rows and columns when inputting when you only have totals?

akimme · Posted 11-19-2021 04:10 PM

Hi everyone, I'm trying to:

- run a chi-square test comparing the proportions of people who did and did not have a cardiovascular disease (CVD) event between those who did and did not take aspirin, and
- determine whether a positive result on one test is associated with a positive result on another.

I've tried a few ways to arrange the rows and columns, but none of them produces useful output from PROC FREQ (see spoiler). I only have the totals: I don't have a file with all 19,934 lines of data, just the information that 477 of them had a CVD event. Is there any way to use SAS for this calculation, short of typing in about 40,000 lines of data?

Spoiler

DATA ASPIRIN; 
   INPUT Treat $ Total CVD Divided; 
   DATALINES; 
   1=Aspirin 19934 477 .0239
   2=Placebo 19942 522 .0262
;
run;

PROC FREQ DATA=ASPIRIN;
TABLES Total*CVD /CHISQ RELRISK ;
TITLE 'Relationship between treatment and CVD';
RUN;

DATA TEST; 
   INPUT Q $ Yes; 
*I initially also tried 
putting in 4 variables (Q1andQ2, etc) 
with 1 value each but this also seemed not to work? 
What am I missing?;
   DATALINES; 
Q1andQ2 172 
NEITHER 15  
OnlyQ1 7    
OnlyQ2 6   
;
run;

PROC freq data = test;
run;

Thank you!!

Tom · Posted 11-19-2021 11:09 PM

You need N*M observations to represent the data when you have two factors with N and M possible values.

So you need 2*2=4 observations.

You can generate them from your existing data lines.
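If you would rather type the four cells in directly instead of deriving them, a minimal sketch looks like this (the dataset name ASPIRIN_CELLS is a hypothetical choice). The non-CVD counts are simply the totals minus the CVD counts: 19934 - 477 = 19457 and 19942 - 522 = 19420. Because the table cells are identical, it should give the same chi-square statistics as the DATA step and output that follow.

```sas
/* Hypothetical dataset ASPIRIN_CELLS: one row per cell of the 2x2 table,
   with COUNT holding the number of people in that cell. */
data aspirin_cells;
   input Treat :$8. Has_CVD Count;
   datalines;
Aspirin 1 477
Aspirin 0 19457
Placebo 1 522
Placebo 0 19420
;

proc freq data=aspirin_cells;
   tables Treat*Has_CVD / chisq;
   weight Count;   /* each row stands for COUNT people */
run;
```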

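DATA ASPIRIN; 
   /* :$20. reads up to 20 characters, so '1=Aspirin' is not cut to 8 */
   INPUT Treat :$20. Total CVD Divided;
   /* One observation per cell: first the people with a CVD event... */
   HAS_CVD=1;
   COUNT=CVD;
   output;
   /* ...then the people without one */
   HAS_CVD=0;
   count = total-cvd;
   output;
   keep treat has_cvd count; 
DATALINES; 
1=Aspirin 19934 477 .0239
2=Placebo 19942 522 .0262
;

proc print;
run;

/* WEIGHT makes each observation count as COUNT subjects */
proc freq ;
  tables treat*has_cvd /chisq;
  weight count;
run;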

The FREQ Procedure

Statistics for Table of Treat by HAS_CVD

Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     1      2.0606    0.1511
Likelihood Ratio Chi-Square    1      2.0613    0.1511
Continuity Adj. Chi-Square     1      1.9697    0.1605
Mantel-Haenszel Chi-Square     1      2.0606    0.1512
Phi Coefficient                       0.0072
Contingency Coefficient               0.0072
Cramer's V                            0.0072


       Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)     19457
Left-sided Pr <= F          0.9289
Right-sided Pr >= F         0.0802

Table Probability (P)       0.0091
Two-sided Pr <= P           0.1586

Sample Size = 39876
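As a hand cross-check of the Chi-Square and Phi lines above, the shortcut formula for a 2x2 table can be worked from the four cell counts. The letters a and b (Aspirin row: no CVD, CVD) and c and d (Placebo row: no CVD, CVD) are introduced only for this illustration; a = 19457 matches the Cell (1,1) Frequency reported in the Fisher's exact test output.

$$
\begin{aligned}
\chi^2 &= \frac{N\,(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)} \\
&= \frac{39876 \times (19457 \cdot 522 - 477 \cdot 19420)^2}{19934 \times 19942 \times 38877 \times 999} \\
&= \frac{39876 \times 893214^2}{19934 \times 19942 \times 38877 \times 999} \approx 2.0606, \qquad \mathrm{DF} = (2-1)(2-1) = 1 \\
\phi &= \sqrt{\chi^2 / N} = \sqrt{2.0606 / 39876} \approx 0.0072
\end{aligned}
$$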


Reeza · Posted 11-19-2021 04:34 PM

DATA ASPIRIN; 
   INPUT Treat $ Total CVD Divided; 
   DATALINES; 
   1=Aspirin 19934 477 .0239
   2=Placebo 19942 522 .0262
;
run;

data aspirin_long;
set aspirin;
/* Numeric treatment code: the part of "1=Aspirin" before the "=" */
treatment = input(scan(treat, 1, "="), 8.);
/* One output row per cell, with N holding the cell count */
Disease=1;
N=CVD;
output;
Disease=0;
N=Total-CVD;
output;
run;

proc format;
value treat_fmt
1 = 'Aspirin'
2 = 'Placebo'
;
value disease_fmt
1 = 'CVD'
0 = 'Non-CVD';
run;

proc freq data=aspirin_long;
table treatment*disease / chisq;
weight N;
format treatment treat_fmt. disease disease_fmt.;
run;

You can use the WEIGHT statement to analyze aggregated data, but you need to restructure the data so that each observation is one cell of the table, with a variable holding that cell's count.
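Since the original attempt also asked for RELRISK, here is a sketch of the relative risk on the same weighted layout. RELRISK reports risks for column 1 of the table, and with PROC FREQ's default ORDER=INTERNAL the Disease=0 (Non-CVD) column would come first. Entering the CVD row first for each treatment and using ORDER=DATA keeps CVD in column 1. The names RISK_CELLS, TRT and CVD are hypothetical.

```sas
/* Hypothetical formats and dataset; CVD rows are entered first so that
   ORDER=DATA makes CVD column 1 and Aspirin row 1. */
proc format;
   value trt 1 = 'Aspirin' 2 = 'Placebo';
   value cvd 1 = 'CVD' 0 = 'Non-CVD';
run;

data risk_cells;
   input Treatment Disease N;
   format Treatment trt. Disease cvd.;
   datalines;
1 1 477
1 0 19457
2 1 522
2 0 19420
;

proc freq data=risk_cells order=data;
   /* RELRISK: odds ratio and column-1/column-2 relative risks;
      RISKDIFF: differences in those risks between the two rows */
   tables Treatment*Disease / relrisk riskdiff;
   weight N;
run;
```

With this ordering, the "Relative Risk (Column 1)" line is the CVD risk on aspirin divided by the CVD risk on placebo, which should be roughly 0.0239 / 0.0262, or about 0.91.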

Hopefully this works for you.
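The same WEIGHT approach should also cover the second question in the original post (whether a positive result on one test is associated with a positive result on the other). A minimal sketch, assuming Q1andQ2 means positive on both tests, OnlyQ1 and OnlyQ2 mean positive on that test only, and NEITHER means negative on both; TEST_LONG, Result, Q1 and Q2 are hypothetical names. The four labels become a 2x2 table of test 1 by test 2. CHISQ tests for association, and for a 2x2 table PROC FREQ also prints Fisher's exact test, which is the safer read here because one expected cell count is only about 2.3 (21 * 22 / 200). Because the same 200 people took both tests, AGREE adds McNemar's test, which asks instead whether the two tests differ in how often they come out positive.

```sas
/* Assumed label meanings: Q1andQ2 = positive on both tests,
   OnlyQ1 / OnlyQ2 = positive on that test only, NEITHER = negative on both */
data test_long;
   input Result :$8. Count;
   Q1 = (Result in ('Q1andQ2', 'OnlyQ1'));   /* 1 = positive on test 1 */
   Q2 = (Result in ('Q1andQ2', 'OnlyQ2'));   /* 1 = positive on test 2 */
   datalines;
Q1andQ2 172
NEITHER 15
OnlyQ1 7
OnlyQ2 6
;

proc freq data=test_long;
   /* CHISQ: association (plus Fisher's exact test for 2x2 tables);
      AGREE: McNemar's test and kappa for the paired 2x2 table */
   tables Q1*Q2 / chisq agree;
   weight Count;
run;
```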

akimme · Posted 11-20-2021 11:25 PM

Okay, it looks like the COUNT= step was the big thing I was missing. That worked, thank you so much!
