New SAS User

rebeccachau · Posted 03-25-2019 03:41 AM

I just want to perform a simple chi-square test between two categorical variables, gender and depression. However, I am having trouble formatting my raw data so that I can conduct my analysis. I believe I need to have a COUNT variable included in the data set.

My data is structured like this:

ID DEPRESSION GENDER

1234 1 0

2345 0 1

4567 1 1

Can anyone point me in the right direction? I have tried reading the documentation but don't understand the formatting step.

Thanks

Kurt_Bremser · Posted 03-25-2019 03:45 AM

How about

proc sql;
create table want as
select
  gender,
  count(*) as count,
  sum(depression as depression
from have
group by gender;
quit;

?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

FreelanceReinh · Posted 03-25-2019 06:41 AM

@rebeccachau wrote:

I just want to perform a simple chi-square test between two categorical variables, gender and depression. (...) I believe I need to have a COUNT variable included in the data set.

No, you don't need a COUNT variable. Assuming that each subject corresponds to one observation in your input dataset, you can use this dataset as it is. It's only an optional alternative to start with an aggregated input dataset (if you had one already). See the example below and note that the results (both frequency tables and statistical tests) are identical.

/* Create test data for demonstration */

data have;
call streaminit(27182818);
do id=1234 to 1500;
  depression=rand('bern',0.1);
  gender=rand('bern',0.5);
  output;
end;
run; /* 267 obs. */

/* Perform chi-square test */

proc freq data=have;
tables depression*gender / chisq;
run;


/* Alternative: Use an aggregated analysis dataset */

/* Create the aggregated analysis dataset */

proc freq data=have noprint;
tables depression*gender / out=aggreg;
run; /* 4 obs. (including a variable COUNT) */

/* Specify the variable containing the counts in the WEIGHT statement */

proc freq data=aggreg;
tables depression*gender / chisq;
weight count;
run;

rebeccachau · Posted 03-25-2019 09:28 PM

Thank you. I seem to be able to get this code to work. However, the output tells me that frequency missing is 1048281 (sample size=145) and there is a WARNING: 100% of the data are missing. (Output attached.) What does this mean??

FreelanceReinh · Posted 03-26-2019 05:02 AM

@rebeccachau wrote:

However, the output tells me that frequency missing is 1048281 (sample size=145) and there is a WARNING: 100% of the data are missing. (Output attached.) What does this mean??

It means that 1048283 of the 1048428 observations (=99.986%, SAS rounded this up to "100%") in dataset WORK.IMPORT have missing values for variable GDS or variable GENDER or both. If this is unexpected, it indicates a serious issue with your data, most likely resulting from an inappropriate raw data import (perhaps with PROC IMPORT). This must be investigated thoroughly and all data issues resolved before any reliable statistical analysis can be done.

The first step would be to identify the analysis variable(s) with almost all values missing: Run PROC FREQ steps with only one variable in the TABLES statement (e.g. tables GDS;). Then review the step which created WORK.IMPORT (with a focus on the affected variables) and compare some raw data records (from the original Excel/text/database/... file) to the corresponding observations in WORK.IMPORT so as to make sure that they have been imported correctly.

If your raw data look correct and you don't see what's wrong with your data import, don't hesitate to open a new thread and describe the issue. Many people in this forum will be able to help you with that.

rebeccachau · Posted 03-26-2019 08:41 PM

Thanks. I investigated my Excel file further and there was a problem there. Now the output correctly reports only two items as missing.

New SAS User

Syntax for format COUNT for PROC FREQ (chi-square)

Re: Syntax for format COUNT for PROC FREQ (chi-square)

Re: Syntax for format COUNT for PROC FREQ (chi-square)

Re: Syntax for format COUNT for PROC FREQ (chi-square)

Re: Syntax for format COUNT for PROC FREQ (chi-square)

Re: Syntax for format COUNT for PROC FREQ (chi-square)

Proc format

Errors Running Chi square

Proc Freq Formatting

proc freq output to data set

chi-square test / compute variable worth

Follow Us

What is...

New SAS User

Our biggest data and AI event of the year.

Follow Us

What is...