chi-squared test for every value of a categorical variable with a grouping variable

sms1891 · Posted 01-05-2022 04:43 PM

Hi all,

I have a dataset of 60,000 patients with the cause of death recorded as an ICD 10 code for each patient (variable name: cause_death). The grouping variable is AP (0 = No, 1 = Yes). I want to see whether the frequency of each ICD 10 code differs significantly between the two groups. All the codes are stored in the single variable cause_death, which has about 300 distinct values. I would appreciate help with SAS code that runs a chi-squared test, with a p-value, for each ICD 10 code in cause_death by the grouping variable AP.

Thank you so much in advance!
SM

Reeza · Posted 01-05-2022 04:51 PM

Please show an example of what your input data looks like and what you want as output.
I'm not sure whether you're looking for a two-way or a one-way chi-square test. I'm guessing one-way from the way the question is formulated.
Are the ODS results OK?

Here is a rough example of how that could look. I can add more details once you respond with the requested information.

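/* Sort first so that BY-group processing works */
proc sort data=sashelp.class out=class;
  by sex age;
run;

/* Capture the one-way chi-square results for each BY group in WANT */
ods output onewaychisq=want;
proc freq data=class;
  by sex;
  table age / chisq;
run;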

sms1891 · Posted 01-05-2022 06:03 PM

Hi reeza,

Thank you for the response. Here is what my data looks like (below). I am attaching the summary table of cause_death by AP as a picture.

ID  Cause_death  AP
1   I251         0
2   C349         1
3   C349         1
4   J449         0
6   J449         1
7   J449         1
8   F03          0
9   I250         0
10  I500         1
11  I250         1
12  I250         1
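
For anyone who wants to run code against this sample, the rows above could be read with a DATA step along these lines. This is only a sketch: the dataset name deaths and the $4. informat are assumptions, not something the original poster supplied.

```sas
/* Sample rows from the post above.
   Assumptions: dataset name DEATHS, codes no longer than 4 characters. */
data deaths;
  input id cause_death :$4. ap;
  datalines;
1 I251 0
2 C349 1
3 C349 1
4 J449 0
6 J449 1
7 J449 1
8 F03 0
9 I250 0
10 I500 1
11 I250 1
12 I250 1
;
run;
```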

PaigeMiller · Posted 01-05-2022 06:07 PM

Exactly what Chi-squared test do you want? Please be specific and detailed and reference the data.

If your data has only 0s and 1s within a cause_death value, you can only test whether the proportion of 0s (or of 1s) equals some specific value that you choose. That is not really a chi-squared test at all; it is a binomial test or, if the sample size is large enough, a z-test approximation.

If you want to test the ENTIRE table that you show across all cause_death values, you can of course do that with a chi-squared test, but then you are not testing "every value of a categorical variable", you are doing a global test.

--
Paige Miller

sms1891 · Posted 01-05-2022 06:08 PM

I want to compare the frequencies of each of the ICD 10 codes, for example, I251 8.6% vs 6.67% between AP 0/1, to check if there is a significant difference with a p-value. Similarly, I want to do it for all ICD 10 codes listed, C349, J449, I219, etc.

Thank you!

PaigeMiller · Posted 01-05-2022 06:18 PM

@sms1891 wrote:

I want to compare the frequencies of each of the ICD 10 codes, for example, I251 8.6% vs 6.67% between AP 0/1, to check if there is a significant difference with a p-value. Similarly, I want to do it for all ICD 10 codes listed, C349, J449, I219, etc.

Unless I am misunderstanding, this is not a Chi-squared test. You could test to see if the I251 percent of 0s and 1s is equal to a specific value, such as 10% (or any other number you want), but you have to specify the number. It is a binomial test or if the sample size is large, you can approximate it with a z-test.

--
Paige Miller
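
For readers following along, the binomial test described in this reply might look roughly like the sketch below. It uses the I251 counts that sms1891 gives later in the thread (3416 with AP=0, 1051 with AP=1) and tests whether the share of AP=1 among I251 deaths equals 10%. The dataset name i251 and the variable n are assumptions, and 0.10 is only the placeholder value mentioned above.

```sas
/* I251 deaths by AP group, as counts (numbers quoted later in the thread).
   Dataset and variable names are assumptions for illustration. */
data i251;
  input ap n;
  datalines;
0 3416
1 1051
;
run;

/* Test H0: the proportion with AP=1 equals 0.10.
   PROC FREQ reports the asymptotic (z) test for the binomial proportion. */
proc freq data=i251;
  weight n;
  tables ap / binomial(level='1' p=0.10);
run;
```

Note that this answers a different question from the two-group comparison: the null value has to be chosen by the analyst.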

Reeza · Posted 01-05-2022 11:05 PM

Isn't a one-way chi-square test equivalent to the binomial test with the proportion set to 0.5 (50%) in this case?

My stats are getting rustier by the day but I do think it's somewhat valid, though possibly not ideal.

You would definitely need to correct for multiple testing though.
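
One possible way to apply such a correction in SAS is PROC MULTTEST, which can adjust a dataset of raw p-values. The sketch below is an assumption about how that step could look, not code from the thread; the dataset name pvals and the p-values in it are made-up placeholders, and the choice of Holm and FDR adjustments is only an example.

```sas
/* Placeholder p-values, one per ICD 10 code (values are made up).
   PROC MULTTEST expects the p-value variable to be named RAW_P by default. */
data pvals;
  input cause_death :$4. raw_p;
  datalines;
I251 0.0001
C349 0.0300
J449 0.2000
I250 0.0450
;
run;

/* Add Holm and false-discovery-rate adjusted p-values */
proc multtest inpvalues=pvals holm fdr out=adjusted;
run;

proc print data=adjusted;
run;
```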

sms1891 · Posted 01-05-2022 11:07 PM

PaigeMiller,

I have a total of 55489 patients who died, each with a cause of death (cause_death) captured as an ICD 10 code. My exposure variable is AP (1 = Yes, 0 = No). Of the 55489 patients, 15763 had AP (AP=1) and 39726 did not (AP=0). I want to compare the percentage for each ICD 10 code between the two AP groups.

If 8.6% of No-AP patients (3416/39726) and 6.67% of AP patients (1051/15763) died of I251, then the remaining 91.4% of No-AP patients (n=36310) and 93.33% of AP patients (n=14712) died of some cause other than I251. This technically becomes a 2x2 table, and I want a chi-squared test of whether 8.6% differs significantly from 6.67%.

Similarly, patients who died of C349 give their own 2x2 table based on the grouping variable: 2809 of 39726 No-AP patients (7.07%) and 978 of 15763 AP patients (6.20%) died of C349. Again I want a chi-squared test of whether 7.07% differs significantly from 6.20%.
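
The I251 table described above can be entered as cell counts and tested like this. It is a minimal sketch of the manual approach; the dataset name i251_2x2 and the variable names died_i251 and n are assumptions.

```sas
/* 2x2 table for I251 built from the counts in this post:
   rows = AP group, columns = died of I251 (1) or of another cause (0). */
data i251_2x2;
  input ap died_i251 n;
  datalines;
0 1 3416
0 0 36310
1 1 1051
1 0 14712
;
run;

/* WEIGHT tells PROC FREQ that each row is a cell count, not one patient */
proc freq data=i251_2x2;
  weight n;
  tables ap*died_i251 / chisq;
run;
```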

I have more than 300 causes of death across all patients who died (n=55489), and I do not want to run a chi-squared test manually for each one. I would love some help if there is a better way to handle this in SAS code. I hope my question is clearer now; please let me know if you need any further clarification. I posted an example of my data structure in my reply to Reeza, along with a picture of the summary table by AP status. I am also attaching, as a picture, the 2x2 tables that I described above for each code. Obviously that is the manual way of doing it: feeding each 2x2 table into a data step and running the chi-squared test in a PROC FREQ step.
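
One way the per-code 2x2 tables could be generated automatically is to count deaths per code and AP group, derive the "all other causes" cell from the group totals, and run PROC FREQ with a BY statement. The sketch below is an assumption about how to do that, not a solution posted in the thread; it uses the small sample posted earlier, and all dataset names (deaths, cells, totals, two_by_two, chisq_all, chisq_by_code) and the variable this_code are hypothetical.

```sas
/* Sample rows from earlier in the thread; replace with the real data */
data deaths;
  input id cause_death :$4. ap;
  datalines;
1 I251 0
2 C349 1
3 C349 1
4 J449 0
6 J449 1
7 J449 1
8 F03 0
9 I250 0
10 I500 1
11 I250 1
12 I250 1
;
run;

/* Deaths per code and AP group; SPARSE keeps zero-count combinations */
proc freq data=deaths noprint;
  tables cause_death*ap / sparse out=cells(drop=percent);
run;

/* Total deaths per AP group */
proc freq data=deaths noprint;
  tables ap / out=totals(drop=percent rename=(count=n_group));
run;

proc sort data=cells;
  by ap;
run;

/* For each code and group, write two cells: this code, and all other causes */
data two_by_two;
  merge cells totals;
  by ap;
  this_code = 1; n = count;           output;
  this_code = 0; n = n_group - count; output;
  keep cause_death ap this_code n;
run;

proc sort data=two_by_two;
  by cause_death ap this_code;
run;

/* One chi-squared test per code; printed output is suppressed,
   and the results are captured through ODS OUTPUT instead */
ods exclude all;
ods output ChiSq=chisq_all;
proc freq data=two_by_two;
  by cause_death;
  weight n / zeros;
  tables ap*this_code / chisq;
run;
ods exclude none;

/* Keep only the Pearson chi-square row for each code */
data chisq_by_code;
  set chisq_all;
  where Statistic = 'Chi-Square';
  keep cause_death DF Value Prob;
run;

proc print data=chisq_by_code;
run;
```

With only 11 sample rows the expected cell counts are far too small for the test to be meaningful, so the sample serves only to show the mechanics. With about 300 codes, the resulting p-values would still need a multiple-testing adjustment, as Reeza points out.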

sms1891 · Posted 01-05-2022 11:29 PM

I found this code in another thread. It is similar to what I am looking for, but it is not working for my dataset.

https://communities.sas.com/t5/SAS-Programming/How-to-make-a-Chi-square-independence-test-for-each-i...

%let dsn=sashelp.cars;
%let var1=origin;
%let var2=type;

data want(keep=_byval1 _byval2 &var1 &var2 _count) / view=want;
if _n_=1 then do;
  /* h: one entry per (&var1, &var2) cell, holding its running count */
  dcl hash h();
  h.definekey("&var1", "&var2");
  h.definedata("&var1", "&var2", '_count');
  h.definedone();
  dcl hiter hi('h');

  /* k: the distinct values of &var1, in ascending order */
  dcl hash k(dataset:"&dsn", ordered:'a');
  k.definekey("&var1");
  k.definedone();
  dcl hiter hi1('k');
  dcl hiter hi2('k');
end;
set &dsn end=last;
/* Increment the count for an existing cell, or start a new cell at 1 */
if h.find()=0 then _count+1;
else _count=1;
h.replace();
if last;
/* After the last row, output the cells for each pair of &var1 values */
do while(hi1.next()=0);
  _byval1=&var1;
  hi2.setcur();
  do while(hi2.next()=0);
    _byval2=&var1;
    do while(hi.next()=0);
      if &var1=_byval1 | &var1=_byval2 then output;
    end;
  end;
end;
run;

proc freq data=want;
by _byval1 _byval2;
weight _count;
tables &var1*&var2 / chisq /* missing */;
run;

PaigeMiller · Posted 01-06-2022 06:49 AM

@sms1891 Thanks, that is much clearer now what you want. I don't have the time to work on this right now, maybe later. But you still need to provide a portion of your data as working SAS data step code (instructions). Other formats are not acceptable.

--
Paige Miller

Reeza · Posted 01-05-2022 10:56 PM

1. I can't program off a picture and I'm not typing out your data. Your sample below is too small to work with.

2. You didn't show what you wanted as output.

Did you run the example I posted? How close does it get you to what you want?
