BookmarkSubscribeRSS Feed
tmm
Fluorite | Level 6 tmm
Fluorite | Level 6

Everything I have seen on trying to do chisq in SAS looks like this

who          scd    sdesc    admission      count

1111          1        major       noreadm       200

1111          1        major       yesredm           2

1111          2      extreme     noreadm        100

1111          2      extreme     yesreadm        15

national      1       major      noreadm         500

national      1       major      yesreadm         10

national      2      extreme     noreadm       250

national      2      extreme    yesreadm        25

2222          1       major       noreadm        300

2222          1       major       yesreadm          8

2222          2      extreme    noreadm         150

2222          2      extreme    yesreadm         10

I have 87k rows all together and after every number in the who column there is another national series with the same counts.

Then code is run and you get separate chi data based on the above.

Now I have 2 tables. 1 that has multiple rows based on the who number which are TIN numbers by the way. That is in my CHI table. Then I have a CHI3 table that took the national and pieced it out and then I summed it to get a table that is named CHI3. CHI has 87k rows with all the TINS that apply. CHI3 is just 4 rows total and I appended it to the CHI tables but now it does not look like the other chi above where the national appears after every list of TINS. I tried to run the CHIsq code off what I appended but it does not work. Is there just a way I can run the chisq by doing a join to the 2 tables? I hope I am not talking in circles and confusing everyone. I can offer more info if need be.

19 REPLIES 19
art297
Opal | Level 21

I, for one, am totally confused.  I don't know what code you are running, on which data, or what results you expect to achieve.

PGStats
Opal | Level 21

What are you trying to test with the ChiSq?

PG
tmm
Fluorite | Level 6 tmm
Fluorite | Level 6

Let me try it this way. I have a table called CHI. The columns are who, svcd, svdesc, admission, count. The data in the who is a taxid and each taxid is logged 8 times because the svcd is 1-4. The svdesc is n/a, minor, major, extreme. the admission is no or yes and the count is the count of no or yes

The above table has a total of 21349 taxids represented.

Then I have table CHI2 which is the national representation of the above table. That is simply taking all the data and doing a sum of the count and then groupby the other stuff. The columns are the same however it is laid out like this:

who                    svcd                      svdesc                       admission                                         count

national                1                           n/a                             no                                                  2990

national                1                           n/a                             yes                                                   5

national                2                       minor                               no                                                 31645

national               2                        minor                              yes                                                  52

national               3                       major                               no                                                   41864

national              3                        major                              yes                                                      154

national              4                      extreme                             no                                                    7589

national              4                      extreme                            yes                                                    19

I need to do a CHIsq off this data and all examples I see the first 8 rows of the data is the taxid then followed by the above 8 rows. then next 8 rows are next taxid followed again by the above 8 rows and so on. Then the chisq code is run to get chi results for each tin and each svcd and admission.

The problem is this means I would need to export both the CHI and CHI2 tables and manually add this above national data after every tax id series. That seems like far too much work. Is there a way maybe I can do an inner join on the 2 tables and tell it with code to do chisq for me?

Reeza
Super User

What would your hypothesis be?

Are you looking for each who to match the national or are you testing for differences between the who's?

What variables are you to test for differences between?

tmm
Fluorite | Level 6 tmm
Fluorite | Level 6

Say the taxid with svcd is 2 which is minor and they have a count of 25 no admissions and then 2 yes admissions. The chisq would be of the national with the 2 as the svcd and the minor and as above national for that is 31645 no admissions and 52 yes admissions. So if I remember my chisq correctly 25/31645 and 2/52 shows the ratio. And using the chisq for cochran mantel with those no and yes counts i would get this as the result:

case control          value = 6205.0000     95% confidence limit = 290.6214   

col1 risk               value = 5651.0714     95% confidence limit = 277.5314

cohort                  value = 0.9259           95% confidence limit = 0.8322

odds ratio is 0.0007900142 for no with svcd 2 and 0.0384615385 for yes with svcd 2

I manually did this based on my first group of TINS and the national data and manually did an input of the data and then ran the chisq and ratio code. But I am not about to manually input the national data and insert it after every 8 rows in the CHI main table that houses all the taxids. That would take forever. So all I am hoping for is to some how inner join the 2 tables and then run my chisq and ratio code and it will know that the first 8 rows of Taxid data compare to the national data and give me the results. Now take the next 8 rows of taxid data and do the same until all taxids are calculated. Otherwise the only thing I can see is exporting my 2 tables to Excel and doing the chisq and ratios in Excel but that is not what the boss wants. Wants it done in SAS but I do not know how unless I export both files to Excel and manually add the national data after every 8 taxid rows.

Reeza
Super User

Shouldn't your odds ratio be the inverse of each other?

Ie if no =0.0007900142 then for yes it is 1/0.0007900142 = 1265.8

PGStats
Opal | Level 21

Seems like you are looking for something like :

proc sql;
create table CHI3 as
select * from
(select "taxId" as source, who, svcd, svdesc, admission, count from CHI)
union all
(select "National", C.who, N.svcd, N.svdesc, N.admission, N.count from
CHI as C inner Join CHI2 as N
on C.svcd=N.svcd and C.svdesc=N.svdesc and C.admission=N.admission)
order by who, source, svcd, admission;

proc freq data=CHI3;
by who;
table ....;
run;

This will generate a lot of output...

PG

PG
PGStats
Opal | Level 21

Given your latest explanations, I think we might be getting closer with:

proc sql;
create table CHI3 as
select * from
(select "taxId" as source, who, svcd, svdesc, admission, count from CHI)
union all
(select "National", C.who, N.svcd, N.svdesc, N.admission, N.count from
CHI as C inner Join CHI2 as N
on C.svcd=N.svcd and C.svdesc=N.svdesc and C.admission=N.admission)
order by who, svcd, source,  admission;
quit;

proc freq data=CHI3 noprint;
by who svcd;
tables source*admission / chisq;
freq count;
output out=CHI_OUT chisq;
run;

PG

PG
tmm
Fluorite | Level 6 tmm
Fluorite | Level 6

The above code is very close but it is duplicating

The output if I just do not even do the proc freq and do a run instead of the quit so I get the table chi3 shows these results:

source                   who                             svcd              svdesc                     admission                     count

prov_tin                1111                            1                    minor                          no                                1

prov_tin                1111                            1                    minor                          no                                1

prov_tin                1111                            1                    minor                          no                                1

national               1111                             1                    minor                         no                              31645

national               1111                             1                    minor                         no                              31645

national               1111                             1                    minor                         no                              31645

I am going to try also throwing in a groupby

tmm
Fluorite | Level 6 tmm
Fluorite | Level 6

WELL GROUPBY DID NOT WORK. THREW AN ERROR

tmm
Fluorite | Level 6 tmm
Fluorite | Level 6

added distinct to the 2 select clauses in the () and that got rid of the dupes.

tmm
Fluorite | Level 6 tmm
Fluorite | Level 6

Well this almost works but it makes the CHI process wrong. What I need is for the output of the code to be

who               svcd                    svdesc                      admission              count

national            1                       minor                             no                     1295

national            1                       minor                            yes                       152

1111                1                       minor                             no                         12

1111                1                       minor                            yes                          3

national            1                       minor                            no                       1295

national            1                       minor                            yes                       152

2222                1                       minor                             no                           24

2222                1                       minor                             yes                           5

With the above code it outputs like this:

source        who                 svcd             svdesc             admission            count

national      1111                  1                minor                  no                    1295

national      1111                  1                minor                 yes                      152

provtin        1111                  1                minor                  no                         12

provtin        1111                  1                minor                 yes                          3

national      2222                  1                minor                  no                     1295

national      2222                  1                minor                 yes                      152

provtin        2222                  1                minor                  no                        24

provtin        2222                  1                minor                 yes                         5

By doing the chi it does not work on my source*admission because it adds everything together. I need the output to be correct so that my chi works

out right

PGStats
Opal | Level 21

Then replace the query by :

proc sql;

create table CHI3 as

select * from

(select who, svcd, svdesc, admission, count from CHI)

union all

(select "National", N.svcd, N.svdesc, N.admission, N.count from

CHI as C inner Join CHI2 as N

on C.svcd=N.svcd and C.svdesc=N.svdesc and C.admission=N.admission)

order by who, svcd, admission;

PG

PG
tmm
Fluorite | Level 6 tmm
Fluorite | Level 6

I did that but then national only appears one time and the chi square calculation does not work. The chi examples of input i have seen have been like this and remember this is simply an example and does not include everything mine does. This is an input example from something someone else did and then they did the chi and it output tons of results

who        admission          count

1111            no                 30

1111           yes                3

ntl              no                   32652

ntl              yes                1480

2222           no                  112

2222          yes                  6

ntl             no                  32652

ntl             yes                1480

3333         no                   50

3333        yes                    2

ntl            no                   32652

ntl             yes                1480

they have the national results below every taxid and then when doing the chi it calculates say for 1111 with count of 30 no's against count of ntl 32652 no's and then count of 3 yes's against the count of the ntl 1480 yes's. That is what I need mine to do but I have svcd and svdesc in mine as well. so mine needs to chi not only against the no and yes of national but break down by svcd and svdesc.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 19 replies
  • 2093 views
  • 0 likes
  • 6 in conversation