chisq issue

tmm · Posted 03-20-2012 05:12 PM

Everything I have seen on trying to do chisq in SAS looks like this

who scd sdesc admission count

1111 1 major noreadm 200

1111 1 major yesredm 2

1111 2 extreme noreadm 100

1111 2 extreme yesreadm 15

national 1 major noreadm 500

national 1 major yesreadm 10

national 2 extreme noreadm 250

national 2 extreme yesreadm 25

2222 1 major noreadm 300

2222 1 major yesreadm 8

2222 2 extreme noreadm 150

2222 2 extreme yesreadm 10

I have 87k rows all together and after every number in the who column there is another national series with the same counts.

Then code is run and you get separate chi data based on the above.

Now I have 2 tables. 1 that has multiple rows based on the who number which are TIN numbers by the way. That is in my CHI table. Then I have a CHI3 table that took the national and pieced it out and then I summed it to get a table that is named CHI3. CHI has 87k rows with all the TINS that apply. CHI3 is just 4 rows total and I appended it to the CHI tables but now it does not look like the other chi above where the national appears after every list of TINS. I tried to run the CHIsq code off what I appended but it does not work. Is there just a way I can run the chisq by doing a join to the 2 tables? I hope I am not talking in circles and confusing everyone. I can offer more info if need be.

art297 · Posted 03-20-2012 06:45 PM

I, for one, am totally confused. I don't know what code you are running, on which data, or what results you expect to achieve.

PGStats · Posted 03-20-2012 10:13 PM

What are you trying to test with the ChiSq?

PG

tmm · Posted 03-21-2012 12:16 PM

Let me try it this way. I have a table called CHI. The columns are who, svcd, svdesc, admission, count. The data in the who is a taxid and each taxid is logged 8 times because the svcd is 1-4. The svdesc is n/a, minor, major, extreme. the admission is no or yes and the count is the count of no or yes

The above table has a total of 21349 taxids represented.

Then I have table CHI2 which is the national representation of the above table. That is simply taking all the data and doing a sum of the count and then groupby the other stuff. The columns are the same however it is laid out like this:

who svcd svdesc admission count

national 1 n/a no 2990

national 1 n/a yes 5

national 2 minor no 31645

national 2 minor yes 52

national 3 major no 41864

national 3 major yes 154

national 4 extreme no 7589

national 4 extreme yes 19

I need to do a CHIsq off this data and all examples I see the first 8 rows of the data is the taxid then followed by the above 8 rows. then next 8 rows are next taxid followed again by the above 8 rows and so on. Then the chisq code is run to get chi results for each tin and each svcd and admission.

The problem is this means I would need to export both the CHI and CHI2 tables and manually add this above national data after every tax id series. That seems like far too much work. Is there a way maybe I can do an inner join on the 2 tables and tell it with code to do chisq for me?

Reeza · Posted 03-21-2012 12:36 PM

What would your hypothesis be?

Are you looking for each who to match the national or are you testing for differences between the who's?

What variables are you to test for differences between?

tmm · Posted 03-21-2012 12:49 PM

Say the taxid with svcd is 2 which is minor and they have a count of 25 no admissions and then 2 yes admissions. The chisq would be of the national with the 2 as the svcd and the minor and as above national for that is 31645 no admissions and 52 yes admissions. So if I remember my chisq correctly 25/31645 and 2/52 shows the ratio. And using the chisq for cochran mantel with those no and yes counts i would get this as the result:

case control value = 6205.0000 95% confidence limit = 290.6214

col1 risk value = 5651.0714 95% confidence limit = 277.5314

cohort value = 0.9259 95% confidence limit = 0.8322

odds ratio is 0.0007900142 for no with svcd 2 and 0.0384615385 for yes with svcd 2

I manually did this based on my first group of TINS and the national data and manually did an input of the data and then ran the chisq and ratio code. But I am not about to manually input the national data and insert it after every 8 rows in the CHI main table that houses all the taxids. That would take forever. So all I am hoping for is to some how inner join the 2 tables and then run my chisq and ratio code and it will know that the first 8 rows of Taxid data compare to the national data and give me the results. Now take the next 8 rows of taxid data and do the same until all taxids are calculated. Otherwise the only thing I can see is exporting my 2 tables to Excel and doing the chisq and ratios in Excel but that is not what the boss wants. Wants it done in SAS but I do not know how unless I export both files to Excel and manually add the national data after every 8 taxid rows.

Reeza · Posted 03-21-2012 01:04 PM

Shouldn't your odds ratio be the inverse of each other?

Ie if no =0.0007900142 then for yes it is 1/0.0007900142 = 1265.8

PGStats · Posted 03-21-2012 12:37 PM

Seems like you are looking for something like :

proc sql;
create table CHI3 as
select * from
(select "taxId" as source, who, svcd, svdesc, admission, count from CHI)
union all
(select "National", C.who, N.svcd, N.svdesc, N.admission, N.count from
CHI as C inner Join CHI2 as N
on C.svcd=N.svcd and C.svdesc=N.svdesc and C.admission=N.admission)
order by who, source, svcd, admission;

proc freq data=CHI3;
by who;
table ....;
run;

This will generate a lot of output...

PG

PGStats · Posted 03-21-2012 01:49 PM

Given your latest explanations, I think we might be getting closer with:

proc sql;
create table CHI3 as
select * from
(select "taxId" as source, who, svcd, svdesc, admission, count from CHI)
union all
(select "National", C.who, N.svcd, N.svdesc, N.admission, N.count from
CHI as C inner Join CHI2 as N
on C.svcd=N.svcd and C.svdesc=N.svdesc and C.admission=N.admission)
order by who, svcd, source, admission;
quit;

proc freq data=CHI3 noprint;
by who svcd;
tables source*admission / chisq;
freq count;
output out=CHI_OUT chisq;
run;

PG

tmm · Posted 03-21-2012 02:03 PM

The above code is very close but it is duplicating

The output if I just do not even do the proc freq and do a run instead of the quit so I get the table chi3 shows these results:

source who svcd svdesc admission count

prov_tin 1111 1 minor no 1

national 1111 1 minor no 31645

I am going to try also throwing in a groupby

tmm · Posted 03-21-2012 02:04 PM

WELL GROUPBY DID NOT WORK. THREW AN ERROR

tmm · Posted 03-21-2012 02:08 PM

added distinct to the 2 select clauses in the () and that got rid of the dupes.

tmm · Posted 03-27-2012 03:53 PM

Well this almost works but it makes the CHI process wrong. What I need is for the output of the code to be

who svcd svdesc admission count

national 1 minor no 1295

national 1 minor yes 152

1111 1 minor no 12

1111 1 minor yes 3

national 1 minor no 1295

national 1 minor yes 152

2222 1 minor no 24

2222 1 minor yes 5

With the above code it outputs like this:

source who svcd svdesc admission count

national 1111 1 minor no 1295

national 1111 1 minor yes 152

provtin 1111 1 minor no 12

provtin 1111 1 minor yes 3

national 2222 1 minor no 1295

national 2222 1 minor yes 152

provtin 2222 1 minor no 24

provtin 2222 1 minor yes 5

By doing the chi it does not work on my source*admission because it adds everything together. I need the output to be correct so that my chi works

out right

PGStats · Posted 03-27-2012 05:27 PM

Then replace the query by :

proc sql;

create table CHI3 as

select * from

(select who, svcd, svdesc, admission, count from CHI)

union all

(select "National", N.svcd, N.svdesc, N.admission, N.count from

CHI as C inner Join CHI2 as N

on C.svcd=N.svcd and C.svdesc=N.svdesc and C.admission=N.admission)

order by who, svcd, admission;

PG

tmm · Posted 03-28-2012 08:15 AM

I did that but then national only appears one time and the chi square calculation does not work. The chi examples of input i have seen have been like this and remember this is simply an example and does not include everything mine does. This is an input example from something someone else did and then they did the chi and it output tons of results

who admission count

1111 no 30

1111 yes 3

ntl no 32652

ntl yes 1480

2222 no 112

2222 yes 6

ntl no 32652

ntl yes 1480

3333 no 50

3333 yes 2

ntl no 32652

ntl yes 1480

they have the national results below every taxid and then when doing the chi it calculates say for 1111 with count of 30 no's against count of ntl 32652 no's and then count of 3 yes's against the count of the ntl 1480 yes's. That is what I need mine to do but I have svcd and svdesc in mine as well. so mine needs to chi not only against the no and yes of national but break down by svcd and svdesc.

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away