Everything I have seen on trying to do chisq in SAS looks like this
who scd sdesc admission count
1111 1 major noreadm 200
1111 1 major yesredm 2
1111 2 extreme noreadm 100
1111 2 extreme yesreadm 15
national 1 major noreadm 500
national 1 major yesreadm 10
national 2 extreme noreadm 250
national 2 extreme yesreadm 25
2222 1 major noreadm 300
2222 1 major yesreadm 8
2222 2 extreme noreadm 150
2222 2 extreme yesreadm 10
I have 87k rows all together and after every number in the who column there is another national series with the same counts.
Then code is run and you get separate chi data based on the above.
Now I have 2 tables. 1 that has multiple rows based on the who number which are TIN numbers by the way. That is in my CHI table. Then I have a CHI3 table that took the national and pieced it out and then I summed it to get a table that is named CHI3. CHI has 87k rows with all the TINS that apply. CHI3 is just 4 rows total and I appended it to the CHI tables but now it does not look like the other chi above where the national appears after every list of TINS. I tried to run the CHIsq code off what I appended but it does not work. Is there just a way I can run the chisq by doing a join to the 2 tables? I hope I am not talking in circles and confusing everyone. I can offer more info if need be.
I, for one, am totally confused. I don't know what code you are running, on which data, or what results you expect to achieve.
What are you trying to test with the ChiSq?
Let me try it this way. I have a table called CHI. The columns are who, svcd, svdesc, admission, count. The data in the who is a taxid and each taxid is logged 8 times because the svcd is 1-4. The svdesc is n/a, minor, major, extreme. the admission is no or yes and the count is the count of no or yes
The above table has a total of 21349 taxids represented.
Then I have table CHI2 which is the national representation of the above table. That is simply taking all the data and doing a sum of the count and then groupby the other stuff. The columns are the same however it is laid out like this:
who svcd svdesc admission count
national 1 n/a no 2990
national 1 n/a yes 5
national 2 minor no 31645
national 2 minor yes 52
national 3 major no 41864
national 3 major yes 154
national 4 extreme no 7589
national 4 extreme yes 19
I need to do a CHIsq off this data and all examples I see the first 8 rows of the data is the taxid then followed by the above 8 rows. then next 8 rows are next taxid followed again by the above 8 rows and so on. Then the chisq code is run to get chi results for each tin and each svcd and admission.
The problem is this means I would need to export both the CHI and CHI2 tables and manually add this above national data after every tax id series. That seems like far too much work. Is there a way maybe I can do an inner join on the 2 tables and tell it with code to do chisq for me?
What would your hypothesis be?
Are you looking for each who to match the national or are you testing for differences between the who's?
What variables are you to test for differences between?
Say the taxid with svcd is 2 which is minor and they have a count of 25 no admissions and then 2 yes admissions. The chisq would be of the national with the 2 as the svcd and the minor and as above national for that is 31645 no admissions and 52 yes admissions. So if I remember my chisq correctly 25/31645 and 2/52 shows the ratio. And using the chisq for cochran mantel with those no and yes counts i would get this as the result:
case control value = 6205.0000 95% confidence limit = 290.6214
col1 risk value = 5651.0714 95% confidence limit = 277.5314
cohort value = 0.9259 95% confidence limit = 0.8322
odds ratio is 0.0007900142 for no with svcd 2 and 0.0384615385 for yes with svcd 2
I manually did this based on my first group of TINS and the national data and manually did an input of the data and then ran the chisq and ratio code. But I am not about to manually input the national data and insert it after every 8 rows in the CHI main table that houses all the taxids. That would take forever. So all I am hoping for is to some how inner join the 2 tables and then run my chisq and ratio code and it will know that the first 8 rows of Taxid data compare to the national data and give me the results. Now take the next 8 rows of taxid data and do the same until all taxids are calculated. Otherwise the only thing I can see is exporting my 2 tables to Excel and doing the chisq and ratios in Excel but that is not what the boss wants. Wants it done in SAS but I do not know how unless I export both files to Excel and manually add the national data after every 8 taxid rows.
Shouldn't your odds ratio be the inverse of each other?
Ie if no =0.0007900142 then for yes it is 1/0.0007900142 = 1265.8
Seems like you are looking for something like :
proc sql;
create table CHI3 as
select * from
(select "taxId" as source, who, svcd, svdesc, admission, count from CHI)
union all
(select "National", C.who, N.svcd, N.svdesc, N.admission, N.count from
CHI as C inner Join CHI2 as N
on C.svcd=N.svcd and C.svdesc=N.svdesc and C.admission=N.admission)
order by who, source, svcd, admission;
proc freq data=CHI3;
by who;
table ....;
run;
This will generate a lot of output...
PG
Given your latest explanations, I think we might be getting closer with:
proc sql;
create table CHI3 as
select * from
(select "taxId" as source, who, svcd, svdesc, admission, count from CHI)
union all
(select "National", C.who, N.svcd, N.svdesc, N.admission, N.count from
CHI as C inner Join CHI2 as N
on C.svcd=N.svcd and C.svdesc=N.svdesc and C.admission=N.admission)
order by who, svcd, source, admission;
quit;
proc freq data=CHI3 noprint;
by who svcd;
tables source*admission / chisq;
freq count;
output out=CHI_OUT chisq;
run;
PG
The above code is very close but it is duplicating
The output if I just do not even do the proc freq and do a run instead of the quit so I get the table chi3 shows these results:
source who svcd svdesc admission count
prov_tin 1111 1 minor no 1
prov_tin 1111 1 minor no 1
prov_tin 1111 1 minor no 1
national 1111 1 minor no 31645
national 1111 1 minor no 31645
national 1111 1 minor no 31645
I am going to try also throwing in a groupby
WELL GROUPBY DID NOT WORK. THREW AN ERROR
added distinct to the 2 select clauses in the () and that got rid of the dupes.
Well this almost works but it makes the CHI process wrong. What I need is for the output of the code to be
who svcd svdesc admission count
national 1 minor no 1295
national 1 minor yes 152
1111 1 minor no 12
1111 1 minor yes 3
national 1 minor no 1295
national 1 minor yes 152
2222 1 minor no 24
2222 1 minor yes 5
With the above code it outputs like this:
source who svcd svdesc admission count
national 1111 1 minor no 1295
national 1111 1 minor yes 152
provtin 1111 1 minor no 12
provtin 1111 1 minor yes 3
national 2222 1 minor no 1295
national 2222 1 minor yes 152
provtin 2222 1 minor no 24
provtin 2222 1 minor yes 5
By doing the chi it does not work on my source*admission because it adds everything together. I need the output to be correct so that my chi works
out right
Then replace the query by :
proc sql;
create table CHI3 as
select * from
(select who, svcd, svdesc, admission, count from CHI)
union all
(select "National", N.svcd, N.svdesc, N.admission, N.count from
CHI as C inner Join CHI2 as N
on C.svcd=N.svcd and C.svdesc=N.svdesc and C.admission=N.admission)
order by who, svcd, admission;
PG
I did that but then national only appears one time and the chi square calculation does not work. The chi examples of input i have seen have been like this and remember this is simply an example and does not include everything mine does. This is an input example from something someone else did and then they did the chi and it output tons of results
who admission count
1111 no 30
1111 yes 3
ntl no 32652
ntl yes 1480
2222 no 112
2222 yes 6
ntl no 32652
ntl yes 1480
3333 no 50
3333 yes 2
ntl no 32652
ntl yes 1480
they have the national results below every taxid and then when doing the chi it calculates say for 1111 with count of 30 no's against count of ntl 32652 no's and then count of 3 yes's against the count of the ntl 1480 yes's. That is what I need mine to do but I have svcd and svdesc in mine as well. so mine needs to chi not only against the no and yes of national but break down by svcd and svdesc.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.