Hello,
For instance, in SASHELP.CARS, if I want to see if the Origin and the Type of the cars are independent, I can test for independence between these two variables, using :
PROC FREQ data=sashelp.cars;
tables (origin)*type / chisq ;
run;
Then, I get a Chi-square test value at 35,66 and a p-value<0.0001 ; so, I can reject the null hypotesis for the globality of the frequency table. But, what happens if I want to do this for each item of Origin; not only to see if there is a broad dependence, but to test the independence for Asia, then Europe, then USA and so on ? Is there a better way that making the following code for each region ?
PROC FREQ data=sashelp.cars; where origin in ('Asia', 'Europe'); tables (origin)*type / chisq ; run;
Is there an option to have the Chi-square test for each item of the PROF FREQ, not only for the whole?
Thanks!
Hello @commitsudoku,
You could create a suitable BY variable in a view or dataset so that each value of that variable corresponds to one pair of Origin values.
Simple example:
proc sql;
create view want as
select 1 as c, origin, type from sashelp.cars where origin ne 'USA'
union all
select 2 as c, origin, type from sashelp.cars where origin ne 'Europe'
union all
select 3 as c, origin, type from sashelp.cars where origin ne 'Asia';
quit;
proc freq data=want;
by c;
tables origin*type / chisq;
run;
It should be possible to create the view (maybe as a DATA step view) without using hardcoded values of Origin and c.
Addendum
Here is one way avoiding hardcoded values:
%let dsn=sashelp.cars;
%let var1=origin;
%let var2=type;
data want(keep=_byval1 _byval2 &var1 &var2 _count) / view=want;
if _n_=1 then do;
dcl hash h();
h.definekey("&var1", "&var2");
h.definedata("&var1", "&var2", '_count');
h.definedone();
dcl hiter hi('h');
dcl hash k(dataset:"&dsn", ordered:'a');
k.definekey("&var1");
k.definedone();
dcl hiter hi1('k');
dcl hiter hi2('k');
end;
set &dsn end=last;
if h.find()=0 then _count+1;
else _count=1;
h.replace();
if last;
do while(hi1.next()=0);
_byval1=&var1;
hi2.setcur();
do while(hi2.next()=0);
_byval2=&var1;
do while(hi.next()=0);
if &var1=_byval1 | &var1=_byval2 then output;
end;
end;
end;
run;
proc freq data=want;
by _byval1 _byval2;
weight _count;
tables &var1*&var2 / chisq /* missing */;
run;
For an example with ten BY groups use these settings:
%let var1=DeathCause;
%let var2=Smoking_Status;
%let dsn=sashelp.heart(where=(~missing(&var1)));
First sort the data by origin. Then
PROC FREQ data=sashelp.cars;
by origin;
tables origin*type / chisq;
run;
@commitsudoku wrote:
Well it's more or less the idea, but still I don't get a result for the Chi-squre, because the PROC FREQ doesn't put an Origin vs all the others. With the BY statement, I only get a table for that Origin and the Khi-square test cannot the applied.
Yes, you can't get that from one run of PROC FREQ. You would need two runs of PROC FREQ to get that. Or perhaps you can coerce PROC CATMOD to do this, but I leave that to others (if indeed it is possible).
Hello @commitsudoku,
You could create a suitable BY variable in a view or dataset so that each value of that variable corresponds to one pair of Origin values.
Simple example:
proc sql;
create view want as
select 1 as c, origin, type from sashelp.cars where origin ne 'USA'
union all
select 2 as c, origin, type from sashelp.cars where origin ne 'Europe'
union all
select 3 as c, origin, type from sashelp.cars where origin ne 'Asia';
quit;
proc freq data=want;
by c;
tables origin*type / chisq;
run;
It should be possible to create the view (maybe as a DATA step view) without using hardcoded values of Origin and c.
Addendum
Here is one way avoiding hardcoded values:
%let dsn=sashelp.cars;
%let var1=origin;
%let var2=type;
data want(keep=_byval1 _byval2 &var1 &var2 _count) / view=want;
if _n_=1 then do;
dcl hash h();
h.definekey("&var1", "&var2");
h.definedata("&var1", "&var2", '_count');
h.definedone();
dcl hiter hi('h');
dcl hash k(dataset:"&dsn", ordered:'a');
k.definekey("&var1");
k.definedone();
dcl hiter hi1('k');
dcl hiter hi2('k');
end;
set &dsn end=last;
if h.find()=0 then _count+1;
else _count=1;
h.replace();
if last;
do while(hi1.next()=0);
_byval1=&var1;
hi2.setcur();
do while(hi2.next()=0);
_byval2=&var1;
do while(hi.next()=0);
if &var1=_byval1 | &var1=_byval2 then output;
end;
end;
end;
run;
proc freq data=want;
by _byval1 _byval2;
weight _count;
tables &var1*&var2 / chisq /* missing */;
run;
For an example with ten BY groups use these settings:
%let var1=DeathCause;
%let var2=Smoking_Status;
%let dsn=sashelp.heart(where=(~missing(&var1)));
Are you talking about one-way chisq test.
proc freq data=sashelp.cars; where origin='Asia'; table type/chisq(testp=(0.1 0.1 0.60 0.1 0.05 0.05 )); run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.