Provide an example of one of your "mini datasets".
With output data I had created my approach for a similar problem was a self-join on a data set by the categories of interest comparing the upper and lower confidence limits and setting values for less than or greater than intervals and adding some variables for table control. Then displayed the result with Proc Tabulate as a grid with < or > for the non-overlap and direction.
Make the data set:
Proc sql;
create table EthDiff as
select a.HealthDistrict,a.number,a.Average,a.LowerCLMean,a.UpperCLMean
, catx(' - ',a.ResponseGroup,a.VarLabel) as TableHeader
, case
when a.HealthDistrict = b.HealthDistrict and a.z_hispanic=b.z_hispanic then .
when a.number <30 or b.number<30 then .
when missing(a.LowerCLMean) or missing(b.LowerCLMean) then -9
when a.LowerCLMean > b.UpperCLMean then 1
when a.UpperCLMean < b.LowerCLMean then -1
else 0
end as StatDiff ,
b.HealthDistrict as ComparisonDistrict label='Comparison District',
a.z_hispanic,
b.z_hispanic as ComparisonEthnic label='Comparison Ethnicity'
from smiletestEthnic as a left join smiletestEthnic as b on
a.varname=b.varname and a.varlevel=b.varlevel
order by a.varname,a.varlevel,a.HealthDistrict,ComparisonDistrict;
quit;
My data had multiple analysis variables, hence the VARNAME and VarLevel variables. The input set basically used the HealthDistrict (geography) and Z_hispanic (major demographic group) as the report grouping and then creates the indicators for the analysis variables. The case clause did the comparisons of the limits. The Comparison variables are to allow column/row combinations of those in a report.
The report
Title 'District and Ethnicity Summary';
proc tabulate data=EthDiff;
class TableHeader;
classlev tableheader / style=[width=1.25in];
class HealthDistrict ComparisonDistrict / missing ;
class z_hispanic ComparisonEthnic/missing;
format z_hispanic ComparisonEthnic hispanic.;
classlev ComparisonDistrict / style=[width=.8in];
format HealthDistrict ComparisonDistrict DistShort.;
var number StatDiff Average LowerCLMean UpperCLMean;
table TableHeader='',
HealthDistrict=''*z_hispanic='',
Number='n'*max=''*f=f5.0 (Average='Rate' LowerCLMean='LCL' UpperCLMean='UCL')*max=''*f=5.1
ComparisonDistrict*ComparisonEthnic='' * Statdiff=''*max=''*{style=[just=C]}*f=Statdiff.
/misstext=' ' printmiss box=HealthDistrict
style=[Posttext="> means rate for row header is statistically significant and larger than for column header;
< means smaller"];
run;
The format STATDIFF displays values only when of interest:
Proc format;
value StatDiff
-1 = '<'
1 = '>'
other=' '
;
run;
The output provided tables where it was easy to see where the differences occurred. I would assume it could be modified to only include the differences but I wanted to see all the levels of the analysis variables. My data was relatively sparse in differences.
No attempt was made to "macrotize" this there were just enough differences for combinations of the grouping variables I didn't see a need to try to write around those. I actually had 1 (geography only), 2 (shown) and 3 level nestings for the groups such as sex within race and race within sex coupled with the geography
... View more