Identify overlapping CLs for each level of a categorical variable?

SAS93 · Posted 07-14-2022 01:08 PM

I'm using complex survey data to assess various outcomes, stratified by various demographic variables. I'm running Chi-Square tests to assess IDed significant differences, but I want an easier way to find 95% CLs that overlap within those groups (as opposed to scrolling through 100s of output tables and writing each one down), without any additional testing (at this point).

I've output the crosstabulated tables for all variables. I've formatted each of these mini datasets. I want to code something that basically flags the CLS that overlap between groups; I realize this is a bit more complicated than I'm making it out to be. But how can I create a code that says "for each level of this categorical variable, compare the lower and upper CLs to find overlaps". The demographic variables have different numbers of categories, so I'd like to use a macro and Proc Freq NLEVELS to pass the number of levels (and thus automate it).

PaigeMiller · Posted 07-14-2022 01:15 PM

Give us some idea of the code used to determine the confidence intervals. Give us some idea about why you have 100s of outputs.

Also, please note that if you are looking for significant differences ("I'm running Chi-Square tests to assess IDed significant differences") then confidence intervals that overlap/do not overlap is not always the same as testing for significant differences.

--
Paige Miller

ballardw · Posted 07-14-2022 02:07 PM

Provide an example of one of your "mini datasets".

With output data I had created my approach for a similar problem was a self-join on a data set by the categories of interest comparing the upper and lower confidence limits and setting values for less than or greater than intervals and adding some variables for table control. Then displayed the result with Proc Tabulate as a grid with < or > for the non-overlap and direction.

Make the data set:

Proc sql;
   create table EthDiff as
   select a.HealthDistrict,a.number,a.Average,a.LowerCLMean,a.UpperCLMean
          , catx(' - ',a.ResponseGroup,a.VarLabel) as TableHeader
        , case
            when a.HealthDistrict = b.HealthDistrict and a.z_hispanic=b.z_hispanic then .
            when a.number <30 or b.number<30 then .
            when missing(a.LowerCLMean) or missing(b.LowerCLMean) then -9
            when a.LowerCLMean > b.UpperCLMean then 1
            when a.UpperCLMean < b.LowerCLMean then -1
            else 0
         end as StatDiff ,
         b.HealthDistrict as ComparisonDistrict label='Comparison District',
         a.z_hispanic,
         b.z_hispanic as ComparisonEthnic label='Comparison Ethnicity'
   from smiletestEthnic as a left join smiletestEthnic as b on
        a.varname=b.varname and a.varlevel=b.varlevel
   order by a.varname,a.varlevel,a.HealthDistrict,ComparisonDistrict;
quit;

My data had multiple analysis variables, hence the VARNAME and VarLevel variables. The input set basically used the HealthDistrict (geography) and Z_hispanic (major demographic group) as the report grouping and then creates the indicators for the analysis variables. The case clause did the comparisons of the limits. The Comparison variables are to allow column/row combinations of those in a report.

The report

Title 'District and Ethnicity Summary';
proc tabulate data=EthDiff;

   class TableHeader;
   classlev tableheader / style=[width=1.25in];
   class HealthDistrict ComparisonDistrict / missing ;
   class z_hispanic ComparisonEthnic/missing;
   format z_hispanic ComparisonEthnic hispanic.;
   classlev ComparisonDistrict / style=[width=.8in];
   format HealthDistrict ComparisonDistrict DistShort.;
   var number StatDiff Average LowerCLMean UpperCLMean;
   table TableHeader='',
      HealthDistrict=''*z_hispanic='',
      Number='n'*max=''*f=f5.0 (Average='Rate' LowerCLMean='LCL' UpperCLMean='UCL')*max=''*f=5.1
      ComparisonDistrict*ComparisonEthnic='' * Statdiff=''*max=''*{style=[just=C]}*f=Statdiff.
      /misstext=' ' printmiss box=HealthDistrict
       style=[Posttext="> means rate for row header is statistically significant and larger than for column header;
 < means smaller"];
run;

The format STATDIFF displays values only when of interest:

Proc format;
value StatDiff
-1 = '<'
 1 = '>'
other=' '
;
run;

The output provided tables where it was easy to see where the differences occurred. I would assume it could be modified to only include the differences but I wanted to see all the levels of the analysis variables. My data was relatively sparse in differences.

No attempt was made to "macrotize" this there were just enough differences for combinations of the grouping variables I didn't see a need to try to write around those. I actually had 1 (geography only), 2 (shown) and 3 level nestings for the groups such as sex within race and race within sex coupled with the geography

Identify overlapping CLs for each level of a categorical variable?

Re: Identify overlapping CLs for each level of a categorical variable?

Re: Identify overlapping CLs for each level of a categorical variable?

Identify overlapping CLs for each level of a categorical variable?

Re: Identify overlapping CLs for each level of a categorical variable?

Re: Identify overlapping CLs for each level of a categorical variable?

SAS Innovate 2025: Call for Content

Click image to register for webinar

Classroom Training Available!