antor82
Obsidian | Level 7

Dear All

 

I thank You in advance for Your kind support.

 

I'm running a bootstrap after a logistic regression and would like to print the results (SAS/STAT 15.1).

 

This is the code:

 

/* 2. Generate many bootstrap samples */
proc surveyselect data=dbsname NOPRINT seed=123456
out=Bootout
method=urs 
samprate=1
reps=1000;
run;

%macro ODSOff(); /* Call prior to BY-group processing */
ods graphics off;
ods exclude all;
ods noresults;
%mend;
 
%macro ODSOn(); /* Call after BY-group processing */
ods graphics on;
ods exclude none;
ods results;
%mend;

%ODSOff
PROC LOGISTIC data=Bootout;
    BY Replicate; 
	CLASS Female (param=ref ref='No') ChronicLungDisease (param=ref ref='No');
	MODEL Out2InHospitalOr30DayDeath(event='1')=Female ChronicLungDisease / 
		SELECTION=Backward clodds=pl gof;
	Title 'Logistic Model InHosp or 30d - ONLY preop';
	format Female ChronicLungDisease yn.;
	ods output CLoddsPL=CL_boot_Mort_mod_1;
run;
%ODSon

proc univariate data=cl_boot_mort_mod_1 noprint;
   class Effect;
   var  OddsRatioEst;
   output out=WidePctls1 pctlpre=P_ pctlpts=2.5 97.5 mean=Mean Std=Std; 
run; 


proc print data=WidePctls1 noobs label;
   format Mean Std P_2_5 P_97_5 6.4;
   label Mean="BootMean" Std="BootStdErr" P_2_5="95% Lower CL" P_97_5="95% Upper CL";
run;

I wonder why I get these results:

 

[Screenshots: dataset CL_Boot_Mort_Mod_1; output dataset WidePctls1; Results]

 

It seems as if "Female Yes vs No" has been split into two different categories ("Female Yes vs No" and "Female         Yes vs No").

This also happens in other models that include more independent variables.

It does not happen in the baseline PROC LOGISTIC run without bootstrapping.

 

I sincerely thank You again for Your kind and precious support

 

Sincerely

 

Antonio

 

 

9 REPLIES
FreelanceReinh
Jade | Level 19

Hello @antor82 and welcome to the SAS Support Communities!

 

It looks like variable Female has more distinct values than expected. So, my first check would be:

proc freq data=dbsname;
format female hex16.;
tables female;
run;

Please post the output of the above step.

 

Since you're using formatted values of this variable, we should take a look at the definition of format YN. Can you show the SAS code which created that format or, if the code is not readily available, the output of the step below?

proc format lib=work fmtlib; /* Please replace "work" by the appropriate libref */
select yn;            /* (or libref.catalogname) if YN. is not in WORK.FORMATS. */
run;

 

Also, are you sure you don't need the OUTHITS option in your PROC SURVEYSELECT step? This is unrelated to the problem you've reported, but has the potential to invalidate your results.

 

[Edit: Included link to the documentation of OUTHITS.]

antor82
Obsidian | Level 7

Hi 

 

Thank You for Your comment.

 

This is the output from two different proc surveyselect

Output of the requested PROC FREQ step:

Screenshot 2019-07-01 at 13.38.15.png

 

Female is defined as 1=Yes and 0=No (this format is also used for other binary variables).

 

Output of the requested PROC FORMAT step:

Screenshot 2019-07-01 at 13.42.39.png

 

Tks again

 

A

FreelanceReinh
Jade | Level 19 (accepted solution)

Thanks for providing the requested outputs. There's nothing wrong with them. So, we've ruled out data and format issues.

 

Hence, it seems that variable Effect in ODS table CLoddsPL shows inconsistencies, but the reason is unclear. I wasn't able to replicate this behavior with SAS/STAT 14.3 (using a different input dataset, of course). I tend to believe that this is a bug (not the first bug I've seen in ODS output datasets), but luckily there's an easy workaround: Most likely the additional blanks between "Female" and "Yes vs No" in some of the Effect values are ordinary space characters, which can be removed with the COMPBL function:

data CL_boot_Mort_mod_1a;
set CL_boot_Mort_mod_1;
effect=compbl(effect);
run;

proc freq data=CL_boot_Mort_mod_1a;
tables effect;
run;

The PROC FREQ step with the revised dataset should show one category "Female Yes vs No" rather than two (and the unchanged category involving ChronicLungDisease). Otherwise you'd need to display the Effect values in $HEXw. format to find out what kind of blanks have been inserted (and use the COMPRESS function with appropriate arguments instead of COMPBL to remove them).
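
A minimal sketch of that fallback, assuming the stray characters turn out to be non-breaking spaces ('A0'x in a single-byte Latin encoding) or tabs ('09'x); the actual byte values have to be read off the hex output. The width in $HEX80. and the dataset name CL_boot_Mort_mod_1b are illustrative choices, not part of the original thread. TRANSLATE is used here in place of COMPRESS so that the offending characters become ordinary blanks instead of disappearing, which keeps the words separated before COMPBL collapses the blanks.

```sas
/* Show the raw bytes of each distinct Effect value.
   $HEX80. displays the first 40 characters (2 hex digits per byte);
   ordinary blanks appear as 20. */
proc freq data=CL_boot_Mort_mod_1;
   format effect $hex80.;
   tables effect;
run;

/* Assumption: the hex output showed A0 (non-breaking space) or 09 (tab).
   Replace those bytes with ordinary blanks, then collapse blank runs. */
data CL_boot_Mort_mod_1b;   /* hypothetical dataset name */
   set CL_boot_Mort_mod_1;
   effect = compbl(translate(effect, '  ', 'A009'x));
run;

/* Re-check: each effect should now appear as a single category. */
proc freq data=CL_boot_Mort_mod_1b;
   tables effect;
run;
```

In a UTF-8 session a non-breaking space is stored as two bytes (C2 A0), so the byte list passed to TRANSLATE would need to be adapted to whatever the hex output actually shows.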

 

Then PROC UNIVARIATE, based on CL_boot_Mort_mod_1a, will use the consolidated CLASS level as well, and the problem is solved.

Again, I think the OUTHITS option in PROC SURVEYSELECT is mandatory in your case to obtain valid bootstrap samples (i.e. with replacement) because you don't use variable NumberHits (of dataset Bootout) in the subsequent steps.
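
For reference, the sampling step with OUTHITS added might look like the sketch below; everything except the OUTHITS line and the comments is taken from the original post. Without OUTHITS, a unit selected several times appears only once per replicate and the number of selections is stored in NumberHits, so the alternative would be to keep the original step and add a FREQ NumberHits; statement to PROC LOGISTIC.

```sas
/* Bootstrap samples with one output row per selection */
proc surveyselect data=dbsname noprint seed=123456
     out=Bootout
     method=urs      /* unrestricted random sampling, i.e. with replacement */
     samprate=1      /* each replicate has as many selections as dbsname has rows */
     outhits         /* write a separate row for every hit */
     reps=1000;
run;
```

With OUTHITS in place, every replicate in Bootout has the same number of rows as dbsname, which is an easy check to run with PROC FREQ on Replicate.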

antor82
Obsidian | Level 7

Thank You 

 

 

 

 

FreelanceReinh
Jade | Level 19

You're welcome. I had one more idea while I wasn't able to access the SAS website for a while:

You may want to make sure that the unexpected discrepancies between Effect values did not occur within a replicate. (This is unlikely, but it would possibly indicate a more serious issue.) This would lead to duplicate Replicate-Effect combinations in the revised dataset CL_boot_Mort_mod_1a. So, if the PROC SQL step below created a non-empty dataset MYST, we should be alarmed.

proc sql;
create table myst as
select * from CL_boot_Mort_mod_1a
group by replicate, effect
having count(*)>1;
quit;

But most likely it will result in:

NOTE: Table WORK.MYST created, with 0 rows and 6 columns.

 

antor82
Obsidian | Level 7
So it did
antor82
Obsidian | Level 7

In my analysis, I've run three different logistic regression models (1: baseline variables only; 2: baseline + procedure-related variables; 3: baseline + procedure-related variables + postoperative complications). Then I ran a bootstrap resampling.

 

How is it possible to get results like these? (I'm posting only some examples.)

 

Model 2

Female Yes vs No     OR 5.1         95% CI 2.6-12.2   (OR similar to Model 1)

Procedure            OR 49085.6     95% CI 6.1-75.4   (Such a large OR? An OR greater than the upper 95% CL?)

 

Model 3

Female Yes vs No     OR 10.1        95% CI 3.1-40.5   (OR so far from Model 2?)

FreelanceReinh
Jade | Level 19

My first step would always be univariate logistic regressions (or, in the case of categorical predictors, contingency table analyses) to select candidate variables for a multivariable model.

 

Adjusting for other variables can change the odds ratio for a predictor considerably.

 

The extremely large OR requires further investigation (also check the log for suspicious messages, e.g., "quasi-complete separation of data points"). I think with clodds=wald the point estimate would always be within the confidence limits. If Procedure is a continuous variable, the OR depends on the measurement unit (cf. the UNITS statement). I'd take a look at the joint distribution of this variable and the dependent variable.
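
A sketch of two checks along these lines. It assumes that Procedure is a categorical variable in dbsname with that exact name, and that the OR and CI quoted for Model 2 are the Mean and percentiles produced by a PROC UNIVARIATE step like the one in the original post; neither is confirmed in the thread. The dataset names CL_boot_Mort_mod_2a and WidePctls2 are hypothetical. If the reported OR is a bootstrap mean, a handful of replicates with huge estimates (for example, from separation) can pull it above the 97.5th percentile, whereas the median (P_50) always lies between the percentile limits.

```sas
/* Joint distribution of the predictor and the outcome:
   an empty or nearly empty cell points to (quasi-)complete separation. */
proc freq data=dbsname;
   tables Procedure*Out2InHospitalOr30DayDeath / norow nocol nopercent;
run;

/* Bootstrap summary with the median added next to the mean.
   CL_boot_Mort_mod_2a stands for the cleaned CLoddsPL dataset of Model 2. */
proc univariate data=CL_boot_Mort_mod_2a noprint;
   class Effect;
   var OddsRatioEst;
   output out=WidePctls2 pctlpre=P_ pctlpts=2.5 50 97.5 mean=Mean;
run;

proc print data=WidePctls2 noobs;
   var Effect Mean P_2_5 P_50 P_97_5;
run;
```

A large gap between Mean and P_50 for an effect would suggest that a few extreme replicates dominate the mean.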

antor82
Obsidian | Level 7

Regarding univariate logistic regressions as a first step: already done. Only variables significantly associated with the outcome have been included in the models.

 

Regarding "quasi-complete separation of data points": yes, it happens. I'm trying to solve this with penalised regression models (FIRTH option in the MODEL statement).

However, some variables have OR <0.0001 or >999.999 (less frequently with the FIRTH option, but still present). This greatly influences my models.

I guess I should probably redefine the variables included in the models to avoid separation.
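
For readers following along, a minimal sketch of the penalised fit mentioned above, using the Model 1 variables from the original post. Leaving out SELECTION=BACKWARD is a choice made for this illustration, not something stated in the thread. With FIRTH, CLODDS=PL requests profile-likelihood limits based on the penalised likelihood.

```sas
/* Firth's penalised likelihood, which reduces bias and usually yields
   finite estimates under (quasi-)complete separation */
proc logistic data=dbsname;
   class Female (param=ref ref='No') ChronicLungDisease (param=ref ref='No');
   model Out2InHospitalOr30DayDeath(event='1') = Female ChronicLungDisease
         / firth clodds=pl;
   format Female ChronicLungDisease yn.;
run;
```

The same MODEL options can be used in the BY Replicate step; replicates in which a category has no events may still produce extreme estimates, which is consistent with what is described above.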

 

 
