Solved: 95% confidence band missing for half of the curve

Saifulinfs · Posted 11-07-2023 10:26 AM

I am trying to visualize my regression with multiple predictors and nonlinearity. Can anybody please explain why the 95% band is missing at higher values of the X axis?

Here is my code:

proc sort data=zinc.imputed
out=zinc.imputed_srtd;
by zns_unic;
run;
proc surveyreg data=zinc.imputed_srtd;
weight nat_weight_bio;
cluster psu_no_ov;
strata newstrata;
class SEX area wi_bin caste edu_mt sanitation handwashing dws eggs pulse_beans milk_products glv nlv p_activity cat_smoke alcohol_y_n cat_crp;
Format SEX SEX. area area. caste caste. edu_mt edu. sanitation $sanitation. handwashing $handwashing. eggs fd2grp. pulse_beans fd2grp. milk_products fd2grp. glv fd2grp. nlv fd2grp. p_activity p_act. cat_crp crp.;
effect znspln = spline(zns_unic / naturalcubic basis=tpf(noint) knotmethod=PERCENTILELIST(5 27.5 50 72.5 95));
model N_Mets5 = znspln AGE_in_yr sex area wi_bin caste edu_mt sanitation handwashing dws eggs pulse_beans milk_products glv nlv p_activity cat_smoke alcohol_y_n cat_crp /solution CLPARM;
output out=N_Mets5_Out predicted=Pred lcl=Lower ucl=Upper;
store N_Mets5_MODEL;
run;
proc plm source=N_Mets5_MODEL;
effectplot fit(x=zns_unic) / clm;
ods output FitPlot=N_Mets5_FIT;
run;

Rick_SAS · Posted 11-08-2023 02:54 PM

I suspect that an issue with your data is preventing SAS from evaluating the model when zinc > 85 and the classification variables are at their reference values. But without your data, there isn't much else that I can say definitively.

The most logical explanation is that you do not have enough observations in that region of the explanatory variables. But because your model is so large, I don't know which variable you should look at.

What happens if you use zns_unic directly in the model instead of the spline effect znspln? Do you see the same thing? If not, we can look at the spline effect.
Check the sampling weights for the values zns_unic > 85.
I don't know how the CLUSTER and STRATA variables affect the model. Perhaps you or someone else can think about those variables.
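
A minimal sketch of the first two checks, reusing the data set, formats, and variable names from the original post. The item-store name N_Mets5_LINEAR is hypothetical, and the cutoff of 85 is simply the approximate value where the band disappears in the plot.

```sas
/* Check 1: refit with zns_unic as a plain linear term (no spline effect). */
proc surveyreg data=zinc.imputed_srtd;
   weight nat_weight_bio;
   cluster psu_no_ov;
   strata newstrata;
   class sex area wi_bin caste edu_mt sanitation handwashing dws eggs
         pulse_beans milk_products glv nlv p_activity cat_smoke
         alcohol_y_n cat_crp;
   format sex sex. area area. caste caste. edu_mt edu.
          sanitation $sanitation. handwashing $handwashing.
          eggs fd2grp. pulse_beans fd2grp. milk_products fd2grp.
          glv fd2grp. nlv fd2grp. p_activity p_act. cat_crp crp.;
   model N_Mets5 = zns_unic AGE_in_yr sex area wi_bin caste edu_mt
                   sanitation handwashing dws eggs pulse_beans
                   milk_products glv nlv p_activity cat_smoke
                   alcohol_y_n cat_crp / solution clparm;
   store N_Mets5_LINEAR;   /* hypothetical item-store name */
run;

/* Same effect plot as before, now for the model without the spline. */
proc plm source=N_Mets5_LINEAR;
   effectplot fit(x=zns_unic) / clm;
run;

/* Check 2: summarize the sampling weights in the upper tail. */
proc means data=zinc.imputed_srtd n nmiss min mean max;
   where zns_unic > 85;
   var nat_weight_bio;
run;
```

If the band is complete for the linear model, that would point toward the spline effect; missing or extreme weights in the tail would point toward the survey design.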

Although it would be great to figure out WHY you are seeing this result, the interpretation is clear: your current data do not contain enough information for the model to predict with confidence when zns_unic > 85 and the other variables are at their reference values. I would not use this model past zns_unic = 85. Getting a better model will likely require more data at those extreme values.
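
If the figure is reported only over the range where the band exists, one option is to redraw it from the N_Mets5_FIT data set that PROC PLM wrote. This is a sketch under assumptions: the column names _XCONT1 and _PREDICTED are assumed (verify them with PROC CONTENTS), and the axis labels are placeholders.

```sas
/* Redraw the fit plot, restricted to the range where the model is usable. */
proc sgplot data=N_Mets5_FIT;
   where _XCONT1 <= 85;                       /* _XCONT1: assumed X column */
   band x=_XCONT1 lower=_LCLM upper=_UCLM / transparency=0.5
        legendlabel="95% confidence limits";
   series x=_XCONT1 y=_PREDICTED / legendlabel="Predicted mean";
   xaxis label="zns_unic";                    /* placeholder label */
   yaxis label="Predicted N_Mets5";           /* placeholder label */
run;
```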


Rick_SAS · Posted 11-07-2023 05:01 PM

I cannot confidently explain what you are seeing, but I can make a guess. My instinct says that it has to do with the relationships between the variables in your data set. Your picture visualizes a slice of the regression surface where zns_unic is allowed to vary over its range, but the other variables (age, sex, area, caste, etc.) are fixed at their mean values (if continuous) or their reference values (if classification variables). My theory is that there are very few (possibly only one) observations where zns_unic > 83 and the 17 classification variables have the reference values. (Alternatively, perhaps there are several observations, but the response value is the same.)

If there is no variation in the response, the confidence interval can't be plotted.

You can test my hypothesis: Use a WHERE clause to subset the data to the observations where the 17 CLASS variables are at their reference values and zns_unic > 85. Then use

PROC MEANS N NMISS MEAN STD;
var N_Mets5;
run;

on that data set to find the number of nonmissing values of N_Mets5 that satisfy the 18-variable constraints. If the STD is zero, that would explain the graph.
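
Before building the full WHERE clause, it may help to see which levels of each CLASS variable actually occur in the upper tail. The sketch below is one way to do that with one-way frequency tables; it reuses the data set and formats from the original post, and the cutoff of 85 is taken from the plot.

```sas
/* One-way frequencies of every CLASS variable among observations   */
/* with zns_unic > 85. Sparse or empty levels suggest which          */
/* variables leave the model with little information in that region. */
proc freq data=zinc.imputed;
   where zns_unic > 85;
   tables sex area wi_bin caste edu_mt sanitation handwashing dws eggs
          pulse_beans milk_products glv nlv p_activity cat_smoke
          alcohol_y_n cat_crp / missing;
   /* Apply the same formats as the model so the levels match. */
   format sex sex. area area. caste caste. edu_mt edu.
          sanitation $sanitation. handwashing $handwashing.
          eggs fd2grp. pulse_beans fd2grp. milk_products fd2grp.
          glv fd2grp. nlv fd2grp. p_activity p_act. cat_crp crp.;
run;
```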

Try it out and report back.

sbxkoenk · Posted 11-07-2023 06:19 PM

Hello,

I have moved this entire topic (original question + replies) to the "Statistical Procedures" board (where it belongs).

PROC SURVEYREG and PROC PLM are statistical procedures (SAS/STAT).

The question was probably posted in the "Visual Analytics" board because it concerns a graph on top of a statistical analysis, but "Visual Analytics" is (the name of) a different SAS product that was not used here by the initiator of this topic.

Koen

Saifulinfs · Posted 11-08-2023 01:42 PM

Thank you for the reply. I removed some class variables from the model, but the confidence band is still missing from about halfway along the curve. As you suggested, I looked at the data values using the following code. The subset contains 7 observations with 6 distinct values of N_Mets5, and the standard deviation is 0.423.

data subset;
set zinc.imputed;
where zns_unic > 82 & cat_crp = 0 & p_activity <7 & milk_products >2 & dws=2 & sanitation = "Unimproved" & sex=1;
run;
proc print data=subset; var zns_unic n_mets5; run;
PROC MEANS data=subset N NMISS MEAN STD; var N_Mets5; run;

Here is the output:

Rick_SAS · Posted 11-08-2023 01:54 PM

I assume that the raw values selected by the inequalities
p_activity < 7 &
milk_products > 2
each map to a single formatted value when you use the P_ACT. and FD2GRP. formats.

Look at the N_Mets5_FIT data set. When ZNS_UNIC > 85, what are the values for the upper and lower limits of the band? Are they missing, or are they the same value as the predicted value (the curve)?
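
One way to inspect that data set is sketched below. The names _XCONT1 and _PREDICTED are assumptions about how PROC PLM labels the X and prediction columns, so the PROC CONTENTS step is there to confirm the actual variable names first.

```sas
/* List the variables that PROC PLM wrote to the ODS output data set. */
proc contents data=N_Mets5_FIT varnum;
run;

/* Print the fitted values and confidence limits in the upper range. */
proc print data=N_Mets5_FIT;
   where _XCONT1 > 85;                  /* _XCONT1: assumed X column */
   var _XCONT1 _PREDICTED _LCLM _UCLM;  /* _PREDICTED: assumed name  */
run;

/* Count how many grid points have missing confidence limits. */
proc means data=N_Mets5_FIT n nmiss;
   var _LCLM _UCLM;
run;
```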

Saifulinfs · Posted 11-08-2023 02:28 PM

Yes, you are right: the formats result in the same values.

The N_Mets5_FIT data set shows that the values of _LCLM and _UCLM are missing when zns_unic > 82.

Can you suggest what to do in this situation? Is it acceptable to report this figure with the confidence band missing for half of the curve?

Thank you for your time and advice.
