Hi,
I have a problem with the results of PROC LOGISTIC. I run PROC LOGISTIC with selection=backward and capture the parameter estimates with ODS OUTPUT. I then keep the variables whose p-value is less than 0.05, refit the model on those variables, and save the estimates with outest=. However, the coefficients for the same variables differ between "paraest" and "betas".
Could someone help me figure out why?
Thanks a lot.
Here is my code:
/* First run: backward selection over all candidates; capture the estimates table */
ods output ParameterEstimates=paraest;
proc logistic data=f.train2 desc namelen=40;
  class &class_var;
  model ksi = &class_var &reduced / selection=backward fast slstay=.001;
run;
ods output close;
%put &class_var &reduced;

/* Keep only the parameters with p < .05 */
data f.paraest;
  set paraest;
  where ProbChiSq < .05;
run;

/* All retained variable names */
proc sql;
  select distinct variable into :selected separated by " "
  from f.paraest;
quit;

/* Retained variables that are character columns in F.TRAIN2 (for the CLASS statement) */
proc sql noprint;
  select distinct variable into :char separated by " "
  from f.paraest a
  where a.variable in (select name from dictionary.columns
                       where upcase(libname) = "F" and
                             upcase(memname) = "TRAIN2" and
                             upcase(type) = "CHAR");
quit;
%put &selected;
%put &char;

/* Second run: refit on the retained variables and save the estimates */
proc logistic data=f.train2 desc outest=betas namelen=40;
  class &char;
  model ksi = &selected / selection=backward slstay=.001;
run;
I'm not sure why you would use the DATA step to keep parameters with p<.05 after already requiring p<.001 (slstay=.001) in the LOGISTIC run. Ignoring that: PROC LOGISTIC omits any observation that has a missing value in any variable specified in the MODEL statement, including candidates that are later removed by the selection. So if any of the candidate variables that were not selected in the first run have missing values, the first run uses a different set of observations than the second run, and differences in the estimates are to be expected.
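One way to check whether the two runs really use the same observations is to capture the "Number of Observations Read / Used" table from each run and compare them. This is only a sketch: it reuses the data set and macro variables from the question, and the output data set names nobs_full and nobs_reduced are made up for illustration.

```sas
/* Sketch: capture the observation counts from both runs.            */
/* nobs_full and nobs_reduced are hypothetical names for this check. */
ods output NObs=nobs_full;
proc logistic data=f.train2 desc namelen=40;
  class &class_var;
  model ksi = &class_var &reduced / selection=backward fast slstay=.001;
run;

ods output NObs=nobs_reduced;
proc logistic data=f.train2 desc namelen=40;
  class &char;
  model ksi = &selected / selection=backward slstay=.001;
run;

/* If "observations used" differs between the two tables, the runs */
/* were fitted on different subsets of F.TRAIN2.                   */
proc print data=nobs_full; run;
proc print data=nobs_reduced; run;
```

If the counts match, missing values are probably not the explanation and the difference has to come from somewhere else.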
Thanks for your reply. However, the f.train2 dataset is a bit special: missing values are represented by -1, so I don't think that is the cause.
My bad, I didn't express that clearly. Missing values are represented by "-1" in the character variables.
Yes, your assumption is correct. That's what I meant. The numeric variables have no missing values. I also tried adding FAST, but the results still differ from the first PROC LOGISTIC output.
How different?
Can you replicate the problem using a data set from the SAS sample documentation?
If not, it's your data.
If you can, post the code and log (since the data is public) and we can help from there.
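As a starting point for such a replication, here is a rough sketch of the same two-step workflow on SASHELP.HEART. The choice of data set, response, candidate variables, and the subset used in the second step are all assumptions made for illustration; they are not taken from the original problem.

```sas
/* Step 1: backward selection over a full candidate list.        */
/* Rows with a missing value in ANY listed variable are dropped. */
proc logistic data=sashelp.heart outest=betas_full;
  class Sex;
  model Status(event='Dead') = Sex AgeAtStart Height Weight Cholesterol Smoking
        / selection=backward slstay=.001;
run;

/* Step 2: refit on a hypothetical subset of the candidates. */
/* Fewer variables are listed, so fewer rows may be dropped. */
proc logistic data=sashelp.heart outest=betas_subset;
  class Sex;
  model Status(event='Dead') = Sex AgeAtStart Cholesterol
        / selection=backward slstay=.001;
run;

/* Compare the saved coefficients from the two fits */
proc print data=betas_full; run;
proc print data=betas_subset; run;
```

Several of these variables contain missing values in SASHELP.HEART, so comparing the "Number of Observations Used" lines in the two outputs shows whether the fits are based on the same rows.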
Thanks for your reply. I think I have solved the problem.
Thank you. I think I solved the problem by deleting some variables that are constant. (I just checked the log: I had created some missing-value indicators, but the corresponding variables have no missing values, so the indicators never vary. That was my mistake.)
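For anyone who hits the same thing, a quick way to spot constant columns before modeling is to count the distinct levels of each variable. This is a sketch only: it assumes the same f.train2 data set, and the output data set name levels is made up.

```sas
/* Sketch: count the distinct levels of every variable in F.TRAIN2. */
/* NOPRINT suppresses the frequency tables; only NLevels is kept.   */
ods output NLevels=levels;
proc freq data=f.train2 nlevels;
  tables _all_ / noprint;
run;

/* Variables with a single level are constant and carry no information */
proc print data=levels;
  where NLevels = 1;
run;
```

On a wide data set with many continuous variables this can be slow or memory-hungry, so it may be worth restricting the TABLES statement to the indicator variables.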