heloiee
Calcite | Level 5

Hi,

I have a problem with the results of PROC LOGISTIC. I use selection=backward in PROC LOGISTIC and capture the parameter estimates with ODS OUTPUT. I then select the variables whose p-value is less than 0.05, refit the model with those variables, and save the result with outest=. However, the coefficients of the same variables in "paraest" and "betas" are different.

 

Could someone help me figure it out?

 

Thanks a lot

 

Here is my code:

ods output ParameterEstimates=paraest;
proc logistic data=f.train2 desc namelen=40 ;
class &class_var;
model ksi= &class_var &reduced/selection=backward fast slstay=.001;
run;
ods output close;

%put &class_var &reduced;


data f.paraest;
set paraest;
where ProbChiSq<.05;
run;

proc sql;
select distinct variable into:selected separated by " "
from f.paraest;
quit;

proc sql noprint;
select distinct variable into:char separated by " "
from f.paraest a
where a.variable in (select name from dictionary.columns
where upcase(libname)="F" and
upcase(memname)="TRAIN2" and
upcase(type)="CHAR");
quit;

%put &selected;


%put &char;

proc logistic data=f.train2 desc outest=betas namelen=40 ;
class &char;
model ksi= &selected/selection=backward slstay=.001;
run;

1 ACCEPTED SOLUTION

Accepted Solutions
StatDave
SAS Super FREQ
What happens if you drop all of the junk between the two LOGISTIC runs and just type in the selected variables from the first LOGISTIC into the CLASS and MODEL statements in the second LOGISTIC?


10 REPLIES
StatDave
SAS Super FREQ

No idea why you would use the DATA step to select parameters with p<.05 after requiring them to have p<.001 in the LOGISTIC run. Ignoring that, if there are any missing values in any of the unselected candidate variables from the first LOGISTIC run, then the set of observations used in the first run is not the same as the set used in the second run, and differences are to be expected. In the first run, any observation that has a missing value on any of the specified variables is omitted.
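One way to check whether the two runs could be using different observations is to count the rows that have a missing value in any candidate variable. This is only a sketch: it assumes the macro variables &class_var and &reduced from the original post are still defined, and that missing values are stored as true SAS missing values.

```sas
/* Count rows in f.train2 with a missing value in any candidate variable. */
/* CMISS accepts both character and numeric arguments.                     */
data _null_;
  set f.train2 end=last;
  if cmiss(of &class_var &reduced) > 0 then n_incomplete + 1;
  if last then do;
    put "Rows read: " _n_;
    put "Rows with at least one missing candidate variable: " n_incomplete;
  end;
run;
```

If the second count is nonzero, the first LOGISTIC run drops those rows, while a second run on a smaller variable list may keep some of them. The "Number of Observations Read" and "Number of Observations Used" lines in each LOGISTIC output show the same thing.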

heloiee
Calcite | Level 5

Thanks for your reply. However, the f.train2 dataset is somewhat special: missing values are represented by -1, so I don't think that is the cause?

StatDave
SAS Super FREQ
You should never represent missing values with a numeric value, because whatever value you use is used in fitting the model. Obviously, the results will change depending on the value you select. If you want to replace missing values, you should use a statistically valid method such as multiple imputation (available in PROC MI).
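A rough sketch of that multiple-imputation workflow follows. The data set work.train2_fixed and the variables x1, x2 and x3 are hypothetical placeholders, not names from the thread, and the sketch assumes the -1 codes have already been set back to true SAS missing values. The number of imputations and the seed are arbitrary choices.

```sas
/* Step 1: create 5 imputed copies of the data (names are hypothetical). */
proc mi data=work.train2_fixed nimpute=5 seed=12345 out=work.train2_mi;
  var ksi x1 x2 x3;
run;

/* Step 2: fit the same logistic model within each imputed data set. */
ods output ParameterEstimates=work.lgsparms;
proc logistic data=work.train2_mi desc;
  model ksi = x1 x2 x3;
  by _Imputation_;
run;

/* Step 3: combine the per-imputation estimates into one set of results. */
proc mianalyze parms=work.lgsparms;
  modeleffects Intercept x1 x2 x3;
run;
```

Models with CLASS variables need additional options in PROC MIANALYZE, so treat this as a starting point for numeric predictors only.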
heloiee
Calcite | Level 5

My bad, I didn't express myself clearly. Missing values are represented by "-1" in the character variables.

StatDave
SAS Super FREQ
I assume you are saying that if a character CLASS variable is missing then you use a -1 so that -1 becomes another valid level of that CLASS variable. OK, but what about missing values in a numeric variable that isn't a CLASS variable? Also note that you didn't use the FAST method in the second LOGISTIC run like in the first.
heloiee
Calcite | Level 5

Yes, your assumption is correct. That's what I meant. The numeric variables have no missing values. I also tried adding FAST, but the result still looks different from the output of the first PROC LOGISTIC run.

Reeza
Super User

How different?

Can you replicate the problem using a data set from the SAS sample documentation?

 

If not, it's your data. 

If you can, post the code/log (since data is public) and we can help from there. 

heloiee
Calcite | Level 5

Thanks for your reply. I think I have solved the problem.

StatDave
SAS Super FREQ
What happens if you drop all of the junk between the two LOGISTIC runs and just type in the selected variables from the first LOGISTIC into the CLASS and MODEL statements in the second LOGISTIC?
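
As a sketch of that suggestion, the second run could name the retained effects directly and drop the selection option, so that it simply refits the final model. The variable names c1, c2, x1 and x2 are hypothetical stand-ins for whatever the first run actually kept.

```sas
/* Refit the final model with the selected effects typed in by hand. */
/* c1, c2 (character) and x1, x2 (numeric) are placeholder names.    */
proc logistic data=f.train2 desc outest=betas namelen=40;
  class c1 c2;
  model ksi = c1 c2 x1 x2;
run;
```

This takes the macro-variable plumbing out of the comparison, so any remaining difference has to come from the data or the model specification.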
heloiee
Calcite | Level 5

Thank you. I think I solved the problem by deleting some variables that are constant. (I just checked the log. I had created some missing-value indicators, but those variables have no missing values, so the indicators never vary. That's my mistake.)
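
A quick way to spot constant variables like these before modeling is to count the distinct levels of each variable. This is a minimal sketch: the output data set name work.nlev is an arbitrary choice, and on a wide table with many continuous variables the step can be slow.

```sas
/* Count distinct levels of every variable without printing the tables. */
ods output NLevels=work.nlev;
proc freq data=f.train2 nlevels;
  tables _all_ / noprint;
run;

/* Variables with a single level are constant and carry no information. */
proc print data=work.nlev;
  where NLevels = 1;
  var TableVar NLevels;
run;
```

Any variable listed here can be dropped from the CLASS and MODEL statements before running PROC LOGISTIC.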

