10-04-2016 12:36 PM
I'm running a Proc logistic on some data.
When I score the training data, less than 1% of predicted values have P_0 = P_1 = 0, but when I score my testing data, more than 60% of the observations have P_0 = P_1 = 0, ergo no prediction.
Since it's a linear equation, shouldn't P_1 always have a value and shouldn't P_0 simply equal 1-P_1? Am I missing something? How can I find what is missing from the data to get a prediction on the testing data?
Thank you for your help.
10-04-2016 12:49 PM
You should provide some example data and the code you are running. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to create datastep code you can paste as text in the forum or attach as a TEXT file to allow testing.
10-06-2016 08:42 AM
I have solved the problem by recreating the linear equation and getting the probabilities "by hand". It gives me the same results exactly for those that had results, and it gives me results for all the others as well.
Still, it seems like a weird bug.
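For anyone who wants to cross-check scored output the same way, here is a minimal sketch of the "by hand" calculation in Python rather than SAS. The intercept, coefficient names, and values are hypothetical placeholders, not estimates from this model, and the sketch assumes a binary logit with numeric predictors only. Which response level counts as the event depends on how the model was specified.

```python
import math

def score_binary_logit(intercept, coefs, row):
    """Return (p0, p1) for one observation under a binary logit model."""
    # Linear predictor: intercept plus the sum of coefficient * value.
    eta = intercept + sum(coefs[name] * row[name] for name in coefs)
    # Logistic function, written to avoid overflow for large |eta|.
    if eta >= 0:
        p1 = 1.0 / (1.0 + math.exp(-eta))
    else:
        e = math.exp(eta)
        p1 = e / (1.0 + e)
    # In a binary model the two probabilities always sum to 1.
    return 1.0 - p1, p1

# Hypothetical coefficients and one hypothetical observation.
coefs = {"x1": 0.5, "x2": 2.0}
p0, p1 = score_binary_logit(-1.0, coefs, {"x1": 2.0, "x2": 0.0})
print(p0, p1)
```

Because the formula always returns a value when every predictor is present, a row that the procedure leaves unscored most likely has an input it could not use, such as a missing value or an unknown class level.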
10-06-2016 08:52 AM
That's not a viable workaround.
Was there something specific about those records?
If you can replicate the issue, contact SAS Tech Support with the problem or post it here.
10-06-2016 09:12 AM
You need to provide details, preferably the SAS code.
You say "shouldn't P_1 always have a value and shouldn't P_0 simply equal 1-P_1?"
In general, the answer is "yes." Are you using the SCORE statement to get this output data set? Or PROC PLM?
Are you sure that you are fitting a binary response? Do a PROC FREQ on the response variable to check whether you have a third value and are doing a multinomial regression.
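The same idea as that frequency check, sketched in Python for illustration only: count the distinct non-missing response values and confirm there are exactly two. The sample values are made up.

```python
from collections import Counter

def response_levels(values):
    """Count each distinct non-missing response value."""
    return Counter(v for v in values if v is not None)

# Hypothetical response column containing an unexpected third level (2).
levels = response_levels([0, 1, 1, 0, 2, None, 1])
is_binary = len(levels) == 2
print(dict(levels), is_binary)
```

If more than two levels appear, the model is not a binary one and the scored output would not be limited to P_0 and P_1.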
One possibility is that your scoring data contains values that are far outside the range of the training data.
Another possibility is that the scoring data contains levels of a classification variable that are not contained in the training data.
Check the SAS log to see if there is a NOTE similar to the following:
NOTE: Some observations in the WORK.SCORE data set are not scored because they have class levels that are missing or
are not present in the analysis data set.
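Both possibilities can be checked directly by comparing the two data sets. Below is a rough sketch in Python, not SAS, assuming the data have been loaded as lists of dictionaries. The variable names `region` and `income` and all values are hypothetical.

```python
def unseen_levels(train_rows, score_rows, class_vars):
    """Map each class variable to levels found only in the scoring data."""
    report = {}
    for var in class_vars:
        seen = {row[var] for row in train_rows}
        # A missing value (None) counts as a level here, so it is reported
        # whenever the training data has no missing values for this variable.
        extra = {row[var] for row in score_rows} - seen
        if extra:
            report[var] = extra
    return report

def out_of_range(train_rows, score_rows, num_vars):
    """Count scoring rows whose value lies outside the training min/max."""
    counts = {}
    for var in num_vars:
        train_vals = [r[var] for r in train_rows if r[var] is not None]
        lo, hi = min(train_vals), max(train_vals)
        counts[var] = sum(
            1 for r in score_rows
            if r[var] is not None and not lo <= r[var] <= hi
        )
    return counts

# Hypothetical training and scoring data.
train = [{"region": "N", "income": 10.0}, {"region": "S", "income": 50.0}]
score = [{"region": "N", "income": 20.0},
         {"region": "W", "income": 500.0},
         {"region": None, "income": 30.0}]

new_levels = unseen_levels(train, score, ["region"])
range_counts = out_of_range(train, score, ["income"])
print(new_levels, range_counts)
```

A non-empty result from the first function points to the situation described in the NOTE above. The second only flags extrapolation, which on its own would not normally prevent a prediction.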
Until you provide details, all we can do is guess, which isn't very productive.