etiennel
Calcite | Level 5

Greetings,

 

I'm running PROC LOGISTIC on some data.

 

When I score the training data, fewer than 1% of the observations have P_0 = P_1 = 0, but when I score my testing data, more than 60% of the observations have P_0 = P_1 = 0, that is, no prediction.

 

Since it's a linear equation, shouldn't P_1 always have a value and shouldn't P_0 simply equal 1-P_1? Am I missing something? How can I find what is missing from the data to get a prediction on the testing data?

 

Thank you for your help.

5 REPLIES
ballardw
Super User

You should provide some example data and the code you are running. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to create datastep code you can paste as text in the forum or attach as a TEXT file to allow testing.

Reeza
Super User

That doesn't appear correct. Check your log for any issues and, if there are none, post your code, log, and data as suggested.

etiennel
Calcite | Level 5

I have solved the problem by recreating the linear equation and getting the probabilities "by hand". It gives me the same results exactly for those that had results, and it gives me results for all the others as well.
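
For reference, here is a minimal sketch of that kind of by-hand calculation. It is written in Python rather than SAS, and the function name, intercept, and coefficient values are hypothetical, not the ones from my model. It assumes a binary logit model in which the linear predictor describes the probability of the response being 1, and numeric predictors only.

```python
import math

def score_by_hand(intercept, coefs, row):
    """Return (p_0, p_1) for one observation under a binary logit model.

    intercept, coefs: hypothetical fitted parameters (illustration only).
    row: dict mapping predictor name -> numeric value (None if missing).
    """
    missing = [name for name in coefs if row.get(name) is None]
    if missing:
        # Without every predictor, the linear predictor cannot be formed,
        # so there is no probability to report.
        return None, None
    eta = intercept + sum(beta * row[name] for name, beta in coefs.items())
    # Logistic function, written to avoid overflow for large |eta|.
    if eta >= 0:
        p_1 = 1.0 / (1.0 + math.exp(-eta))
    else:
        e = math.exp(eta)
        p_1 = e / (1.0 + e)
    return 1.0 - p_1, p_1

# Made-up example: a linear predictor of 0 gives equal probabilities.
p_0, p_1 = score_by_hand(0.0, {"x": 1.0}, {"x": 0.0})
```

With any finite linear predictor, P_0 and P_1 sum to 1, so a result of P_0 = P_1 = 0 cannot come from the equation itself.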

 

Still, it seems like a weird bug.

Reeza
Super User

That's not a viable workaround. 

 

Was there something specific about those records? 

If you can replicate the issue, contact SAS Tech Support with the problem or post it here.

Rick_SAS
SAS Super FREQ

You need to provide details, preferably the SAS code.

 

You say "shouldn't P_1 always have a value and shouldn't P_0 simply equal 1-P_1?"

In general, the answer is "yes."  Are you using the SCORE statement to get this output data set? Or PROC PLM?

 

Are you sure that you are fitting a binary response? Do a PROC FREQ on the response variable to check whether you have a third value and are doing a multinomial regression.
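
The idea behind that check is just a one-way frequency table of the response. A minimal sketch of the same logic, written in Python for illustration with a made-up data set and a hypothetical variable name `y`:

```python
from collections import Counter

def response_level_counts(rows, response):
    """Count nonmissing values of the response, like a one-way frequency table."""
    return Counter(r[response] for r in rows if r.get(response) is not None)

# Made-up training data: the stray value 2 is a third response level.
train = [{"y": 0}, {"y": 1}, {"y": 1}, {"y": 2}, {"y": None}]
counts = response_level_counts(train, "y")
is_binary = len(counts) == 2
```

If more than two nonmissing levels show up, the model being fit is not a binary one.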

 

One possibility is that your scoring data contains values that are far outside the range of the training data.

Another possibility is that the scoring data contains levels of a classification variable that are not contained in the training data.

Check the SAS log to see if there is a NOTE similar to the following:

 

NOTE: Some observations in the WORK.SCORE data set are not scored because they have class levels that are missing or
      are not present in the analysis data set.
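
If the log does not settle it, the two possibilities can also be checked directly against the data. The following is only a rough sketch of that comparison, written in Python for illustration. The function name and the variable names `region` and `x` are hypothetical, and it assumes each numeric variable has at least one nonmissing value in the training data.

```python
def diagnose_scoring_rows(train_rows, score_rows, class_vars, numeric_vars):
    """List scoring rows with missing values, unseen class levels,
    or numeric values outside the range seen in training."""
    # Levels of each classification variable observed in training.
    seen = {v: {r[v] for r in train_rows if r.get(v) is not None}
            for v in class_vars}
    # Min and max of each numeric variable observed in training.
    ranges = {}
    for v in numeric_vars:
        vals = [r[v] for r in train_rows if r.get(v) is not None]
        ranges[v] = (min(vals), max(vals))

    report = []
    for i, r in enumerate(score_rows):
        problems = []
        for v in class_vars:
            val = r.get(v)
            if val is None:
                problems.append(f"{v}: missing")
            elif val not in seen[v]:
                problems.append(f"{v}: unseen level {val!r}")
        for v in numeric_vars:
            val = r.get(v)
            lo, hi = ranges[v]
            if val is None:
                problems.append(f"{v}: missing")
            elif not lo <= val <= hi:
                problems.append(f"{v}: {val} outside training range [{lo}, {hi}]")
        if problems:
            report.append((i, problems))
    return report

# Made-up data for illustration only.
train = [{"region": "A", "x": 1.0}, {"region": "B", "x": 5.0}]
score = [{"region": "A", "x": 2.0}, {"region": "C", "x": 2.0},
         {"region": "B", "x": 50.0}, {"region": None, "x": 3.0}]
report = diagnose_scoring_rows(train, score, ["region"], ["x"])
```

Rows flagged for a missing or unseen class level are the ones the NOTE above refers to. Rows flagged only for being out of range may still be scored, but the prediction is an extrapolation.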

 

Until you provide details, all we can do is guess, which isn't very productive.

