
SAS EM predicted probabilities in score data set are all the same

Learner
Posts: 1


I've built a model in SAS EM to predict an interval target from a set of features. I imputed some of the features, smoothed others, and binned or transformed the rest; just basic data preparation before building the model. I then partitioned the data into training (65%) and validation (35%) sets and built a linear regression to predict the target. It has a validation error rate of 13%, which is fine.

 

However, the problem kicks in when I import a score data set (it is actually the same data set I prepared above, with the target variable deleted). Every row gets the same predicted value for the target. I can't understand the reasoning behind this. My model was fine while I was building it, and the score data set is just the prepared data set I used to train and validate. What is wrong?

 

This is my original data set:

Capture.PNG

These are my model comparison results:

Capture3.PNG

And this is my scoring output:

Capture.PNG

Trusted Advisor
Posts: 1,118

Re: SAS EM predicted probabilities in score data set are all the same

So, you have built a linear regression model. I take it that the dataset shown in your third screenshot contains both the independent variables in your model and the prediction for the target variable LOG_TargetD. (I don't see any "probabilities" in it, though.) The prediction is a linear function of the independent variables ("features"). Hence, if two individuals have the same values for each of the features, their predicted target values are necessarily equal, too.
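To make that point concrete, here is a minimal sketch in Python rather than SAS. The intercept, the coefficients, and the feature names `age` and `income` are made up for illustration and are not taken from the model in this thread; the only thing it shows is that a linear predictor returns equal outputs for equal inputs.

```python
# Hypothetical intercept and coefficients; NOT the poster's fitted model.
INTERCEPT = 2.0
COEFFICIENTS = {"age": 0.01, "income": 0.0005}


def predict(row):
    """Linear prediction: intercept plus the weighted sum of the features."""
    return INTERCEPT + sum(COEFFICIENTS[name] * row[name] for name in COEFFICIENTS)


rows = [
    {"age": 40, "income": 1000},
    {"age": 40, "income": 1000},  # same feature values as the first row
    {"age": 55, "income": 1000},  # differs from the first row in one feature
]

predictions = [predict(row) for row in rows]

# Rows with identical features get identical predictions;
# a row that differs in a feature with a nonzero coefficient does not.
print(predictions[0] == predictions[1])
print(predictions[0] == predictions[2])
```

So a column of identical predictions usually says more about the inputs that reached the model than about the model itself.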

 

If all individuals have identical values for each of the features, then you should be wondering why this is the case. The equality of predictions would be a mere consequence in this situation. Your third screenshot shows identical values in each column. Is this the general pattern for the whole dataset?
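One way to answer that question is to count the distinct values in every column of the score data set. The sketch below is in Python and assumes the data has been exported to a list of row dictionaries; the column names `IMP_age` and `LOG_income`, the `id` column, and the values are hypothetical stand-ins, not the contents of the screenshot.

```python
def distinct_counts(rows):
    """Return {column name: number of distinct values} for a list of dict rows."""
    seen = {}
    for row in rows:
        for name, value in row.items():
            seen.setdefault(name, set()).add(value)
    return {name: len(values) for name, values in seen.items()}


def constant_columns(rows):
    """Return the sorted names of columns that hold a single value in every row."""
    return sorted(name for name, n in distinct_counts(rows).items() if n == 1)


# Hypothetical score rows: both features are constant, only the id varies.
score_rows = [
    {"id": 1, "IMP_age": 61.6, "LOG_income": 10.2},
    {"id": 2, "IMP_age": 61.6, "LOG_income": 10.2},
    {"id": 3, "IMP_age": 61.6, "LOG_income": 10.2},
]

print(distinct_counts(score_rows))
print(constant_columns(score_rows))
```

If every model input shows up as constant, the thing to investigate is how the score data was prepared, not the regression. In SAS itself, PROC FREQ with the NLEVELS option reports the number of distinct levels per variable.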

 
