12-08-2015 11:47 AM
I've built a model in SAS EM to predict an interval target from several features. I imputed some of them, smoothed some others, and binned or transformed the rest, just basic data preparation before building the model. I then partitioned the data into training (65%) and validation (35%) sets and built a linear regression to predict the target. It has a validation error rate of 13%, which is fine.
However, the problem kicks in when I import a score dataset (actually, it is the same dataset I prepared above, with the target variable deleted). Every prediction for the target has the same value. I can't understand the reasoning behind this. My model was fine while I was building it, and the score dataset is just the prepared dataset I used to train and validate. What is wrong?
This is my original dataset.
These are my model comparison results.
And this is my scored dataset.
12-08-2015 02:39 PM
So, you have built a linear regression model. I take it that the dataset shown in your third screenshot contains both the independent variables in your model and the prediction for the target variable LOG_TargetD. (I don't see any "probabilities" in it, though.) The prediction is a linear function of the independent variables ("features"). Hence, if two individuals have the same values for each of the features, their predicted target values are necessarily equal, too.
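To make the point about linearity concrete, here is a minimal Python sketch (not SAS EM code). The intercept, the coefficients, and the feature rows are made-up illustrative values, not anything taken from your model. It only shows that a linear predictor returns the same value for rows whose features are identical.

```python
def predict(intercept, coefs, row):
    """Linear prediction: intercept plus the sum of coefficient * feature value."""
    return intercept + sum(c * x for c, x in zip(coefs, row))

# Hypothetical fitted model (illustrative numbers only).
intercept = 1.5
coefs = [0.2, -0.7, 3.0]

# Rows 0 and 1 have identical features; row 2 differs in the first feature.
rows = [
    [1.0, 2.0, 0.5],
    [1.0, 2.0, 0.5],
    [4.0, 2.0, 0.5],
]

preds = [predict(intercept, coefs, r) for r in rows]

print(preds[0] == preds[1])  # identical features give identical predictions
print(preds[0] == preds[2])  # a different feature value changes the prediction
```

So constant predictions across the whole score dataset point to constant inputs (or inputs the model treats as identical, for example after imputation) rather than to the regression itself.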
If all individuals have identical values for each of the features, then you should be wondering why this is the case. The equality of predictions would be a mere consequence in this situation. Your third screenshot shows identical values in each column. Is this the general pattern for the whole dataset?
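One way to answer that question outside of SAS EM is to export the scored dataset and check each column for constant values. The sketch below is plain Python; the column names and values are hypothetical stand-ins, and in practice the rows would come from your exported file (for example via `csv.DictReader`).

```python
def constant_columns(rows):
    """Return the names of columns that hold a single value across all rows."""
    if not rows:
        return []
    return [
        name
        for name in rows[0]
        if len({row[name] for row in rows}) == 1
    ]

# Hypothetical scored rows; the names and values are placeholders only.
score_rows = [
    {"IMP_Age": 45.0, "LOG_Income": 10.2, "Region": "A"},
    {"IMP_Age": 45.0, "LOG_Income": 10.2, "Region": "B"},
    {"IMP_Age": 45.0, "LOG_Income": 10.2, "Region": "A"},
]

suspicious = constant_columns(score_rows)
print(suspicious)
```

If every model input shows up in that list, the features reaching the score step are constant, and the next thing to inspect is how the score data passes through the imputation and transformation steps.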