- SAS EM predicted probabilities in score data set a...

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

12-08-2015 11:47 AM

I've built a model in SAS EM to predict an interval target from a number of features. I imputed some of them, smoothed some others, and binned or transformed the rest: just some basic data preparation before building the model. I then partitioned the data into training (65%) and validation (35%) sets and built a linear regression to predict the target. It has a validation error rate of 13%, which is fine.

However, the problem kicks in when I import a score data set (it is actually the same dataset I prepared above, with the target variable deleted). Every prediction for the target has the same value. I can't understand the reasoning behind this. My model was fine while I was building it, and the score data set is just the prepared dataset I used to train and validate. What is wrong?

This is my original dataset.

These are my model comparison results.

And this is my score.


Posted in reply to Xiaojun

12-08-2015 02:39 PM

So, you have built a linear regression model. I take it that the dataset shown in your third screenshot contains both the independent variables in your model and the prediction for the target variable LOG_TargetD. (I don't see any "probabilities" in it, though.) The prediction is a linear function of the independent variables ("features"). Hence, if two individuals have the same values for each of the features, their predicted target values are necessarily equal, too.
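To make that point concrete, here is a minimal Python sketch of how a linear regression produces a prediction. The intercept, the coefficients, and the feature names `x1` and `x2` are made up for illustration; they are not taken from the model in this thread.

```python
def predict(intercept, coefs, row):
    """Linear prediction: intercept plus the sum of coefficient * feature value."""
    return intercept + sum(coefs[name] * row[name] for name in coefs)


# Hypothetical fitted parameters, for illustration only.
intercept = 1.5
coefs = {"x1": 0.25, "x2": -2.0}

row_a = {"x1": 4.0, "x2": 0.5}
row_b = {"x1": 4.0, "x2": 0.5}  # same feature values as row_a
row_c = {"x1": 8.0, "x2": 0.5}  # differs from row_a in x1

pred_a = predict(intercept, coefs, row_a)
pred_b = predict(intercept, coefs, row_b)
pred_c = predict(intercept, coefs, row_c)

print(pred_a == pred_b)  # identical features give identical predictions
print(pred_a == pred_c)  # a different x1 (nonzero coefficient) changes the prediction
```

Two rows with the same feature values cannot receive different predictions, so a column of identical predictions points back to the inputs rather than to the regression itself.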

If all individuals have identical values for each of the *features*, then you should be wondering why *this* is the case. The equality of predictions would be a mere consequence in this situation. Your third screenshot shows identical values in each column. Is this the general pattern for the whole dataset?
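One way to answer that question outside of SAS EM is to export the score data set and look for columns that hold a single value in every row. The sketch below assumes the data is available as a list of dictionaries; the helper `constant_columns` and the sample rows are hypothetical and only illustrate the check.

```python
def constant_columns(rows):
    """Return the names of columns that take a single value across all rows."""
    if not rows:
        return []
    return [name for name in rows[0] if len({row[name] for row in rows}) == 1]


# Hypothetical score data in which every row carries the same feature values.
suspicious_rows = [
    {"x1": 4.0, "x2": 0.5},
    {"x1": 4.0, "x2": 0.5},
    {"x1": 4.0, "x2": 0.5},
]

# Hypothetical score data in which x1 varies from row to row.
healthy_rows = [
    {"x1": 4.0, "x2": 0.5},
    {"x1": 8.0, "x2": 0.5},
]

print(constant_columns(suspicious_rows))  # every feature is constant
print(constant_columns(healthy_rows))     # only x2 is constant
```

If every feature shows up as constant, the issue lies in how the score data was prepared or imported, not in the fitted model.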