Hi,
I'm fairly new to scoring with PROC LOGISTIC.
I want a dataset that contains predicted probabilities for every possible combination of the categories of the variables used in a logistic model.
I built the dataset to score as follows:
data work.immigdataset;
   do cohort2=0 to 98;
      do sex=0,1;
         do period=0,1;
            do pob_num=0 to 7;
               output;
            end;
         end;
      end;
   end;
run;
The logit model with the score statement is then:
proc logistic data=work.immigrants_from_lfs;
   class sex pob_num(ref='0') period / param=ref;
   model edunum(descending) = period sex|cohort2 pob_num|cohort2 / unequalslopes;
   weight weight / norm;
   score data=work.immigdataset out=work.scored_immig;
run;
Edunum has 3 categories and the model is an ordered logit with unequal slopes.
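For reference, a cumulative logit model with unequal slopes for a three-level response is commonly written as below. The symbols ($Y$ for edunum, $x$ for the predictors, $\alpha_j$ and $\beta_j$ for the intercept and slopes of the $j$-th logit) are notation introduced here, not taken from the post.

```latex
\begin{align*}
\operatorname{logit} P(Y \ge j \mid x) &= \alpha_j + x^{\top}\beta_j, \qquad j = 1, 2 \\
P(Y = 2 \mid x) &= P(Y \ge 2 \mid x) \\
P(Y = 1 \mid x) &= P(Y \ge 1 \mid x) - P(Y \ge 2 \mid x) \\
P(Y = 0 \mid x) &= 1 - P(Y \ge 1 \mid x)
\end{align*}
```

With equal slopes, $\beta_1 = \beta_2$ and only the intercepts differ between the two logits.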
There are no error messages. However, every second row of the output is left unscored (see the screenshot below). How can I fix this?
I don't see any obvious problem with parameters of the logit model.
Analysis of Maximum Likelihood Estimates
Parameter | Level | edunum | DF | Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq
---|---|---|---|---|---|---|---
Intercept | | 2 | 1 | -1.7993 | 0.0814 | 489.0406 | <.0001
Intercept | | 1 | 1 | -0.5025 | 0.0742 | 45.9176 | <.0001
period | 0 | 2 | 1 | -0.2697 | 0.0127 | 451.0235 | <.0001
period | 0 | 1 | 1 | -0.2942 | 0.0139 | 449.2595 | <.0001
sex | 0 | 2 | 1 | 1.5305 | 0.0868 | 311.0672 | <.0001
sex | 0 | 1 | 1 | 1.4247 | 0.0855 | 277.4533 | <.0001
cohort2 | | 2 | 1 | 0.0190 | 0.000992 | 366.2465 | <.0001
cohort2 | | 1 | 1 | 0.0204 | 0.000924 | 485.8860 | <.0001
cohort2*sex | 0 | 2 | 1 | -0.0195 | 0.00104 | 349.6493 | <.0001
cohort2*sex | 0 | 1 | 1 | -0.0172 | 0.00103 | 277.7880 | <.0001
pob_num | 1 | 2 | 1 | -1.0595 | 0.1789 | 35.0924 | <.0001
pob_num | 1 | 1 | 1 | -2.3889 | 0.1596 | 224.0873 | <.0001
pob_num | 2 | 2 | 1 | 1.6704 | 0.1560 | 114.6657 | <.0001
pob_num | 2 | 1 | 1 | 1.1015 | 0.1442 | 58.3199 | <.0001
pob_num | 3 | 2 | 1 | 1.8591 | 0.1383 | 180.7171 | <.0001
pob_num | 3 | 1 | 1 | 0.4209 | 0.1334 | 9.9507 | 0.0016
pob_num | 4 | 2 | 1 | 0.7457 | 0.1458 | 26.1710 | <.0001
pob_num | 4 | 1 | 1 | -0.1724 | 0.1411 | 1.4935 | 0.2217
pob_num | 5 | 2 | 1 | 2.0834 | 0.2466 | 71.3532 | <.0001
pob_num | 5 | 1 | 1 | -0.6860 | 0.2495 | 7.5600 | 0.0060
pob_num | 6 | 2 | 1 | 4.3626 | 0.2696 | 261.7725 | <.0001
pob_num | 6 | 1 | 1 | 4.2384 | 0.4575 | 85.8354 | <.0001
pob_num | 7 | 2 | 1 | 2.3235 | 0.1332 | 304.4506 | <.0001
pob_num | 7 | 1 | 1 | 0.6312 | 0.1351 | 21.8367 | <.0001
cohort2*pob_num | 1 | 2 | 1 | 0.00791 | 0.00213 | 13.7568 | 0.0002
cohort2*pob_num | 1 | 1 | 1 | 0.0210 | 0.00193 | 118.7234 | <.0001
cohort2*pob_num | 2 | 2 | 1 | -0.0254 | 0.00188 | 182.9621 | <.0001
cohort2*pob_num | 2 | 1 | 1 | -0.0217 | 0.00173 | 156.8366 | <.0001
cohort2*pob_num | 3 | 2 | 1 | -0.0239 | 0.00167 | 205.5414 | <.0001
cohort2*pob_num | 3 | 1 | 1 | -0.0115 | 0.00161 | 51.0017 | <.0001
cohort2*pob_num | 4 | 2 | 1 | -0.00618 | 0.00174 | 12.5973 | 0.0004
cohort2*pob_num | 4 | 1 | 1 | -0.00090 | 0.00170 | 0.2826 | 0.5950
cohort2*pob_num | 5 | 2 | 1 | -0.0125 | 0.00287 | 18.9461 | <.0001
cohort2*pob_num | 5 | 1 | 1 | 0.0195 | 0.00299 | 42.8260 | <.0001
cohort2*pob_num | 6 | 2 | 1 | -0.0338 | 0.00324 | 109.0780 | <.0001
cohort2*pob_num | 6 | 1 | 1 | -0.0270 | 0.00549 | 24.1746 | <.0001
cohort2*pob_num | 7 | 2 | 1 | -0.0246 | 0.00162 | 230.6967 | <.0001
cohort2*pob_num | 7 | 1 | 1 | -0.00275 | 0.00166 | 2.7353 | 0.0982
What is I_EDUNUM in your screen capture? It's not in your model statement.
Paige Miller
@PaigeMiller I_edunum is a manufactured variable that PROC LOGISTIC creates in the output scoring data set. If your response variable is named Y, you get a variable named I_Y.
@Rick_SAS wrote:
@PaigeMiller I_edunum is a manufactured variable that PROC LOGISTIC creates in the output scoring data set. If your response variable is named Y, you get a variable named I_Y.
Don't leave me guessing. What is the purpose of I_Y? How is it computed? What is it telling us?
Paige Miller
Paige, the link in my response takes you to the documentation.
What version of SAS are you running? Submit
%put &=SYSVLONG4;
and paste in the result that appears in the log.
I suspect that the problem is your data. Run PROC FREQ on your pob_num variable. Do you have 8 categories? I suspect you might have only the even categories 0, 2, 4, and 6. If you have all eight categories of pob_num, then check the WEIGHT variable, paying attention to missing or zero values. Perhaps the odd values of pob_num all have zero or missing weights?
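The checks suggested above could be run along these lines. This is only a sketch: it assumes the weight variable is literally named weight (as in the WEIGHT statement of the original post) and that the model was fit on work.immigrants_from_lfs.

```sas
/* Count observations in each pob_num level, including missing values */
proc freq data=work.immigrants_from_lfs;
   tables pob_num / missing;
run;

/* Summarize the weights within each pob_num level:
   look for missing (NMISS > 0), zero, or negative (MIN <= 0) weights */
proc means data=work.immigrants_from_lfs n nmiss min max;
   class pob_num / missing;
   var weight;
run;
```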
The following program runs your code on simulated data. When I run it, it produces a scoring data set for which all observations are scored. Make sure your version of SAS treats this simulated data correctly. If so, there is something wrong with your data.
data have;
   call streaminit(1);
   do cohort2=0 to 98;
      do sex=0,1;
         do period=0,1;
            do pob_num=0 to 7;
               edunum = rand("Table", 0.2, 0.5, 0.3) - 1;
               weight = rand("uniform");
               output;
            end;
         end;
      end;
   end;
run;
data immigdataset;
   do cohort2=0 to 98;
      do sex=0,1;
         do period=0,1;
            do pob_num=0 to 7;
               output;
            end;
         end;
      end;
   end;
run;
%put &=SYSVLONG4;
proc logistic data=have;
   class sex pob_num(ref='0') period / param=ref;
   model edunum(descending) = period sex|cohort2 pob_num|cohort2 / unequalslopes;
   weight weight / norm;
   score data=work.immigdataset out=work.scored_immig;
run;
The variable pob_num has 8 categories and none of them has only zero weights. Otherwise, I would not get parameters for the 8 categories in the regression.
I ran the code with the simulated data and got scored values for all observations. So I guess there is something wrong with my data, but I can't see what it is.
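To see which rows are affected, one option is to flag the unscored rows and tabulate them against the predictors. This sketch assumes that the predicted probabilities in work.scored_immig are the variables whose names start with P_ and that they are missing on the unscored rows; unscored and work.score_check are names invented here for illustration.

```sas
/* Flag rows where at least one predicted probability is missing.
   Assumption: the probability variables are the ones named P_<level>. */
data work.score_check;
   set work.scored_immig;
   unscored = (nmiss(of P_:) > 0);
run;

/* Cross-tabulate the flag against each class variable to find the pattern */
proc freq data=work.score_check;
   tables pob_num*unscored sex*unscored period*unscored / norow nocol nopercent;
run;
```

If the unscored rows line up with particular pob_num levels, that narrows the search to the coefficients for those levels.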
In case it helps to pin down the cause: when I remove the UNEQUALSLOPES option, the scoring works. However, I need that option for my model, so removing it is not a good solution.
The log for the regression also contains a warning: "Negative individual predicted probabilities were identified in the final model fit. You may want to modify your UNEQUALSLOPES specification."
I did not find any information about this warning on Google, and I don't understand how a logistic model can predict negative probabilities (edunum has 3 categories: 0, 1, 2).
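One way negative probabilities can arise: with UNEQUALSLOPES each cumulative logit has its own coefficients, so the fitted P(edunum = 2) can exceed the fitted P(edunum >= 1), and their difference, P(edunum = 1), is then negative. The sketch below checks this with the estimates posted above, at cohort2 = 0 so that every cohort2 term drops out. It assumes that the rows labelled edunum 2 and 1 are the logits for P(edunum = 2) and P(edunum >= 1), and that the level-0 coefficients for sex and period are reference-coded dummies; work.check_crossing and its variable names are made up for illustration.

```sas
/* Evaluate both cumulative logits at cohort2 = 0 from the posted estimates */
data work.check_crossing;
   /* pob_num main effects for each logit; level 0 is the reference */
   array b2[0:7] _temporary_ (0 -1.0595 1.6704 1.8591  0.7457  2.0834 4.3626 2.3235);
   array b1[0:7] _temporary_ (0 -2.3889 1.1015 0.4209 -0.1724 -0.6860 4.2384 0.6312);
   do sex = 0 to 1;
      do period = 0 to 1;
         do pob_num = 0 to 7;
            eta2 = -1.7993 + 1.5305*(sex=0) - 0.2697*(period=0) + b2[pob_num];
            eta1 = -0.5025 + 1.4247*(sex=0) - 0.2942*(period=0) + b1[pob_num];
            p_ge2 = 1 / (1 + exp(-eta2));   /* fitted P(edunum = 2)  */
            p_ge1 = 1 / (1 + exp(-eta1));   /* fitted P(edunum >= 1) */
            p_eq1 = p_ge1 - p_ge2;          /* implied P(edunum = 1) */
            negative = (p_eq1 < 0);
            output;
         end;
      end;
   end;
run;

proc print data=work.check_crossing noobs;
   var sex period pob_num eta2 eta1 p_eq1 negative;
run;
```

With these numbers, negative is 1 for pob_num 1, 3, 5, and 7 and 0 for the even values, in all four sex-by-period combinations, which matches the every-second-row pattern. Whether PROC LOGISTIC leaves such rows unscored is an assumption here, not something the thread confirms.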