Demographer
Pyrite | Level 9

Hi,

I'm fairly new to scoring with PROC LOGISTIC.

I want a dataset that contains predicted probabilities for every possible combination of the categories of the variables used in a logistic model.

 

I built the dataset to score as follows:

 

data work.immigdataset;
    do cohort2 = 0 to 98;
        do sex = 0, 1;
            do period = 0, 1;
                do pob_num = 0 to 7;
                    output;
                end;
            end;
        end;
    end;
run;

The logit model with the score statement is then:

proc logistic data=work.immigrants_from_lfs;
    class sex pob_num(ref='0') period / param=ref;
    model edunum(descending) = period sex|cohort2 pob_num|cohort2 / unequalslopes;
    weight weight / norm;
    score data=work.immigdataset out=work.scored_immig;
run;

Edunum has 3 categories and the model is an ordered logit with unequal slopes. 

 

There are no error messages. However, every second row is left unscored (see the screenshot below). How can I fix this?

[Screenshot: Demographer_0-1644509686093.png]

 

I don't see any obvious problem with parameters of the logit model.

 

Analysis of Maximum Likelihood Estimates
Parameter        Level  edunum  DF  Estimate  Std Error  Wald Chi-Square  Pr > ChiSq
Intercept               2       1   -1.7993   0.0814     489.0406         <.0001
Intercept               1       1   -0.5025   0.0742     45.9176          <.0001
period           0      2       1   -0.2697   0.0127     451.0235         <.0001
period           0      1       1   -0.2942   0.0139     449.2595         <.0001
sex              0      2       1   1.5305    0.0868     311.0672         <.0001
sex              0      1       1   1.4247    0.0855     277.4533         <.0001
cohort2                 2       1   0.0190    0.000992   366.2465         <.0001
cohort2                 1       1   0.0204    0.000924   485.8860         <.0001
cohort2*sex      0      2       1   -0.0195   0.00104    349.6493         <.0001
cohort2*sex      0      1       1   -0.0172   0.00103    277.7880         <.0001
pob_num          1      2       1   -1.0595   0.1789     35.0924          <.0001
pob_num          1      1       1   -2.3889   0.1596     224.0873         <.0001
pob_num          2      2       1   1.6704    0.1560     114.6657         <.0001
pob_num          2      1       1   1.1015    0.1442     58.3199          <.0001
pob_num          3      2       1   1.8591    0.1383     180.7171         <.0001
pob_num          3      1       1   0.4209    0.1334     9.9507           0.0016
pob_num          4      2       1   0.7457    0.1458     26.1710          <.0001
pob_num          4      1       1   -0.1724   0.1411     1.4935           0.2217
pob_num          5      2       1   2.0834    0.2466     71.3532          <.0001
pob_num          5      1       1   -0.6860   0.2495     7.5600           0.0060
pob_num          6      2       1   4.3626    0.2696     261.7725         <.0001
pob_num          6      1       1   4.2384    0.4575     85.8354          <.0001
pob_num          7      2       1   2.3235    0.1332     304.4506         <.0001
pob_num          7      1       1   0.6312    0.1351     21.8367          <.0001
cohort2*pob_num  1      2       1   0.00791   0.00213    13.7568          0.0002
cohort2*pob_num  1      1       1   0.0210    0.00193    118.7234         <.0001
cohort2*pob_num  2      2       1   -0.0254   0.00188    182.9621         <.0001
cohort2*pob_num  2      1       1   -0.0217   0.00173    156.8366         <.0001
cohort2*pob_num  3      2       1   -0.0239   0.00167    205.5414         <.0001
cohort2*pob_num  3      1       1   -0.0115   0.00161    51.0017          <.0001
cohort2*pob_num  4      2       1   -0.00618  0.00174    12.5973          0.0004
cohort2*pob_num  4      1       1   -0.00090  0.00170    0.2826           0.5950
cohort2*pob_num  5      2       1   -0.0125   0.00287    18.9461          <.0001
cohort2*pob_num  5      1       1   0.0195    0.00299    42.8260          <.0001
cohort2*pob_num  6      2       1   -0.0338   0.00324    109.0780         <.0001
cohort2*pob_num  6      1       1   -0.0270   0.00549    24.1746          <.0001
cohort2*pob_num  7      2       1   -0.0246   0.00162    230.6967         <.0001
cohort2*pob_num  7      1       1   -0.00275  0.00166    2.7353           0.0982

 

7 REPLIES
PaigeMiller
Diamond | Level 26

What is I_EDUNUM in your screen capture? It's not in your model statement.

--
Paige Miller
Rick_SAS
SAS Super FREQ

@PaigeMiller I_edunum is a manufactured variable that PROC LOGISTIC creates in the output scoring data set.  If your response variable is named Y, you get a variable named I_Y.

PaigeMiller
Diamond | Level 26

Don't leave me guessing. What is the purpose of I_Y? How is it computed? What is it telling us?

--
Paige Miller
Rick_SAS
SAS Super FREQ

Paige, the link in my response takes you to the documentation.

Rick_SAS
SAS Super FREQ

What version of SAS are you running? Submit

%put &=SYSVLONG4;

and paste in the result that appears in the log.

 

I suspect that the problem is your data. Run PROC FREQ on your pob_num variable. Do you have 8 categories? I suspect you might have only the even categories 0, 2, 4, and 6. If you have all eight categories of pob_num, then check the WEIGHT variable, paying attention to missing or zero values. Perhaps the odd values of pob_num all have zero or missing weights?
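The two checks suggested above could be sketched as follows. This is only a minimal sketch: it assumes the training data set and variable names from the original post (work.immigrants_from_lfs, pob_num, weight), and the choice of PROC MEANS statistics is illustrative.

```sas
/* Check 1: which levels of pob_num are present?            */
/* The MISSING option reports missing values as a level.    */
proc freq data=work.immigrants_from_lfs;
    tables pob_num / missing;
run;

/* Check 2: within each pob_num level, are the weights      */
/* missing (NMISS), zero (MIN), or otherwise unusual?       */
proc means data=work.immigrants_from_lfs n nmiss min max sum;
    class pob_num;
    var weight;
run;
```

If every level of pob_num appears with nonmissing, positive weights, the data are unlikely to explain a pattern tied to odd or even levels.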

 

The following program runs your code on simulated data. When I run it, it produces a scoring data set for which all observations are scored.  Make sure your version of SAS treats this simulated data correctly.  If so, there is something wrong with your data.

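data have;
    call streaminit(1);   /* fix the seed so the simulation is reproducible */
    do cohort2 = 0 to 98;
        do sex = 0, 1;
            do period = 0, 1;
                do pob_num = 0 to 7;
                    /* rand("Table", ...) returns 1, 2, or 3; subtract 1 to get 0, 1, 2 */
                    edunum = rand("Table", 0.2, 0.5, 0.3) - 1;
                    weight = rand("uniform");
                    output;
                end;
            end;
        end;
    end;
run;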

data immigdataset;
    do cohort2 = 0 to 98;
        do sex = 0, 1;
            do period = 0, 1;
                do pob_num = 0 to 7;
                    output;
                end;
            end;
        end;
    end;
run;

%put &=SYSVLONG4;

proc logistic data=have;
   class sex pob_num(ref='0') period /param=ref;
   model edunum(descending)= period sex|cohort2 pob_num|cohort2 /unequalslopes;
   weight weight / norm;
   score data=work.immigdataset out=work.scored_immig;
run;

 

Demographer
Pyrite | Level 9
I'm using SAS OnDemand for Academics.
The variable pob_num has 8 categories and has no weights equal to 0. Otherwise, I would not get parameter estimates for all 8 categories in the regression.

I ran the code on the simulated data and every observation was scored. So I guess something is wrong with my data, but I can't see what it is.
Demographer
Pyrite | Level 9

In case it helps to find the cause of the issue: when I remove the UNEQUALSLOPES option, the scoring works. However, I need it for my model, so this is not a good solution.

 

In the log of the regression, I also have a warning message saying: "Negative individual predicted probabilities were identified in the final model fit. You may want to modify your UNEQUALSLOPES specification."

I could not find any information about this warning on Google, and I don't understand how a logistic model can predict negative probabilities (edunum has 3 categories: 0, 1, 2).
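One way the warning can arise is sketched below, under stated assumptions. With DESCENDING and UNEQUALSLOPES, the fit is read here as two cumulative logits, one for edunum = 2 and one for edunum >= 1, each with its own intercept and slopes. That reading of the "edunum" column in the estimates table is an assumption, as are the reference levels sex = 1 and period = 1 (inferred because the table lists only level 0 for each). Nothing forces the two fitted curves to stay ordered, so the middle category, which is a difference of the two, can come out negative.

```latex
\begin{align*}
\Pr(\text{edunum}=2 \mid x)    &= F(\eta_2), & \eta_2 &= \alpha_2 + x^\top \beta_2, \\
\Pr(\text{edunum}\ge 1 \mid x) &= F(\eta_1), & \eta_1 &= \alpha_1 + x^\top \beta_1, \\
\Pr(\text{edunum}=1 \mid x)    &= F(\eta_1) - F(\eta_2), & F(t) &= \frac{1}{1+e^{-t}} .
\end{align*}
% The difference is negative whenever \eta_2 > \eta_1.
% Worked check with the posted estimates at
% cohort2 = 0, sex = 0, period = 0, pob_num = 1:
\begin{align*}
\eta_2 &= -1.7993 - 0.2697 + 1.5305 - 1.0595 = -1.5980, \\
\eta_1 &= -0.5025 - 0.2942 + 1.4247 - 2.3889 = -1.7609, \\
\Pr(\text{edunum}=1 \mid x) &\approx 0.1467 - 0.1683 = -0.0216 .
\end{align*}
```

Repeating the same arithmetic at cohort2 = 0, sex = 0, period = 0 gives eta_2 > eta_1 for pob_num = 1, 3, 5, 7 and eta_2 < eta_1 for pob_num = 0, 2, 4, 6. That would match an every-second-row pattern in the scoring data set if PROC LOGISTIC leaves rows with a negative implied probability unscored, but that last step is an inference and is not confirmed anywhere in this thread.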
