Statistical Procedures

Demographer · Posted 02-10-2022 04:20 AM

Hi,

I'm fairly new to scoring with PROC LOGISTIC.

I want a dataset that includes probabilities for all possible combinations of categories of the variables used in a logistic model.

I built the dataset to score as follows:

/* One row per combination: 99 x 2 x 2 x 8 = 3,168 rows */
data work.immigdataset;
   do cohort2 = 0 to 98;
      do sex = 0, 1;
         do period = 0, 1;
            do pob_num = 0 to 7;
               output;
            end;
         end;
      end;
   end;
run;

The logit model with the score statement is then:

proc logistic data=work.immigrants_from_lfs;
   class sex pob_num(ref='0') period / param=ref;
   model edunum(descending) = period sex|cohort2 pob_num|cohort2 / unequalslopes;
   weight weight / norm;
   score data=work.immigdataset out=work.scored_immig;
run;

Edunum has 3 categories and the model is an ordered logit with unequal slopes.

There are no error messages. However, the problem is that every second row is left unscored (see screenshot below). How can I fix this?

I don't see any obvious problem with parameters of the logit model.

Analysis of Maximum Likelihood Estimates
Parameter		edunum	DF	Estimate	Standard Error	Wald Chi-Square	Pr > ChiSq
Intercept		2	1	-1.7993	0.0814	489.0406	<.0001
Intercept		1	1	-0.5025	0.0742	45.9176	<.0001
period	0	2	1	-0.2697	0.0127	451.0235	<.0001
period	0	1	1	-0.2942	0.0139	449.2595	<.0001
sex	0	2	1	1.5305	0.0868	311.0672	<.0001
sex	0	1	1	1.4247	0.0855	277.4533	<.0001
cohort2		2	1	0.0190	0.000992	366.2465	<.0001
cohort2		1	1	0.0204	0.000924	485.8860	<.0001
cohort2*sex	0	2	1	-0.0195	0.00104	349.6493	<.0001
cohort2*sex	0	1	1	-0.0172	0.00103	277.7880	<.0001
pob_num	1	2	1	-1.0595	0.1789	35.0924	<.0001
pob_num	1	1	1	-2.3889	0.1596	224.0873	<.0001
pob_num	2	2	1	1.6704	0.1560	114.6657	<.0001
pob_num	2	1	1	1.1015	0.1442	58.3199	<.0001
pob_num	3	2	1	1.8591	0.1383	180.7171	<.0001
pob_num	3	1	1	0.4209	0.1334	9.9507	0.0016
pob_num	4	2	1	0.7457	0.1458	26.1710	<.0001
pob_num	4	1	1	-0.1724	0.1411	1.4935	0.2217
pob_num	5	2	1	2.0834	0.2466	71.3532	<.0001
pob_num	5	1	1	-0.6860	0.2495	7.5600	0.0060
pob_num	6	2	1	4.3626	0.2696	261.7725	<.0001
pob_num	6	1	1	4.2384	0.4575	85.8354	<.0001
pob_num	7	2	1	2.3235	0.1332	304.4506	<.0001
pob_num	7	1	1	0.6312	0.1351	21.8367	<.0001
cohort2*pob_num	1	2	1	0.00791	0.00213	13.7568	0.0002
cohort2*pob_num	1	1	1	0.0210	0.00193	118.7234	<.0001
cohort2*pob_num	2	2	1	-0.0254	0.00188	182.9621	<.0001
cohort2*pob_num	2	1	1	-0.0217	0.00173	156.8366	<.0001
cohort2*pob_num	3	2	1	-0.0239	0.00167	205.5414	<.0001
cohort2*pob_num	3	1	1	-0.0115	0.00161	51.0017	<.0001
cohort2*pob_num	4	2	1	-0.00618	0.00174	12.5973	0.0004
cohort2*pob_num	4	1	1	-0.00090	0.00170	0.2826	0.5950
cohort2*pob_num	5	2	1	-0.0125	0.00287	18.9461	<.0001
cohort2*pob_num	5	1	1	0.0195	0.00299	42.8260	<.0001
cohort2*pob_num	6	2	1	-0.0338	0.00324	109.0780	<.0001
cohort2*pob_num	6	1	1	-0.0270	0.00549	24.1746	<.0001
cohort2*pob_num	7	2	1	-0.0246	0.00162	230.6967	<.0001
cohort2*pob_num	7	1	1	-0.00275	0.00166	2.7353	0.0982

PaigeMiller · Posted 02-10-2022 06:07 AM

What is L_EDUNUM in your screen capture? It's not in your model statement.

--
Paige Miller

Rick_SAS · Posted 02-10-2022 06:35 AM

@PaigeMiller I_edunum is a manufactured variable that PROC LOGISTIC creates in the output scoring data set. If your response variable is named Y, you get a variable named I_Y.

PaigeMiller · Posted 02-10-2022 07:17 AM

@Rick_SAS wrote:

@PaigeMiller I_edunum is a manufactured variable that PROC LOGISTIC creates in the output scoring data set. If your response variable is named Y, you get a variable named I_Y.

Don't leave me guessing. What is the purpose of I_Y? How is it computed? What is it telling us?

--
Paige Miller

Rick_SAS · Posted 02-10-2022 08:15 AM

Paige, the link in my response takes you to the documentation.

Rick_SAS · Posted 02-10-2022 06:27 AM

What version of SAS are you running? Submit

%put &=SYSVLONG4;

and paste in the result that appears in the log.

I suspect that the problem is your data. Run PROC FREQ on your pob_num variable. Do you have 8 categories? I suspect you might have only the even categories 0, 2, 4, and 6. If you have all eight categories of pob_num, then check the WEIGHT variable, paying attention to missing or zero values. Perhaps the odd values of pob_num all have zero or missing weights?
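Those checks could look something like the following. This is only a rough sketch that reuses the data set and variable names from the original post; work.weight_by_pob is a made-up output name.

```sas
/* Distribution of pob_num, including any missing values */
proc freq data=work.immigrants_from_lfs;
   tables pob_num / missing;
run;

/* Weight summary within each pob_num level:
   a level with sum_weight = 0 or n_weight = 0 contributes nothing to the fit */
proc means data=work.immigrants_from_lfs n nmiss min max sum;
   class pob_num / missing;
   var weight;
   output out=work.weight_by_pob
          n=n_weight nmiss=n_missing min=min_weight sum=sum_weight;
run;
```

If every pob_num level shows a positive weight sum, the data side of this suspicion can probably be ruled out.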

The following program runs your code on simulated data. When I run it, it produces a scoring data set for which all observations are scored. Make sure your version of SAS treats this simulated data correctly. If so, there is something wrong with your data.

data have;
   call streaminit(1);
   do cohort2 = 0 to 98;
      do sex = 0, 1;
         do period = 0, 1;
            do pob_num = 0 to 7;
               edunum = rand("Table", 0.2, 0.5, 0.3) - 1;
               weight = rand("uniform");
               output;
            end;
         end;
      end;
   end;
run;

data immigdataset;
   do cohort2 = 0 to 98;
      do sex = 0, 1;
         do period = 0, 1;
            do pob_num = 0 to 7;
               output;
            end;
         end;
      end;
   end;
run;

%put &=SYSVLONG4;

proc logistic data=have;
   class sex pob_num(ref='0') period /param=ref;
   model edunum(descending)= period sex|cohort2 pob_num|cohort2 /unequalslopes;
   weight weight / norm;
   score data=work.immigdataset out=work.scored_immig;
run;

Demographer · Posted 02-10-2022 06:44 AM

I'm using SAS On Demand for Academics.
The variable pob_num has 8 categories and has no zero weights. Otherwise, I wouldn't get parameters for the 8 categories in the regression.

I ran the code with the simulated data and got scored values for all observations. So I guess there is something wrong with my data, but I can't see what it is.
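One way to narrow this down might be to pull out the rows of the scored data set that have missing predicted probabilities and see how they line up with the class variables. A minimal sketch, assuming the SCORE output stores the predicted probabilities in variables whose names begin with P_; work.unscored is a made-up name.

```sas
/* Keep only the rows that were not scored.
   Assumption: the predicted probabilities are in variables prefixed P_ */
data work.unscored;
   set work.scored_immig;
   if nmiss(of P_:) > 0;
run;

/* Tabulate the unscored rows by the class variables */
proc freq data=work.unscored;
   tables pob_num sex period / missing;
run;
```

If the unscored rows turn out to be concentrated in particular pob_num levels, that points at the model terms involving pob_num rather than at the scoring data set itself.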

Demographer · Posted 02-11-2022 02:21 AM

In case it helps to find the cause of the issue: when I remove the UNEQUALSLOPES option, the scoring works. However, I need it for my model, so this is not a good solution.

In the log of the regression, I also have a warning message saying: "Negative individual predicted probabilities were identified in the final model fit. You may want to modify your UNEQUALSLOPES specification."

I could not find any information about this warning on Google, and I don't understand how a logistic model can predict negative probabilities (edunum has 3 categories: 0, 1, 2).
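One possible explanation: with UNEQUALSLOPES each cumulative logit gets its own slopes, so nothing forces the fitted P(edunum>=2) to stay below the fitted P(edunum>=1). Where the two cross, P(edunum=1), which is their difference, comes out negative. The sketch below plugs the intercepts and pob_num main effects from the estimates table into the two equations at cohort2=0, sex=1, period=1. The table lists parameters for sex 0 and period 0, so level 1 is taken to be the reference for both and every other term drops out. The data set and variable names are made up for illustration.

```sas
/* Plug the table's estimates into both cumulative logits
   at cohort2=0, sex=1, period=1 (reference levels, so other terms vanish) */
data work.check_crossing;
   /* pob_num main effects; index 0 is the reference level (effect 0) */
   array b2[0:7] _temporary_ (0 -1.0595 1.6704 1.8591 0.7457 2.0834 4.3626 2.3235);
   array b1[0:7] _temporary_ (0 -2.3889 1.1015 0.4209 -0.1724 -0.6860 4.2384 0.6312);
   do pob_num = 0 to 7;
      eta2  = -1.7993 + b2[pob_num];   /* logit of P(edunum >= 2) */
      eta1  = -0.5025 + b1[pob_num];   /* logit of P(edunum >= 1) */
      p_ge2 = 1 / (1 + exp(-eta2));
      p_ge1 = 1 / (1 + exp(-eta1));
      p_eq1 = p_ge1 - p_ge2;           /* negative when the logits cross */
      output;
   end;
run;

proc print data=work.check_crossing noobs;
run;
```

With these estimates, eta2 exceeds eta1 for pob_num 1, 3, 5 and 7, so p_eq1 is negative for exactly the odd categories. Since pob_num is the innermost loop of the scoring data set, that pattern might be why every second row is unscored, although whether PROC LOGISTIC skips such rows when scoring is an assumption here.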
