
[SAS Programming Master Baek Seung-min] [Logistic] Scoring data using the results of a nominal multinomial logistic regression analysis

Started ‎06-11-2020 by
Modified ‎06-12-2020 by

Sample 24993: Score new data using a nominal multinomial logistic model

 

/* The training data set
================================================================*/
data operate;
input hospital trt $ severity $ wt @@;
cards;
1 a none 23 1 a slight 7 1 a moderate 2
1 b none 23 1 b slight 10 1 b moderate 5
1 c none 20 1 c slight 13 1 c moderate 5
1 d none 24 1 d slight 10 1 d moderate 6
2 a none 18 2 a slight 6 2 a moderate 1
2 b none 18 2 b slight 6 2 b moderate 2
2 c none 13 2 c slight 13 2 c moderate 2
2 d none 3 2 d slight 20 2 d moderate 2
3 a none 8 3 a slight 6 3 a moderate 3
3 b none 12 3 b slight 4 3 b moderate 4
3 c none 11 3 c slight 6 3 c moderate 2
3 d none 7 3 d slight 7 3 d moderate 4
4 a none 12 4 a slight 9 4 a moderate 1
4 b none 15 4 b slight 3 4 b moderate 2
4 c none 14 4 c slight 8 4 c moderate 3
4 d none 13 4 d slight 6 4 d moderate 4
;

 

/* ---------------------- CATMOD method ------------------------- */

 


/* Fit the model and output the predicted values for each observed
sample. This must be a generalized logit model (no keyword on the
RESPONSE statement before the slash) and all predictors must be
categorical (no DIRECT statement used).
================================================================*/
proc catmod data=operate order=data;
weight wt;
response / out=preds;
model severity=trt hospital;
run;
quit;

 

/* Keep just the predicted values, predictors, and response
================================================================*/
data pred2;
set preds;
if _type_='PROB';
keep severity trt hospital _pred_;
run;

 

/* Find predicted response level (level with highest predicted
probability) in each sample.
================================================================*/
proc summary data=pred2 nway;
class trt hospital;
var _pred_;
output out=predlvl (drop=_type_ _freq_)
maxid(_pred_(severity))=predlvl;
run;

 

/* Transpose the predicted values so that there is one observation per
sample containing predicted values for each response level.
================================================================*/
proc transpose data=pred2 out=pred3 (drop=_name_);
by trt hospital;
id severity;
var _pred_;
run;

 

/* Create a data set for scoring containing various values of the
predictors, including values that were not present in the original
data set for illustration.
================================================================*/
data a;
do n=1 to 100;
hospital=rantbl(239873,.2,.2,.2,.2,.2);
t=rantbl(239873,.2,.2,.2,.2,.2);
if t=1 then trt='a';
else if t=2 then trt='b';
else if t=3 then trt='c';
else if t=4 then trt='d';
else trt='e';
drop t n; output;
end;
run;

 

/* Sort the data set to be scored (A), the data set of predicted
probabilities (PRED3), and the data set of predicted levels (PREDLVL)
by the predictors.
================================================================*/
proc sort data=a; by hospital trt; run;
proc sort data=pred3; by hospital trt; run;
proc sort data=predlvl; by hospital trt; run;

 

/* Merge data set to be scored with data set of predicted values in
the order shown.
================================================================*/
data new;
merge a pred3 predlvl;
by hospital trt;
run;

/* Display the scored data set. Note that observations containing
predictor values not in the original data set have missing
predicted values since the model does not have parameters for these
values.
================================================================*/
proc print data=new;
run;

 


/* --------------------- LOGISTIC method ------------------------ */


/* Beginning in Release 8.2 (TS2M0), the LINK=GLOGIT option in PROC
LOGISTIC allows you to fit the same generalized logit model that PROC
CATMOD fits. Scoring a new data set can be done by simply appending
the data set to the original data, assuring that the response
variable is missing in these observations, and then refitting the
model. The added observations are ignored when estimating the model,
but they are scored by the OUTPUT statement.

 

The following steps create a data set containing the original data
and the data set to be scored, fit the model, and score the new
observations.
================================================================*/
data b;
set operate a;
run;

 

proc logistic data=b;
class trt hospital;
freq wt;
model severity(order=data) = trt hospital / link=glogit;
output out=out predprobs=(i);
run;

 

proc print data=out;
run;

 


/* Beginning in SAS 9, scoring a new data set can be done using
the SCORE statement. Specify the training data set (OPERATE) in
the DATA= option in the PROC LOGISTIC statement and the data set to
score (A) in the DATA= option of the SCORE statement.

 

The following statements score data set A without the need to
concatenate and score the original data as well.
=========================================================================*/

proc logistic data=operate;
class trt hospital;
freq wt;
model severity(order=data) = trt hospital / link=glogit;
score data=a out=out;
run;

 

proc print data=out;
run;

 


/* Scoring can also be done at a later time using training model information
stored from a previous run. First, fit the model to the training data
and save the model information.
=========================================================================*/
proc logistic data=operate outmodel=model;
class trt hospital;
freq wt;
model severity(order=data) = trt hospital / link=glogit;
run;

 

/* Score the validation data set using saved model information.
=========================================================================*/
proc logistic inmodel=model;
score data=a out=out;
run;

 

proc print data=out;
run;
 

 

 

Score new data using a nominal multinomial logistic model

 

Contents: Purpose / Requirements / Limitations / See Also

 


PURPOSE:

Score new data (a validation data set) using a multinomial logistic model. Methods using PROC CATMOD and PROC LOGISTIC are shown. The CATMOD method works only for generalized logit models, and it can score only observations whose combination of predictor values occurred in the data set used to fit the model (the training data set). The final scored data set has variables containing the predicted probabilities for each observation and a variable containing the predicted response level (the level with the maximum predicted probability). The SEE ALSO section below refers to another example using a binary or ordinal multinomial response.

 

REQUIREMENTS:
Base SAS and SAS/STAT Software, Version 6 or later. The PROC LOGISTIC methods need a later release: Release 8.2 (TS2M0) for the append-and-refit method that uses LINK=GLOGIT with the OUTPUT statement, and SAS 9.0 (TS0M0) for the SCORE statement method.
LIMITATIONS:
CATMOD method:
The model fit with CATMOD must be a generalized logit model so that predicted probabilities for each response level are available in the OUT= data set. A generalized logit model is fit whenever no response function keyword is specified on the RESPONSE statement (or if the RESPONSE statement is omitted).
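
For reference, the generalized logit model described above can be written out as follows. This is the standard textbook form rather than notation taken from this sample: the symbols (J response levels, predicted probabilities \pi_j, design vector \mathbf{x}, coefficient vectors \boldsymbol{\beta}_j) are assumptions introduced here, and the last response level is taken as the reference (moderate, given the data order used in this example).

```latex
% Each non-reference level j is compared with the reference level J
\[
\log\frac{\pi_j(\mathbf{x})}{\pi_J(\mathbf{x})} = \mathbf{x}^\top \boldsymbol{\beta}_j,
\qquad j = 1, \dots, J-1
\]

% Solving for the probabilities, which sum to one over the J levels
\[
\pi_j(\mathbf{x}) = \frac{\exp(\mathbf{x}^\top \boldsymbol{\beta}_j)}
                         {1 + \sum_{k=1}^{J-1} \exp(\mathbf{x}^\top \boldsymbol{\beta}_k)},
\qquad
\pi_J(\mathbf{x}) = \frac{1}{1 + \sum_{k=1}^{J-1} \exp(\mathbf{x}^\top \boldsymbol{\beta}_k)}
\]
```

With three severity levels there are two logits (none versus moderate, slight versus moderate), which is why a predicted probability is available for every response level.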

In the data set to be scored, only observations containing predictor combinations that occurred in the modeled data set can be scored. This is because the model itself is not evaluated for each observation being scored. Rather, the predicted values that were output for each observed combination (sample) in the modeled data set are simply applied to matching observations in the data set to be scored. Note this limitation means that if you used the DIRECT statement in CATMOD to treat some predictors as continuous, you will not get scores for observations with new values of these variables. For such observations, you will need to use the fitted model parameters to compute predicted probabilities.
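
As a rough sketch of computing predicted probabilities from fitted parameters, the DATA step below evaluates two generalized logits by hand for a single new value of a continuous predictor. Every number in it is a hypothetical placeholder, not an estimate from the model in this sample, and the names X, ETA_NONE, ETA_SLIGHT, and MANUAL_SCORE are invented for illustration. In practice the linear predictors would have to be built from the actual estimates, using the same design coding that the procedure applied.

```sas
/* Sketch only: all coefficients below are HYPOTHETICAL placeholders. */
/* Reference response level is assumed to be 'moderate'.              */
data manual_score;
   length predlvl $ 8;
   x = 2.5;                          /* hypothetical new continuous value */

   /* Hypothetical intercepts and slopes for the two logits */
   eta_none   = 1.20 + 0.30*x;       /* log( P(none)   / P(moderate) ) */
   eta_slight = 0.40 + 0.10*x;       /* log( P(slight) / P(moderate) ) */

   /* Convert the logits to probabilities that sum to 1 */
   denom      = 1 + exp(eta_none) + exp(eta_slight);
   p_none     = exp(eta_none)   / denom;
   p_slight   = exp(eta_slight) / denom;
   p_moderate = 1 / denom;

   /* Predicted level = level with the highest probability */
   if p_none >= max(p_slight, p_moderate) then predlvl = 'none';
   else if p_slight >= p_moderate         then predlvl = 'slight';
   else                                        predlvl = 'moderate';
run;

proc print data=manual_score;
run;
```

The same arithmetic applies to any number of new observations: read them with a SET statement and replace the constant X with the variable from that data set.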

 

LOGISTIC method:
Any model that LOGISTIC can fit can be used to score new data using the methods illustrated. Prior to SAS 9.0, specify the model as desired, append the new data (with missing responses) to the original data, and use the OUTPUT statement to request predicted values. In SAS 9.0 or later, simply specify the data set to score in the DATA= option in the SCORE statement; no concatenation of data is needed. The SCORE statement can be used to score a separate validation data set either at the same time that the model is fit to the training data set, or at a later time using training model information stored from a previous run. Both are illustrated.

Of course, any observation containing a CLASS variable value that does not appear in the training data cannot be scored since the model does not contain a parameter for that level. However, observations with new continuous predictor values can be scored, unlike the CATMOD method.
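
One way to see which observations could not be scored is to look for a missing predicted level in the scored data set. The sketch below assumes that OUT was created by the SCORE statement shown earlier (score data=a out=out) and that the predicted-level variable is named I_severity, following the I_ prefix that the SCORE statement applies to the response name. Check the variable names in your own output before relying on it. UNSCORED is a name chosen here for illustration.

```sas
/* Assumes: OUT comes from  score data=a out=out;                      */
/* Assumes: I_severity is the predicted ("Into") level created by SCORE */
data unscored;
   set out;
   if missing(I_severity);      /* keep only observations with no score */
run;

/* List the predictor combinations that the model could not score */
proc freq data=unscored;
   tables hospital*trt / list missing;
run;
```

In this example the unscored rows should be those with trt='e' or hospital=5, since neither value appears in the training data.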

 

 

SEE ALSO:
This example illustrates scoring new data using a binary or an ordinal multinomial logistic model:
Version history
Last update:
‎06-12-2020 04:23 AM