Hello
Let's say I want to develop a logistic regression model and then apply it to score a new data set.
raw_tbl is the data set on which I build the regression model.
test_tbl is the data set that I want to score using the logistic model.
My question is about WAY2: how do I score the new data using WAY2?
What happens when I use the statement
ods output parameterestimates=ttt;
and how can I use its result to score the new data?
/* people are asked whether or not they would subscribe to a new newspaper*/
/*For each person variables sex (Female, Male), age, and subs (1=yes, 0=no) are recorded.*/
data raw_tbl;
input sex $ age subs @@;
cards;
Female 35 0 Male 44 0
Male 45 1 Female 47 1
Female 51 0 Female 47 0
Male 54 1 Male 47 1
Female 35 0 Female 34 0
Female 48 0 Female 56 1
Male 46 1 Female 59 1
Female 46 1 Male 59 1
Male 38 1 Female 39 0
Male 49 1 Male 42 1
Male 50 1 Female 45 0
Female 47 0 Female 30 1
Female 39 0 Female 51 0
Female 45 0 Female 43 1
Male 39 1 Male 31 0
Female 39 0 Male 34 0
Female 52 1 Female 46 0
Male 58 1 Female 50 1
Female 32 0 Female 52 1
Female 35 0 Female 51 0
;
Run;
data test_tbl;
input sex $ age;
cards;
Female 35
Male 19
Male 70
;
run;
proc format;
value Dependent_Fmt
1 = 'accept'
0 = 'reject';
run;
/*********WAY1***/
/*********WAY1***/
/*********WAY1***/
proc probit data=raw_tbl;
class sex;
model subs(event="accept")=sex age / d=logistic itprint;
format subs Dependent_Fmt.;
store out=LogitModel;
run;
/* use the SCORE statement in the PLM procedure to score new observations based on fitted model saved by the STORE statement*/
proc plm restore=LogitModel;
score data=test_tbl out=test_out_tbl predicted / ilink;
run;
/*********WAY2***/
/*********WAY2***/
/*********WAY2***/
ods output parameterestimates=ttt;
/*ods html file="/usr/local/SAS/SASUsers/LabRet/UserDir/udclk79/ppppp.html" style=minimal ;*/
proc genmod data=raw_tbl namelen=60 descending ;
class sex age;
model subs=sex age/ dist=binomial link=logit type3 wald ;
output out=xxx
p=P_subscibe xbeta=logit;
ODS SELECT ModelANOVA;
run;
ODS OUTPUT CLOSE;
/**Question---How can I score new data set??****/
data raw_tbl;
input sex $ age subs @@;
cards;
Female 35 0 Male 44 0
Male 45 1 Female 47 1
Female 51 0 Female 47 0
Male 54 1 Male 47 1
Female 35 0 Female 34 0
Female 48 0 Female 56 1
Male 46 1 Female 59 1
Female 46 1 Male 59 1
Male 38 1 Female 39 0
Male 49 1 Male 42 1
Male 50 1 Female 45 0
Female 47 0 Female 30 1
Female 39 0 Female 51 0
Female 45 0 Female 43 1
Male 39 1 Male 31 0
Female 39 0 Male 34 0
Female 52 1 Female 46 0
Male 58 1 Female 50 1
Female 32 0 Female 52 1
Female 35 0 Female 51 0
;
Run;
data test_tbl;
input sex $ age;
cards;
Female 35
Male 19
Male 70
;
run;
proc genmod data=raw_tbl namelen=60 descending ;
class sex ;
model subs=sex age/ dist=binomial link=logit type3 wald ;
output out=xxx p=P_subscibe xbeta=logit;
store out=RonLogitModel2;
run;
proc plm restore=RonLogitModel2;
score data=test_tbl out=test_out_tbl predicted / ilink;
run;
Same as you did in PROC PROBIT.
In WAY1:
I used store out=LogitModel in PROC PROBIT
and then restore=LogitModel in PROC PLM.
Question: Can I use any name in store out=? For example: store out=MyModel1?
In WAY2:
I used this in PROC GENMOD:
ods output parameterestimates=ttt
As you said, do I then need to use restore=ttt in PROC PLM?
Question: If I want to see the coefficient values that were saved in ttt, how can I see them?
@Ronein wrote:
In WAY1:
I used store out=LogitModel in PROC PROBIT
and then restore=LogitModel in PROC PLM.
Question: Can I use any name in store out=? For example: store out=MyModel1?
Why don't you try it and learn by yourself, instead of asking us? Trying it yourself also has a benefit that you will get a faster answer from SAS than you will get from the SAS community, and unlike asking us, SAS will always give the 100% correct answer, while we get it wrong from time to time.
Your model is an equation used to predict the response. This equation is estimated inside the model-fitting step (PROC PROBIT in WAY1, PROC GENMOD in WAY2). You can see the resulting parameters of the equation in table ttt, but you do not need to look at them to score new data. The computer scores new data by applying this equation to it to predict the new response. That, as @PaigeMiller said, is done using PROC PLM.
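For looking at the coefficients themselves, a minimal sketch (assuming the ODS OUTPUT statement from WAY2 has already created the data set ttt in WORK) is simply to print that data set. PROC CONTENTS is included only to list the column names, since the exact columns depend on the procedure and its options.

```sas
/* Sketch: ttt is an ordinary SAS data set created by                 */
/* "ods output parameterestimates=ttt;", so it can be printed.        */
/* It is NOT an item store and cannot be passed to PROC PLM.          */
proc contents data=ttt;
run;

proc print data=ttt noobs;
run;
```

An item store written by the STORE statement is a separate, binary object; that is the one PROC PLM reads with RESTORE=.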
I tried, but I get this error:
ERROR: The file WORK.MYLOGITMODEL1 does not exist or it is not a valid item store.
data raw_tbl;
input sex $ age subs @@;
cards;
Female 35 0 Male 44 0
Male 45 1 Female 47 1
Female 51 0 Female 47 0
Male 54 1 Male 47 1
Female 35 0 Female 34 0
Female 48 0 Female 56 1
Male 46 1 Female 59 1
Female 46 1 Male 59 1
Male 38 1 Female 39 0
Male 49 1 Male 42 1
Male 50 1 Female 45 0
Female 47 0 Female 30 1
Female 39 0 Female 51 0
Female 45 0 Female 43 1
Male 39 1 Male 31 0
Female 39 0 Male 34 0
Female 52 1 Female 46 0
Male 58 1 Female 50 1
Female 32 0 Female 52 1
Female 35 0 Female 51 0
;
Run;
data test_tbl;
input sex $ age;
cards;
Female 35
Male 19
Male 70
;
run;
ods output parameterestimates=MyLogitModel1;
proc genmod data=raw_tbl namelen=60 descending ;
class sex age;
model subs=sex age/ dist=binomial link=logit type3 wald ;
output out=xxx
p=P_subscibe xbeta=logit;
ODS SELECT ModelANOVA;
run;
ODS OUTPUT CLOSE;
/**score new data set??****/
proc plm restore=MyLogitModel1;
score data=test_tbl out=test_out_tbl predicted / ilink;
run;
You cannot put AGE in the CLASS statement, because it is a continuous variable, not a categorical one.
With AGE as a CLASS variable you would also be trying to score AGE values (19 and 70) that do not occur in the RAW_TBL data set.
data raw_tbl;
input sex $ age subs @@;
cards;
Female 35 0 Male 44 0
Male 45 1 Female 47 1
Female 51 0 Female 47 0
Male 54 1 Male 47 1
Female 35 0 Female 34 0
Female 48 0 Female 56 1
Male 46 1 Female 59 1
Female 46 1 Male 59 1
Male 38 1 Female 39 0
Male 49 1 Male 42 1
Male 50 1 Female 45 0
Female 47 0 Female 30 1
Female 39 0 Female 51 0
Female 45 0 Female 43 1
Male 39 1 Male 31 0
Female 39 0 Male 34 0
Female 52 1 Female 46 0
Male 58 1 Female 50 1
Female 32 0 Female 52 1
Female 35 0 Female 51 0
;
Run;
data test_tbl;
input sex $ age;
cards;
Female 35
Male 19
Male 70
;
run;
data have;
set raw_tbl test_tbl(in=inb);
pred=inb;
run;
proc genmod data=have namelen=60 descending ;
class sex ;
model subs=sex age/ dist=binomial link=logit type3 wald ;
output out=xxx p=P_subscibe xbeta=logit;
run;
proc print data=xxx noobs;
where pred=1;
run;
I want to score subs (not age).
The dependent variable is subs (not age).
The model is: model subs=sex age
As I understand it, only categorical predictor (independent) variables go in the CLASS statement?
The method you showed is perfect (I added it as WAY3).
As I see it, your method requires creating a new data set that contains the observations used to develop the model plus the new observations to be scored?
So the observations that built the model always need to be included?
That is not so easy: if I want to score new data 10 years from now, I would need to add the model-development observations again.
It is not very practical.
It would be better to work from the regression formula, without needing the model-development observations each time.
Can you please show how to fix WAY2? I removed age from the CLASS statement since it is not categorical, but I still get an error:
ERROR: The file WORK.RONLOGITMODEL2 does not exist or it is not a valid item store.
/* people are asked whether or not they would subscribe to a new newspaper*/
/*For each person variables sex (Female, Male), age, and subs (1=yes, 0=no) are recorded*/
/**Dependent binary var:subs**/
/**Predictor1-sex (categorical)**/
/**Predictor2-age (Discrete)**/
data raw_tbl;
input sex $ age subs @@;
cards;
Female 35 0 Male 44 0
Male 45 1 Female 47 1
Female 51 0 Female 47 0
Male 54 1 Male 47 1
Female 35 0 Female 34 0
Female 48 0 Female 56 1
Male 46 1 Female 59 1
Female 46 1 Male 59 1
Male 38 1 Female 39 0
Male 49 1 Male 42 1
Male 50 1 Female 45 0
Female 47 0 Female 30 1
Female 39 0 Female 51 0
Female 45 0 Female 43 1
Male 39 1 Male 31 0
Female 39 0 Male 34 0
Female 52 1 Female 46 0
Male 58 1 Female 50 1
Female 32 0 Female 52 1
Female 35 0 Female 51 0
;
Run;
data test_tbl;
input sex $ age;
cards;
Female 35
Male 19
Male 70
;
run;
proc format;
value Dependent_Fmt
1 = 'accept'
0 = 'reject';
run;
/*****WAY1--Work good****/
/*****WAY1--Work good****/
/*****WAY1--Work good****/
proc probit data=raw_tbl;
class sex;
model subs(event="accept")=sex age / d=logistic itprint;
format subs Dependent_Fmt.;
store out=RonLogitModel1;
run;
/* use SCORE statement in PLM procedure to score new observations based on fitted model saved by the STORE statement*/
proc plm restore=RonLogitModel1;
score data=test_tbl out=test_out_tbl predicted / ilink;
run;
/*****WAY2--problem****/
/*****WAY2--problem****/
/*****WAY2--problem****/
ods output parameterestimates=RonLogitModel2;
proc genmod data=raw_tbl namelen=60 descending ;
class sex;
model subs=sex age/ dist=binomial link=logit type3 wald ;
output out=xxx
p=P_subscibe xbeta=logit;
ODS SELECT ModelANOVA;
run;
ODS OUTPUT CLOSE;
/**score new data set??****/
proc plm restore=RonLogitModel2;
score data=test_tbl out=test_out_tbl predicted / ilink;
run;
/*****WAY3--Work good****/
/*****WAY3--Work good****/
/*****WAY3--Work good****/
/*****WAY3--Work good****/
data have;
set raw_tbl test_tbl(in=inb);
pred=inb;/**Binary indicator that mark obs that we want to score**/
run;
proc genmod data=have namelen=60 descending ;
class sex ;
model subs=sex age/ dist=binomial link=logit type3 wald ;
output out=WANT p=P_subscibe xbeta=logit;
/**The predicted variable is called P_subscibe**/
/**The output data set WANT holds the raw data plus the predicted variable**/
run;
proc print data=WANT noobs;
where pred=1;
run;
data raw_tbl;
input sex $ age subs @@;
cards;
Female 35 0 Male 44 0
Male 45 1 Female 47 1
Female 51 0 Female 47 0
Male 54 1 Male 47 1
Female 35 0 Female 34 0
Female 48 0 Female 56 1
Male 46 1 Female 59 1
Female 46 1 Male 59 1
Male 38 1 Female 39 0
Male 49 1 Male 42 1
Male 50 1 Female 45 0
Female 47 0 Female 30 1
Female 39 0 Female 51 0
Female 45 0 Female 43 1
Male 39 1 Male 31 0
Female 39 0 Male 34 0
Female 52 1 Female 46 0
Male 58 1 Female 50 1
Female 32 0 Female 52 1
Female 35 0 Female 51 0
;
Run;
data test_tbl;
input sex $ age;
cards;
Female 35
Male 19
Male 70
;
run;
proc genmod data=raw_tbl namelen=60 descending ;
class sex ;
model subs=sex age/ dist=binomial link=logit type3 wald ;
output out=xxx p=P_subscibe xbeta=logit;
store out=RonLogitModel2;
run;
proc plm restore=RonLogitModel2;
score data=test_tbl out=test_out_tbl predicted / ilink;
run;
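On the concern about scoring years later without the development data: one possible approach, sketched here under assumptions, is to write the item store to a permanent library instead of WORK. The libref models and the folder path are hypothetical placeholders, not something from this thread.

```sas
/* Sketch: keep the fitted model in a permanent library.              */
/* "models" and the path are hypothetical; use a folder you can       */
/* write to.                                                          */
libname models "/path/to/model/store";

proc genmod data=raw_tbl namelen=60 descending;
  class sex;
  model subs = sex age / dist=binomial link=logit;
  store out=models.RonLogitModel2;  /* two-level name: permanent item store */
run;

/* In a later session only the item store and the new data are needed. */
libname models "/path/to/model/store";

proc plm restore=models.RonLogitModel2;
  score data=test_tbl out=test_out_tbl predicted / ilink;
run;
```

With a one-level name such as RonLogitModel2 the item store goes to WORK and is lost when the session ends.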
That's great, thank you.
Regarding the statement
ods output parameterestimates=RonLogitModel2;
can you please show how to use it to calculate scores for new data?
(Please see WAY2, which is not working.)
Okay, now I think I understand:
ods output parameterestimates=Coef_tbl;
only saves the coefficients in an ordinary data set so that I can look at them.
This code works:
/*****WAY2--Work good****/
ods output parameterestimates=Coef_tbl;
proc genmod data=raw_tbl namelen=60 descending ;
class sex;
model subs=sex age/ dist=binomial link=logit type3 wald ;
store MyRonModel;
output out=row_data_with_predict p=P_subscibe xbeta=logit;
ODS SELECT ModelANOVA;
run;
/**The OUTPUT statement creates row_data_with_predict: the raw data plus a predicted-probability column called P_subscibe**/
/***Print the coefficients to verify that the STORE statement saved them***/
proc plm source=MyRonModel;
show parameters;
run;
/**score new data set****/
proc plm restore=MyRonModel;
score data=test_tbl out=test_out_tbl predicted / ilink;
run;
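Because Coef_tbl is an ordinary data set, the fitted equation can also be applied by hand. The following is only a sketch under stated assumptions: Coef_tbl comes from the PROC GENMOD step above, it has the columns Parameter, Level1 and Estimate, and sex uses the default GLM coding, so the last level (Male) is the reference with an estimate of 0. The macro variable names b0, b_female and b_age and the output names are made up for illustration.

```sas
/* Sketch: pull the three estimates into macro variables.             */
data _null_;
  set Coef_tbl;
  if upcase(Parameter) = 'INTERCEPT' then call symputx('b0', Estimate);
  else if upcase(Parameter) = 'SEX' and Level1 = 'Female'
    then call symputx('b_female', Estimate);
  else if upcase(Parameter) = 'AGE' then call symputx('b_age', Estimate);
run;

/* Apply the equation: linear predictor, then the inverse logit.      */
/* (sex = 'Female') evaluates to 1 for Female and 0 for Male.         */
data test_manual;
  set test_tbl;
  logit    = (&b0) + (&b_female) * (sex = 'Female') + (&b_age) * age;
  P_manual = 1 / (1 + exp(-logit));
run;

proc print data=test_manual noobs;
run;
```

If the assumptions hold, P_manual should agree with the Predicted column that PROC PLM writes with the ILINK option. The PROC PLM route is less error-prone once the model has more CLASS levels or interactions.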
Here is a summary of 4 ways to score new data based on the model formula
(logistic regression):
/* people are asked whether or not they would subscribe to a new newspaper*/
/*For each person variables sex (Female, Male), age, and subs (1=yes, 0=no) are recorded*/
/**Dependent binary var:subs**/
/**Predictor1-sex (categorical)**/
/**Predictor2-age (Discrete)**/
data raw_tbl;
input sex $ age subs @@;
cards;
Female 35 0 Male 44 0
Male 45 1 Female 47 1
Female 51 0 Female 47 0
Male 54 1 Male 47 1
Female 35 0 Female 34 0
Female 48 0 Female 56 1
Male 46 1 Female 59 1
Female 46 1 Male 59 1
Male 38 1 Female 39 0
Male 49 1 Male 42 1
Male 50 1 Female 45 0
Female 47 0 Female 30 1
Female 39 0 Female 51 0
Female 45 0 Female 43 1
Male 39 1 Male 31 0
Female 39 0 Male 34 0
Female 52 1 Female 46 0
Male 58 1 Female 50 1
Female 32 0 Female 52 1
Female 35 0 Female 51 0
;
Run;
data test_tbl;
input sex $ age;
cards;
Female 35
Male 19
Male 70
;
run;
proc format;
value Dependent_Fmt
1 = 'accept'
0 = 'reject';
run;
/*****WAY1--Work good****/
/*****WAY1--Work good****/
/*****WAY1--Work good****/
proc probit data=raw_tbl;
class sex;
model subs(event="accept")=sex age / d=logistic itprint;
format subs Dependent_Fmt.;
store out=RonLogitModel1;
run;
/* use SCORE statement in PLM procedure to score new observations based on fitted model saved by the STORE statement*/
proc plm restore=RonLogitModel1;
score data=test_tbl out=test_out_tbl predicted / ilink;
run;
/*****WAY2--Work good****/
/*****WAY2--Work good****/
/*****WAY2--Work good****/
ods output parameterestimates=Coef_tbl;
proc genmod data=raw_tbl namelen=60 descending ;
class sex;
model subs=sex age/ dist=binomial link=logit type3 wald ;
store MyRonModel;
output out=row_data_with_predict p=P_subscibe xbeta=logit;
ODS SELECT ModelANOVA;
run;
/**The OUTPUT statement creates row_data_with_predict: the raw data plus a predicted-probability column called P_subscibe**/
/***Print the coefficients saved in the item store***/
proc plm source=MyRonModel;
show parameters;
run;
/**score new data set****/
proc plm restore=MyRonModel;
score data=test_tbl out=test_out_tbl predicted / ilink;
run;
/*****WAY3--Work good****/
/*****WAY3--Work good****/
/*****WAY3--Work good****/
proc genmod data=raw_tbl namelen=60 descending ;
class sex ;
model subs=sex age/ dist=binomial link=logit type3 wald ;
output out=xxx p=P_subscibe xbeta=logit;
store out=RonLogitModel2;
run;
proc plm restore=RonLogitModel2;
score data=test_tbl out=test_out_tbl predicted / ilink;
run;
/**The predicted variable is called Predicted**/
/**The output data set test_out_tbl holds the test data plus the predicted variable**/
/*****WAY4-Work good****/
/*****WAY4-Work good****/
/*****WAY4-Work good****/
/*****WAY4-Work good****/
data have;
set raw_tbl test_tbl(in=inb);
pred=inb;/**Binary indicator that mark obs that we want to score**/
run;
proc genmod data=have namelen=60 descending ;
class sex ;
model subs=sex age/ dist=binomial link=logit type3 wald ;
output out=WANT p=P_subscibe xbeta=logit;
/**The predicted variable is called P_subscibe**/
/**The output data set WANT holds the raw data plus the predicted variable**/
run;
proc print data=WANT noobs;
where pred=1;
run;
OK. But scoring a new data set directly from the parameter estimates is not so easy when the model has categorical variables.
data raw_tbl;
input sex $ age subs @@;
cards;
Female 35 0 Male 44 0
Male 45 1 Female 47 1
Female 51 0 Female 47 0
Male 54 1 Male 47 1
Female 35 0 Female 34 0
Female 48 0 Female 56 1
Male 46 1 Female 59 1
Female 46 1 Male 59 1
Male 38 1 Female 39 0
Male 49 1 Male 42 1
Male 50 1 Female 45 0
Female 47 0 Female 30 1
Female 39 0 Female 51 0
Female 45 0 Female 43 1
Male 39 1 Male 31 0
Female 39 0 Male 34 0
Female 52 1 Female 46 0
Male 58 1 Female 50 1
Female 32 0 Female 52 1
Female 35 0 Female 51 0
;
Run;
data test_tbl;
input sex $ age;
cards;
Female 35
Male 19
Male 70
;
run;
options validvarname=v7;
ods output parameterestimates=RonLogitModel2;
proc genmod data=raw_tbl namelen=60 descending ;
class sex ;
model subs=sex age/ dist=binomial link=logit type3 wald ;
output out=xxx p=P_subscibe xbeta=logit;
run;
/*create design matrix for scoring*/
data test;
set test_tbl;
subs=1;
run;
proc glmselect data=test outdesign=outdesign(drop=subs) noprint;
class sex;
model subs=sex age/selection=none; /*same model as PROC GENMOD*/
quit;
/*make a parameter dataset*/
data RonLogitModel2;
set RonLogitModel2 end=last;
vname=catx('_',Parameter,Level1);
if last then delete; /*only keep variables which are used to score*/
run;
proc transpose data=RonLogitModel2 out=RonLogitModel22 ;
var Estimate ;
id vname;
run;
data RonLogitModel22;
set RonLogitModel22;
_type_='PARM';
run;
/*score dataset test_tbl*/
proc score data=outdesign out=want score=RonLogitModel22 type=parm;
run;
data want;
set want(rename=(Estimate=xbeta));
predicted=logistic(xbeta);
run;
proc print;run;
@Ronein wrote:
I have tried but get error
ERROR: The file WORK.MYLOGITMODEL1 does not exist or it is not a valid item store.
data raw_tbl;
input sex $ age subs @@;
cards;
Female 35 0 Male 44 0
Male 45 1 Female 47 1
Female 51 0 Female 47 0
Male 54 1 Male 47 1
Female 35 0 Female 34 0
Female 48 0 Female 56 1
Male 46 1 Female 59 1
Female 46 1 Male 59 1
Male 38 1 Female 39 0
Male 49 1 Male 42 1
Male 50 1 Female 45 0
Female 47 0 Female 30 1
Female 39 0 Female 51 0
Female 45 0 Female 43 1
Male 39 1 Male 31 0
Female 39 0 Male 34 0
Female 52 1 Female 46 0
Male 58 1 Female 50 1
Female 32 0 Female 52 1
Female 35 0 Female 51 0
;
Run;
data test_tbl;
input sex $ age;
cards;
Female 35
Male 19
Male 70
;
run;
ods output parameterestimates=MyLogitModel1;
proc genmod data=raw_tbl namelen=60 descending ;
class sex age;
model subs=sex age/ dist=binomial link=logit type3 wald ;
output out=xxx p=P_subscibe xbeta=logit;
ODS SELECT ModelANOVA;
run;
ODS OUTPUT CLOSE;
/**score new data set??****/
proc plm restore=MyLogitModel1;
score data=test_tbl out=test_out_tbl predicted / ilink;
run;
I said: "Same as you did in PROC PROBIT." You didn't do the same thing in PROC GENMOD that you did in PROC PROBIT. Please look at your code for these two PROCs and find the difference.
proc genmod data=raw_tbl namelen=60 descending ;
class sex;
model subs=sex age/ dist=binomial link=logit type3 wald ;
output out=xxx
p=P_subscibe xbeta=logit;
store out=ABC;
ODS SELECT ModelANOVA;
run;
proc plm restore=ABC;
score data=test_tbl out=test_out_tbl predicted / ilink;
run;