
feature (lasso) selection and parameter's p-value

Frequent Contributor
Posts: 78

If I use:

 

/selection=none stb showpvalues;

 

as options on the MODEL statement of PROC GLMSELECT, I get:

 

Effect     Parameter  DF   Estimate  StandardizedEst    StdErr  tValue   Probt
Intercept  Intercept   1   9.985494             0     0.269958   36.99  <.0001
Bla        Bla         1  -4.941651  -0.877694553     0.129965  -38.02  <.0001

 

where Probt is a parameter's p-value.

 

However, if I use:

 

/selection=lasso(stop=none choose=sbc) showpvalues;

 

I do not get any p-values for the parameters. Can I obtain them?


Accepted Solutions
Solution
‎02-21-2017 04:16 PM
Contributor
Posts: 69

Re: feature (lasso) selection and parameter's p-value

If you don't end up getting it through the ODS table, I'd contact SAS tech support.

 

That being said, I'll copy in the code we use to write the GLMSELECT results to a text file, read that file back in, and parse out the estimates. It may save you time if you need to go that route. We use it to build two tables: the parameter estimates, and the goodness-of-fit statistics plus the holdout-sample error (which we use as our selection criterion).

 

Earlier in the program, I open PROC PRINTTO, run PROC GLMSELECT, and then close the PRINTTO redirection.
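
For context, the capture step described above might look roughly like the following sketch. The data set `work.training`, the response `y`, and the predictors `x1-x10` are hypothetical. The BY variable `new_skill_name` is only inferred from the string the parsing code below searches for, and `selection=stepwise` is an arbitrary choice.

```sas
ods listing;   /* make sure the LISTING destination is open */

/* Route procedure output to a text file. NEW overwrites an existing file. */
proc printto print="output.txt" new;
run;

proc glmselect data=work.training;         /* hypothetical data set */
   by new_skill_name;                       /* assumed BY variable, data sorted by it */
   model y = x1-x10                         /* hypothetical response and predictors */
         / selection=stepwise showpvalues;  /* arbitrary selection method */
run;

/* Restore the default output destination. */
proc printto;
run;
```

With a BY statement, the listing contains a line of dashes with the BY value before each group's output, which appears to be what the parsing code keys on.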

 

* Read the text file written through PROC PRINTTO, one record per printed line;
data modeloutput parms modelstats;
  infile "output.txt" firstobs=1 lrecl=32767 truncover;
  attrib sas_output length=$250 txn_name length=$50;
  input sas_output $ 1-250;
  * remove control characters such as the form feed that marks page breaks, which interferes with the logic below;
  sas_output = compress(sas_output, , 'c');
  output modeloutput;
  retain txn_name parmtable_f;
  if substr(sas_output,1,3) = "---" then do;
    * a line of dashes carries the name after NEW_SKILL_NAME= and also ends the previous table;
    txn_name = strip(tranwrd(substr(sas_output, index(upcase(sas_output),"NEW_SKILL_NAME")+15, 50), '-', ''));
    parmtable_f = 0;
  end;
  * every line from the Parameter Estimates title onward is flagged with a 1;
  else if substr(sas_output,1,11) = "Parameter E" then parmtable_f = 1;

  * now, output all of the parameter estimates to a separate data set;
  if parmtable_f = 1 then output parms;
  else if substr(sas_output,1,8) = "Adj R-Sq"
       or substr(sas_output,1,3) = "AIC"
       or substr(sas_output,1,3) = "SBC"
       or substr(sas_output,1,3) = "ASE"
       or find(sas_output,"Number of Observations") = 1
    then output modelstats;
  drop parmtable_f;
run;

data modelstats2(drop=sas_output pos len);
  set modelstats;
  length metric_name $50;
  * the value is the last blank-delimited token and the name is everything before it;
  * this keeps multi-word names such as Adj R-Sq intact;
  call scan(sas_output, -1, pos, len, " ");
  if pos > 0 then metric_value = input(substr(sas_output, pos, len), ?? best32.);
  if pos > 1 then metric_name = substr(sas_output, 1, pos-1);
run;

data parms2(drop=sas_output p_value2 k where=(parameter is not missing));
  attrib txn_name parameter class_value length=$50
         DF Estimate Std_Error T_Score P_value format=best12.
         p_value2 length=$10;
  set parms;
  * skip the table title, the column header and blank lines;
  if substr(sas_output,1,9) = "Parameter" then delete;
  if substr(sas_output,1,5) = " " then delete;
  * strip a leading asterisk marker;
  if substr(sas_output,1,1) = "*" then sas_output = substr(sas_output, 3, length(sas_output)-2);

  parameter = scan(sas_output, 1, " ");
  * Non-classification variables (Weektrend is specific to our model) have no level column;
  if parameter in ("Intercept", "Weektrend") then do;
    class_value = "";
    k = 0;
  end;
  * Classification variables have the level in column 2, so the other columns shift by one;
  else do;
    class_value = scan(sas_output, 2, " ");
    k = 1;
  end;
  * assumed column order is DF, Estimate, Standard Error, t Value, p-value;
  DF        = input(scan(sas_output, 2+k, " "), ?? best32.);
  Estimate  = input(scan(sas_output, 3+k, " "), ?? best32.);
  Std_Error = input(scan(sas_output, 4+k, " "), ?? best32.);
  T_Score   = input(scan(sas_output, 5+k, " "), ?? best32.);
  p_value2  = scan(sas_output, 6+k, " ");

  if p_value2 = "<.0001" then p_value = .0001;
  else p_value = input(p_value2, ?? best32.);

  if estimate = . then delete;
run;

 



All Replies
Contributor
Posts: 69

Re: feature (lasso) selection and parameter's p-value

That doesn't seem right. What if you use BACKWARD/FORWARD/STEPWISE selection?

 

Are you getting the output from the LISTING output? 

 

We use GLMSELECT with those other selection methods, and we have also tested LASSO. I didn't run it personally, but no one reported missing the normal output, so I don't believe that's a "feature." We generally write the output to a text file with PROC PRINTTO and then read it back in so that we have it in a data set, but I'm 100% sure that when we don't, we can see the parameter estimates table at each iteration of selection (with the DETAILS option on).

Frequent Contributor
Posts: 78

Re: feature (lasso) selection and parameter's p-value

Mmhh. I am using:

PROC GLMSELECT data=bla.dibla;
  model Y = ....
        / selection=lasso(stop=none choose=sbc);
  ods output
    ParameterEstimates = ParameterEstimates
    "Fit Statistics" = WORK.Model_Fit;
run;


and ParameterEstimates does not contain the p values in the case of lasso ...

Contributor
Posts: 69

Re: feature (lasso) selection and parameter's p-value

Ok so that's slightly different than the way we use it because you're getting the ODS table (which I didn't do originally because I didn't think of it). Do the parameter estimates get printed to your output screen?

Frequent Contributor
Posts: 78

Re: feature (lasso) selection and parameter's p-value

That would not work in my use case: I fit hundreds of models, so the parameters are extracted from the ParameterEstimates ODS table ...
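
One possible workaround that stays with ODS tables, sketched here under assumptions rather than taken from the thread, is to use LASSO only to choose the effects and then refit those effects with `selection=none`, which does produce the Probt column shown in the original question. PROC GLMSELECT stores the list of selected effects in the macro variable `_GLSIND`. The data set `work.training`, the variables `y` and `x1-x10`, and the output name `work.parms_refit` are hypothetical.

```sas
/* Step 1: LASSO is used only to choose the effects. */
proc glmselect data=work.training;           /* hypothetical data set */
   model y = x1-x10                           /* hypothetical variables */
         / selection=lasso(stop=none choose=sbc);
run;

/* GLMSELECT saves the selected effects in the macro variable _GLSIND. */
%put NOTE: selected effects = &_GLSIND;

/* Step 2: ordinary least-squares refit of the selected effects. */
proc glmselect data=work.training;
   model y = &_GLSIND / selection=none stb showpvalues;
   ods output ParameterEstimates = work.parms_refit;  /* hypothetical name */
run;
```

The refit estimates are unpenalized, so they differ from the LASSO coefficients, and the p-values do not account for the selection step.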

☑ This topic is solved.
