Hi All,
I am running a lasso model on a large dataset using the code below. I am trying to interpret the results, but for some of the variables in the CLASS statement I cannot tell which level SAS excluded as the reference group, so I cannot interpret the coefficients. For example, hsa has 8 levels (1-8), so SAS will automatically drop one of them from the model as the reference group. Is there a way to have SAS tell me which of the 8 it dropped (and likewise for the other CLASS variables)?
I chose stop=none so that SAS would step through all the variables until none are left; I could then deduce which level of each CLASS variable was used as the reference group. However, about halfway through the variables, SAS stops the lasso with the message: "Selection stopped because the change of the maximum absolute correction is tiny." From a statistical standpoint I am okay with this message, but it prevents me from working out the reference group for each CLASS variable. Is there any way to override this and have SAS run the lasso to the end?
proc glmselect data=final.Claim_1617_m7_filterx plots(stepaxis=normb)=all;
   CLASS gender_cd hsa anest_method src_pymt_cd1;
   MODEL chrg_tot_amt = age HOPD oper_time_numb_ln CC_wght_total multiple_payor OR_staff
         gender_cd hsa anest_method src_pymt_cd1 proc_cdJ0690--proc_cdJ3410 pat_race_cd49--pat_race_cd46
         / selection=lasso(stop=none choose=cvex);
run;
Accepted Solutions
By default, SAS sets the coefficient of the last level (in alphabetical order) of a CLASS variable to zero.
You can use the REF= option on the CLASS statement to override this default.
Paige Miller
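To make the REF= suggestion concrete, here is a minimal sketch rather than a tested solution. The data set and variable names are taken from the question, the REF= values are placeholders, and SHOWCODING is an added assumption: as far as I know it is a CLASS statement option in PROC GLMSELECT that prints the coding used for each CLASS variable, which is one way to see how each level is represented.

```sas
/* Sketch only: names come from the question; the REF= values are
   illustrative placeholders, not recommendations. */
proc glmselect data=final.Claim_1617_m7_filterx;
   /* REF= names the intended reference level of each CLASS variable.
      SHOWCODING (assumed available in your release) prints the coding table. */
   class gender_cd(ref='M') hsa(ref='5') anest_method(ref='10') / showcoding;
   model chrg_tot_amt = age gender_cd hsa anest_method
         / selection=lasso(stop=none choose=cvex);
run;
```

The Class Level Information table in the output is also worth checking: the level named in REF= should be listed last among the values.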
Hi Paige,
When I run the following code, SAS still seems to ignore the reference levels I set in the CLASS statement. Do you have any idea why that might be? For example, anest_method has 4 levels: 10, 20, 30, 40. I chose 10 as the reference level, but in the results SAS included level 10 in the model with a non-zero coefficient and omitted level 40 instead. (I know that in proc reg or proc glm the reference group is included in the output, but with a coefficient of 0.) All of my CLASS variables are stored as character strings.
I've also consulted the proc glmselect user's guide and found my syntax to be consistent with what it suggests: https://support.sas.com/documentation/onlinedoc/stat/142/glmselect.pdf
Title 'Carpal OP Lasso_New Unknown Payer';
proc glmselect data=final.Claim_1617_m7_filterx plots(stepaxis=normb)=all;
   CLASS gender_cd (ref='M') hsa (ref='5') anest_method (ref='10') src_pymt_cd_new (ref='Comm');
   MODEL log_chrg_tot_amt = age HOPD log_oper_time_numb CC_wght_total_dum multiple_payor OR_staff
         gender_cd hsa anest_method src_pymt_cd_new occur_cd04 proc_cdJ0690--proc_cdJ3410 pat_race_cd04--pat_race_cd05 isolated_rev_cd
         / selection=lasso(stop=none choose=cvex);
run;
This doesn't seem right to me.
Can you show us the relevant parts of the output?
Paige Miller
I attached a truncated version of my output, using dummy data to mask sensitive information; the same problem occurs with my real data. Even though I specified hsa (ref='3') and anest_method (ref='10'), the output shows that the lasso stepped through hsa = 3 and anest_method = 10 as if they were variables to include in the model. Those levels also appear in the final selected model with non-zero coefficients. SAS instead appears to have chosen hsa = 5 and anest_method = 40 as the reference groups, given that they do not appear in the final model. If I use the same CLASS statement with proc glm it works as expected, so this problem seems to be specific to proc glmselect.
Code:
Title 'Carpal OP Lasso_New3';
proc glmselect data=final.Claim_1617_m7_filter plots(stepaxis=normb)=all;
   CLASS gender_cd (ref='M') hsa (ref='3') anest_method (ref='10') src_pymt_cd_new (ref='Comm');
   MODEL log_chrg_tot_amt = age HOPD log_oper_time_numb CC_wght_total_dum multiple_payor
         gender_cd hsa anest_method src_pymt_cd_new occur_cd04 proc_cdJ0690--proc_cdJ3410 pat_race_cd04--pat_race_cd05 isolated_rev_cd
         / selection=lasso(stop=none choose=cvex);
run;
I don't download or open Microsoft Office files. Please just post the portion of the output into your reply.
Paige Miller
Apologies - copying what was above:
Below is a truncated version of my output, using dummy data to mask sensitive information; the same problem occurs with my real data. Even though I specified hsa (ref='3') and anest_method (ref='10'), the output shows that the lasso stepped through hsa = 3 and anest_method = 10 as if they were variables to include in the model. Those levels also appear in the final selected model with non-zero coefficients. SAS instead appears to have chosen hsa = 5 and anest_method = 40 as the reference groups, given that they do not appear in the final model. If I use the same CLASS statement with proc glm it works as expected, so this problem seems to be specific to proc glmselect.
Code:
Title 'Carpal OP Lasso_New3';
proc glmselect data=final.Claim_1617_m7_filter plots(stepaxis=normb)=all;
   CLASS gender_cd (ref='M') hsa (ref='3') anest_method (ref='10') src_pymt_cd_new (ref='Comm');
   MODEL log_chrg_tot_amt = age HOPD log_oper_time_numb CC_wght_total_dum multiple_payor
         gender_cd hsa anest_method src_pymt_cd_new occur_cd04 proc_cdJ0690--proc_cdJ3410 pat_race_cd04--pat_race_cd05 isolated_rev_cd
         / selection=lasso(stop=none choose=cvex);
run;
Data Set | FINAL.CLAIM_1617_M7_FILTER |
Dependent Variable | log_chrg_tot_amt |
Selection Method | LASSO |
Stop Criterion | None |
Choose Criterion | External Cross Validation |
External Cross Validation Method | Random |
External Cross Validation Fold | 5 |
Effect Hierarchy Enforced | None |
Random Number Seed | 616729001 |
Class Level Information | ||
Class | Levels | Values |
gender_cd | 2 | F M |
hsa | 7 | 1 2 5 6 7 8 3 |
anest_method | 4 | 20 30 40 10 |
src_pymt_cd_new | 4 | Federal Self-Pay Unknown Comm |
Dimensions | |
Number of Effects | 404 |
Number of Effects after Splits | 417 |
Number of Parameters | 417 |
LASSO Selection Summary | ||||
Step | Effect Entered | Effect Removed | Number Effects In | CVEX PRESS |
0 | Intercept | | 1 | 0.4617 |
1 | hsa_1 | | 2 | 0.4422 |
2 | proc_cdJ7120 | | 3 | 0.4326 |
3 | proc_cdJ3490 | | 4 | 0.4171 |
4 | hsa_8 | | 5 | 0.4027 |
5 | proc_cdJ2704 | | 6 | 0.3872 |
6 | hsa_7 | | 7 | 0.3793 |
7 | isolated_rev_cd | | 8 | 0.3476 |
8 | hsa_2 | | 9 | 0.3398 |
9 | anest_method_10 | | 10 | 0.3201 |
10 | proc_cd64721 | | 11 | 0.2722 |
11 | HOPD | | 12 | 0.2337 |
12 | proc_cdJ3010 | | 13 | 0.2240 |
13 | anest_method_20 | | 14 | 0.1983 |
14 | log_oper_time_numb | | 15 | 0.1940 |
15 | hsa_3 | | 16 | 0.1901 |
Parameter Estimates | ||
Parameter | DF | Estimate |
Intercept | 1 | 8.204030 |
HOPD | 1 | 0.042461 |
log_oper_time_numb | 1 | 0.061125 |
CC_wght_total_dum | 1 | 0.009938 |
hsa_1 | 1 | -0.487687 |
hsa_2 | 1 | -0.326204 |
hsa_6 | 1 | 0.000374 |
hsa_7 | 1 | 0.000773 |
hsa_8 | 1 | 0.098029 |
hsa_3 | 1 | -0.899385 |
anest_method_20 | 1 | 0.033017 |
anest_method_30 | 1 | -0.343045 |
anest_method_10 | 1 | -0.090350 |
I guess I don't have any other thoughts on why this is happening.
Paige Miller
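One possible explanation, offered as a sketch rather than a confirmed diagnosis: as far as I understand the PROC GLMSELECT documentation, under the default PARAM=GLM coding the REF= option only moves the named level to the end of the level list, and every level still gets its own dummy column. That would match the Class Level Information table in the posted output, where 3 and 10 are listed last. The Dimensions table also shows that the CLASS effects were split, so the lasso treats each level's column as a separate candidate. A level that never enters (hsa = 5, anest_method = 40) then simply has a coefficient of zero, whether or not it was named in REF=. Requesting reference coding with PARAM=REF should instead leave the REF= level's column out of the design matrix altogether. The code below reuses the data set and variable names from the posts above; PARAM=REF and SHOWCODING are the assumed additions, the title is made up, and the MODEL statement is shortened for illustration.

```sas
/* Sketch only, not a tested fix: names come from the posts above;
   PARAM=REF and SHOWCODING are the assumed additions. */
title 'Lasso with reference coding (illustrative)';
proc glmselect data=final.Claim_1617_m7_filter;
   /* Options after the slash apply to every CLASS variable:
      PARAM=REF omits the column for each REF= level,
      SHOWCODING prints the coding so the omission can be checked. */
   class gender_cd(ref='M') hsa(ref='3') anest_method(ref='10')
         src_pymt_cd_new(ref='Comm') / param=ref showcoding;
   model log_chrg_tot_amt = age HOPD log_oper_time_numb
                            gender_cd hsa anest_method src_pymt_cd_new
         / selection=lasso(stop=none choose=cvex);
run;
```

If this behaves as described, hsa_3 and anest_method_10 should no longer appear as candidate effects, and the remaining level coefficients can be read relative to those reference levels. Levels that the lasso leaves out still have a coefficient of zero, which means they are not distinguished from the reference level in the selected model.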