oggylang
Calcite | Level 5

Hi SAS Forum,

 

I am currently doing some LASSO regression and have run into a problem that I hope someone else has seen before and can help me sort out. I want to partition my data for the LASSO fit. I have created a dummy variable, treated, that indicates whether an observation belongs to the training or the testing part of the data set, but I am struggling to use it in the PARTITION statement. The SAS documentation is a bit limited here: it mainly covers partitioning by a fraction of the data rather than by the values of a variable.

 

 

proc glmselect data = forecastmerge2;
class herkomst;
model PostEmplSumD = Woman Married PriorEmpl c:/
selection = lasso(stop=none choose=cvex);
partition = treated(test=(treated=1) train=(treated=0));
output out=GLMOut p = p_hat;
run;

 

If you have any questions, please let me know. I have tried to include all of the code that seemed relevant to the question.

 

Oggylang 

1 REPLY
Rick_SAS
SAS Super FREQ

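Here is the documentation that specifies the correct syntax. The ROLEVAR= option names the variable, and the TEST= and TRAIN= suboptions give the formatted values of that variable that identify each role. I think for your data the PARTITION statement would look like this (untested):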

partition ROLEVAR=treated(test='1' train='0');
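
For context, here is a sketch of how that statement might sit in the full procedure call. It is also untested; everything except the PARTITION statement is copied from the original post, and it assumes treated is coded 1 for test observations and 0 for training observations.

```sas
/* Sketch only: the original call with the PARTITION statement swapped in. */
/* Assumes treated = 1 marks test rows and treated = 0 marks training rows. */
proc glmselect data=forecastmerge2;
    class herkomst;
    model PostEmplSumD = Woman Married PriorEmpl c: /
          selection=lasso(stop=none choose=cvex);
    /* ROLEVAR= names the variable; TEST= and TRAIN= take its formatted values */
    partition rolevar=treated(test='1' train='0');
    output out=GLMOut p=p_hat;
run;
```

The values in quotes are matched against the formatted values of treated, so if a format is attached to that variable, the quoted strings would need to be the formatted text rather than '1' and '0'.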

Alternatively, you can include a character variable named _ROLE_ in the input data that has the values "TRAIN" and "TEST". If the input data contains a _ROLE_ variable, then you can omit the PARTITION statement.
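
A minimal sketch of the _ROLE_ approach, again untested. The data set name forecast_roles is made up for illustration, and the 1/0 coding of treated is an assumption carried over from the original post.

```sas
/* Sketch only: derive _ROLE_ from the existing dummy variable.       */
/* forecast_roles is a hypothetical name; treated coding is assumed.  */
data forecast_roles;
    set forecastmerge2;
    length _ROLE_ $5;                      /* long enough for 'TRAIN' */
    if treated = 1 then _ROLE_ = 'TEST';
    else if treated = 0 then _ROLE_ = 'TRAIN';
run;

/* No PARTITION statement: the roles are read from _ROLE_ */
proc glmselect data=forecast_roles;
    class herkomst;
    model PostEmplSumD = Woman Married PriorEmpl c: /
          selection=lasso(stop=none choose=cvex);
    output out=GLMOut p=p_hat;
run;
```

Observations where treated is missing get a blank _ROLE_ in this sketch, so it is worth checking how many there are before fitting.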
