Hi SAS Forum,
I am running a LASSO regression and have hit a problem that I hope someone else has seen before and can help me sort out. I want to partition my data. I have created a dummy variable, treated, that indicates whether an observation belongs to the training or the testing part of the data set, but I am struggling to use it in the PARTITION statement. The SAS documentation is a bit limited here and mainly focuses on partitioning by a share of the data rather than by the values of a variable.
proc glmselect data = forecastmerge2;
  class herkomst;
  model PostEmplSumD = Woman Married PriorEmpl c: / selection = lasso(stop=none choose=cvex);
  partition = treated(test=(treated=1) train=(treated=0));
  output out=GLMOut p = p_hat;
run;
If you have any questions, please let me know. I have tried to include all of the code that seemed relevant to the question.
Oggylang
Here is the documentation that specifies the correct syntax. I think for your data the PARTITION statement would look like this (untested):
partition ROLEVAR=treated(test='1' train='0');
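For context, here is a sketch of how that statement might slot into the original step. It is untested, like the statement itself. It assumes treated is a numeric 0/1 variable whose formatted values are '1' and '0' (the default format); if a custom format is attached, the quoted values would need to match the formatted values instead.

```sas
/* Untested sketch: the original step with the PARTITION statement replaced. */
/* Assumes treated is numeric 0/1 with no custom format attached.            */
proc glmselect data=forecastmerge2;
  class herkomst;
  model PostEmplSumD = Woman Married PriorEmpl c:
        / selection=lasso(stop=none choose=cvex);
  /* ROLEVAR= assigns each observation a role based on the value of treated */
  partition rolevar=treated(test='1' train='0');
  output out=GLMOut p=p_hat;
run;
```

Everything other than the PARTITION statement is carried over unchanged from the question.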
You can also include a character variable named _ROLE_ in the input data that has the values "TRAIN" and "TEST". If the input data contains a _ROLE_ variable, then you can omit the PARTITION statement.
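A minimal sketch of that alternative, also untested: a DATA step derives _ROLE_ from treated, and the PARTITION statement is then left out as described above. The data set name forecastmerge2_role is a hypothetical name chosen for illustration.

```sas
/* Untested sketch: build a character _ROLE_ variable from the 0/1 dummy. */
/* forecastmerge2_role is a hypothetical output data set name.            */
data forecastmerge2_role;
  set forecastmerge2;
  length _ROLE_ $5;
  if treated = 1 then _ROLE_ = 'TEST';
  else if treated = 0 then _ROLE_ = 'TRAIN';
run;

proc glmselect data=forecastmerge2_role;
  class herkomst;
  model PostEmplSumD = Woman Married PriorEmpl c:
        / selection=lasso(stop=none choose=cvex);
  /* No PARTITION statement here, per the description above. If the roles */
  /* are not picked up, the explicit form would be:                       */
  /*   partition rolevar=_ROLE_(test='TEST' train='TRAIN');               */
  output out=GLMOut p=p_hat;
run;
```

Observations where treated is missing get a blank _ROLE_ in this sketch, so it is worth checking how many there are before relying on the split.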