Hi all,
I would like to use LASSO selection to exclude unimportant variables.
I would like to fit the regression on 5 years of data (2000-2004) and validate it on the year 2005. My dataset covers the years 1980-2020, so does anyone have an idea how I could handle this? I tried to save all the 2005 data in a new dataset, but it doesn't work. I think the starting point is something like this:
proc glmselect data=mylib.dataset plots=all seed=123 valdata= ??? ;
   where 2000 <= year <= 2004;
   model y = x1-x100
         / selection=lasso(stop=none choose=validate);
   ods output parameterestimates=check_lasso_parms;
run;
Thanks a lot for any answers.
How come you cannot save your 2005 data in a separate dataset?
All you should do is this:
data want;
   set have;
   /* if year is already a numeric variable, use: where year = 2005; */
   where year(your_date_var) = 2005;
run;
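Take care: there is a substantial difference between validation data (VALDATA=) and test data (TESTDATA=). Validation data take part in the selection (with choose=validate, the model is picked by its validation error), whereas test data are only used to assess the models and have no influence on which one is selected.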
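To make the distinction concrete, here is a minimal sketch and not a tested solution. The data set names work.train, work.valid and work.test are hypothetical, and x1-x100 assumes the predictors really are named x1 through x100.

```sas
/* Sketch only: work.train, work.valid and work.test are hypothetical names */
proc glmselect data=work.train      /* used to fit each step of the LASSO path */
               valdata=work.valid   /* used by choose=validate to pick the model */
               testdata=work.test;  /* only scored, never used for selection */
   model y = x1-x100
         / selection=lasso(stop=none choose=validate);
run;
```

If you only have one extra year (2005), it can go into VALDATA= or TESTDATA=, but then it serves only that one purpose.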
Also for more info on LASSO, I advise this paper:
SAS Global Forum 2020
Paper SAS4287-2020
A Survey of Methods in Variable Selection and Penalized Regression
Yingwei Wang, SAS Institute Inc.
https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2020/4287-2020.pdf
Cheers,
Koen
Hello,
I would be astonished if VALDATA= does not work when used appropriately.
In cases like this, it's always best to include the LOG.
Please include the LOG by using the 'Insert Code' icon (</>) above your entry; that way the LOG does not lose its structure and formatting.
Thanks,
Koen
Hello,
I haven't tested it (I leave that up to you 😉 ), but I would guess that the WHERE statement applies to all incoming data sets, including the VALDATA= data set. Hence, no 2005 observations are left for validation, which is of course a problem with choose=validate.
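If that guess is right, one way around it would be to drop the WHERE statement and filter each input on its own with the WHERE= data set option. This is an untested sketch that reuses the names from the original post and assumes year is a numeric variable and the predictors are named x1 through x100:

```sas
/* Sketch: filter training and validation data separately,   */
/* so that no single WHERE statement applies to both inputs. */
proc glmselect data=mylib.dataset(where=(2000 <= year <= 2004))
               valdata=mylib.dataset(where=(year = 2005))
               plots=all;
   model y = x1-x100
         / selection=lasso(stop=none choose=validate);
   ods output parameterestimates=check_lasso_parms;
run;
```

With this layout there is no need to save the 2005 data in a separate data set first, although doing so and pointing VALDATA= at it should work just as well.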
Cheers,
Koen