Hi, as mentioned in the subject: how can I select all the variables from a table to run a regression? I can't manually type hundreds of variables. Is there a way to get all the variables except the one I am using as the dependent variable?
If I were to type it manually, it would look like this. How can I automatically select all the variables from a table except one or a few?
proc logistic data= testdata;
model depvar = test1 test2 apple1 ball .... cat;
run;
Thanks,
Shone
Hi! So yes, you can select all the other variables in a data set for use in regression or logistic regression.
But before I get to that: I consider this a VERY BAD idea, because these hundreds of x-variables will be correlated with one another, so you will be badly overfitting the model, and the regression coefficients will have HUGE variances.
But ... here's how you select all the variables except the Y variable.
proc contents data=testdata noprint out=_contents_;
run;
proc sql noprint;
select name into :names separated by ' ' from _contents_ where upcase(name)^='DEPVAR';
quit;
From there, you could use the macro variable just created, &names, as the right-hand side of the MODEL statement. But don't do that. Instead, read up on variable selection methods and use those; or use your subject matter judgment (or your client's) to pick at most 10 variables that are likely to be the most predictive.
The best solution, IMHO, is to perform Logistic Partial Least Squares regression, in which case you can use all of the variables. Unfortunately it is not programmed in SAS (but it is programmed in R). Here is a reference: https://cedric.cnam.fr/fichiers/RC906.pdf. For continuous Y-variables, use PROC PLS in SAS.
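For completeness, here is a minimal sketch of how the &names macro variable could be dropped into the MODEL statement. It assumes the PROC CONTENTS and PROC SQL steps above have already run in the same session and that every remaining variable in testdata is numeric; character variables would also need to be listed in a CLASS statement. The overfitting warning above still applies.

```sas
/* Sketch only: assumes &names was created by the PROC SQL step above */
/* and that all variables other than depvar are numeric.              */

/* Write the generated list to the log so it can be checked first */
%put &names;

proc logistic data=testdata;
   /* &names expands to every variable name except DEPVAR */
   model depvar = &names;
run;
```

Because the list is built from the data set's metadata, it does not matter where depvar sits in the column order.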
Thank you so much. I agree we might not want to take all the variables, but this dataset contains all the variables needed for the analysis.
Did you try it almost like you have it?
proc logistic data= testdata;
model depvar = test1--cat;
run;
Can you please let me know how to account for the fact that depvar could be in the middle of the dataset? Please note that depvar is also in the dataset. If I use test1--cat, it will include depvar as well.
Similar approach, but using SASHELP tables.
proc sql noprint;
select name into :var_list separated by " " from sashelp.vcolumn where upcase(libname)='SASHELP' and upcase(memname)='CARS'
and upcase(name) ne 'INVOICE';
quit;
%put &var_list;
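As a rough illustration of how that list might then be used, the sketch below treats INVOICE as the response in SASHELP.CARS. The macro variable name num_list and the choice of PROC REG are assumptions made for this example. SASHELP.CARS also contains character columns (Make, Model and so on), so the query keeps only numeric columns before passing the list to the MODEL statement.

```sas
/* Sketch: build a list of numeric predictors, excluding the response */
proc sql noprint;
   select name into :num_list separated by ' '
   from sashelp.vcolumn
   where libname='SASHELP' and memname='CARS'
     and type='num'                    /* keep numeric columns only */
     and upcase(name) ne 'INVOICE';    /* drop the response variable */
quit;

/* Check the generated list in the log */
%put &num_list;

proc reg data=sashelp.cars;
   model invoice = &num_list;
run;
quit;
```

To exclude several variables, the last condition could be written as upcase(name) not in ('INVOICE', 'MSRP'), for example.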