Shone
Calcite | Level 5

Hi, as mentioned in the subject, how can I select all the variables from a table to run the regression? I can't manually type hundreds of variables. Is there a way to get all the variables except the one I am using as the dependent variable?

If I were to type them manually, it would look like this. How can I automatically select all the variables from a table except one or a few?

proc logistic data= testdata;
model depvar = test1 test2 apple1 ball .... cat;
run;

Thanks, 

Shone

Accepted Solution
PaigeMiller
Diamond | Level 26

Hi! So yes, you can select all the other variables in a data set for use in regression or logistic regression.

But before I get to that, I consider this a VERY BAD idea: these hundreds of x-variables will be correlated with one another, so you will badly overfit the model, and the regression coefficients will have HUGE variances.

But ... here's how you select all the variables except the Y variable.

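/* Write one row per variable in TESTDATA to the data set _CONTENTS_ */
proc contents data=testdata noprint out=_contents_;
run;

/* Collect every variable name except DEPVAR into the macro variable NAMES */
proc sql noprint;
    select name into :names separated by ' '
        from _contents_
        where upcase(name) ^= 'DEPVAR';
quit;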

From there, you could use the macro variable just created, &names, as the right-hand side of the MODEL statement. But don't do that. Instead, read up on variable selection methods and use those, or use your subject-matter judgment (or your client's) to pick at most 10 variables that are likely to be the most predictive.
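For completeness, here is a minimal end-to-end sketch of dropping the macro variable into the MODEL statement. It assumes SASHELP.CLASS as a stand-in for TESTDATA (chosen only so the code runs anywhere) with SEX playing the role of DEPVAR. NAME is excluded too, because a character variable in the MODEL statement of PROC LOGISTIC must either be listed in a CLASS statement or left out; NOT IN also shows how to exclude more than one variable.

```sas
/* Sketch only: SASHELP.CLASS stands in for TESTDATA, SEX for DEPVAR.   */
/* NAME is an ID variable, so it is excluded along with the response.   */
proc contents data=sashelp.class noprint out=_contents_;
run;

proc sql noprint;
    select name into :names separated by ' '
        from _contents_
        where upcase(name) not in ('SEX', 'NAME');
quit;

%put NOTE: predictors = &names;

/* The macro variable becomes the right-hand side of the MODEL statement */
proc logistic data=sashelp.class;
    model sex = &names;
run;
```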

 

The best solution, IMHO, is to perform Logistic Partial Least Squares regression, in which case you can use all of the variables. Unfortunately it is not programmed in SAS (it is programmed in R). Here is a reference: https://cedric.cnam.fr/fichiers/RC906.pdf. For continuous Y-variables, use PROC PLS in SAS.
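A minimal PROC PLS sketch for the continuous-Y case. SASHELP.FISH and its WEIGHT response are stand-ins chosen only so the code runs; CV=ONE (leave-one-out cross-validation to choose the number of factors) and NFAC=3 are illustrative settings, not recommendations. With your own data the MODEL statement would be something like `model y = &names;`, where Y is a hypothetical continuous response.

```sas
/* Sketch: SASHELP.FISH stands in for a data set with a continuous Y (WEIGHT). */
/* CV=ONE requests leave-one-out cross-validation to pick the number of       */
/* factors; NFAC= caps how many factors are considered.                       */
proc pls data=sashelp.fish method=pls cv=one nfac=3;
    model weight = length1 length2 length3 height width;
    /* Save predicted values so they can be inspected afterwards */
    output out=pls_out predicted=pred_weight;
run;
```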

 

 

--
Paige Miller

Shone
Calcite | Level 5

Thank you so much. I agree we might not want to use all the variables, but this dataset contains all the variables needed for the analysis.

massaaki
Calcite | Level 5
Although you said it is a very bad idea, it helped me a lot. Thanks!
HB
Barite | Level 11

Did you try it almost like you have it?

proc logistic data= testdata;
     model depvar = test1--cat;
run;
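One caveat worth spelling out: test1--cat is a name range list, so it takes every variable stored between TEST1 and CAT in the data set, whatever its name, and that includes DEPVAR if it sits in the middle. A minimal sketch of one workaround, assuming you know which variables sit immediately before and after DEPVAR (TEST2 and APPLE1 in this made-up layout), is to split the range in two. The data step only builds hypothetical random data so the example runs.

```sas
/* Hypothetical TESTDATA built only so the sketch runs: DEPVAR is stored */
/* between TEST2 and APPLE1, i.e. inside the TEST1--CAT range.           */
data testdata;
    call streaminit(2024);
    do i = 1 to 200;
        test1  = rand('normal');
        test2  = rand('normal');
        depvar = rand('bernoulli', 0.5);
        apple1 = rand('normal');
        ball   = rand('normal');
        cat    = rand('normal');
        output;
    end;
    drop i;
run;

/* A double-dash list is positional, so split it around DEPVAR:     */
/* TEST1--TEST2 ends just before it, APPLE1--CAT starts just after.  */
proc logistic data=testdata;
    model depvar(event='1') = test1--test2 apple1--cat;
run;
```

To see the storage order that a range list follows, `proc contents data=testdata varnum; run;` lists the variables by position.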
Shone
Calcite | Level 5

Can you please let me know how to account for the fact that depvar could be in the middle of the dataset? Please note that depvar is also in the dataset, so if I used test1--cat it would include depvar as well.

Reeza
Super User

Similar approach, but using the SASHELP.VCOLUMN dictionary view instead of PROC CONTENTS:

proc sql noprint;
    /* SASHELP.VCOLUMN has one row per column of every table; keep CARS minus INVOICE */
    select name into :var_list separated by " "
        from sashelp.vcolumn
        where upcase(libname) = 'SASHELP' and upcase(memname) = 'CARS'
          and upcase(name) ne 'INVOICE';
quit;

%put &var_list;
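A minimal variation on the same idea, assuming the goal is a regression on numeric predictors only: filtering on TYPE = 'num' drops the character columns (MAKE, MODEL, TYPE, ORIGIN, DRIVETRAIN in SASHELP.CARS), and NOT IN excludes several names at once. MSRP is excluded here only to illustrate more than one exclusion; the macro variable name NUM_LIST is arbitrary.

```sas
/* Sketch: keep numeric columns only and drop several names at once */
proc sql noprint;
    select name into :num_list separated by ' '
        from sashelp.vcolumn
        where libname = 'SASHELP' and memname = 'CARS'
          and type = 'num'
          and upcase(name) not in ('INVOICE', 'MSRP');
quit;

%put &=num_list;

/* Use the list as the right-hand side of the MODEL statement */
proc reg data=sashelp.cars;
    model invoice = &num_list;
run;
quit;
```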

