AbuYusuf
Calcite | Level 5

Hi,

 

I would like to run a number of linear univariate regressions of the form Y = a*Xi + e, one for each independent variable Xi, and have been trying to figure out how to use arrays for that instead of writing out 20 regressions. I managed to create an array of independent variables in the data step, but I cannot figure out a way to access it in PROC REG. There must be a simple way to do this, but after spending a day looking I haven't found it. I would be grateful if you could help.

 

Thank you!

14 REPLIES
Reeza
Super User

I suspect your best approach is to reformat your data into a long layout and use a BY statement.

If you want further suggestions, please provide more detailed information, including how your data currently looks and the type of PROC REG statements you're looking to develop.

 




 

AbuYusuf
Calcite | Level 5

Thanks a lot for the reply!

 

Data is a large administrative database that has a number of diagnostic codes which I use to create a number of disease variables, say, diabetes, influenza, etc. I have cost as the dependent variable and I want to run a number of regressions where each disease is an independent variable, and adjust for a number of other factors, such as length of stay, etc. I also have age and province as categorical variables, so I run these regressions BY age category and province.

 

In Stata I would have done it like this:

local diseases "diabetes influenza"
foreach a of local diseases {
    regress cost `a'
}

 

That's basically it. Thank you very much for your help!

data_null__
Jade | Level 19



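/* Reshape to long format: one row per student per independent variable */
proc transpose name=IndVar data=sashelp.class out=class2(rename=(col1=X));
   by name age;
   var height weight;
run;
/* Sort so that each independent variable forms its own BY group */
proc sort data=class2;
   by indvar;
run;
/* One regression of age on X per independent variable */
proc reg data=class2;
   by indvar;
   model age = x;
   attrib _all_ label='';
run;
quit;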
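
The same transpose-then-BY idea can be extended to grouped regressions with an adjusting covariate, which is closer to what the original poster describes. The following is only a sketch under stated assumptions: sashelp.cars stands in for the real data, Origin plays the role of province, Horsepower plays the role of an adjusting covariate such as length of stay, and the row identifier id is a hypothetical variable added so that PROC TRANSPOSE has a unique BY key.

```sas
/* Stand-in data (assumption): add a unique row id to use as a BY key */
data cars;
   set sashelp.cars;
   id = _n_;
run;

/* Long format: one row per car per candidate predictor.
   Variables that must stay on every row go in the BY statement. */
proc transpose data=cars name=IndVar out=cars_long(rename=(col1=X));
   by id Origin MSRP Horsepower;
   var EngineSize Weight;
run;

/* Sort so each group/predictor combination is one BY group */
proc sort data=cars_long;
   by Origin IndVar;
run;

/* One adjusted regression per Origin and per candidate predictor */
proc reg data=cars_long plots=none outest=est_by;
   by Origin IndVar;
   model MSRP = X Horsepower;
run;
quit;

proc print data=est_by; run;
```

With real data, the disease indicators would go in the VAR statement and the outcome, covariates, and grouping variables in the BY statement of PROC TRANSPOSE.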
PaigeMiller
Diamond | Level 26

This entire thread fits under the category of:

 

Just because you CAN do the regressions this way, it doesn't mean you SHOULD do the regressions this way.

 

Instead of ordinary least squares regression, I recommend partial least squares regression (PROC PLS) which has better statistical properties (smaller root mean square error of predictions and of regression coefficients) than doing many regressions.

 

If you want to determine which variables are important in predicting the response, and you do many regressions, you are not accounting for possible confounding of one x variable with another x variable. PLS handles this better.
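
As a rough illustration of the PROC PLS suggestion, here is a minimal sketch on sashelp.cars. The choice of response and predictors is an assumption made only for demonstration, and cv=one (leave-one-out cross validation) is just one of several ways to choose the number of factors.

```sas
/* Sketch only: sashelp.cars stands in for the real data */
proc pls data=sashelp.cars method=pls cv=one;
   /* All predictors enter one model together, rather than one regression each */
   model MSRP = EngineSize Horsepower Weight Wheelbase Length / solution;
run;
```

The SOLUTION option prints the coefficients of the final model, so the predictors can be compared within a single fit.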

--
Paige Miller
AbuYusuf
Calcite | Level 5
Thank you. I will check out proc pls.
AbuYusuf
Calcite | Level 5

Thank you very much! I tried the code, but my adaptation of it to my data didn't work...

ChrisNZ
Tourmaline | Level 20

It's a bit hard to reply without seeing the start and end points.

Please provide a small example of data and the desired procedure calls.

 

PGStats
Opal | Level 21

Use a variable list. In proc reg you may specify

 

model a -- z = myVar;

 

to regress all variables in your dataset variable list from a to z against myVar.
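
For instance, a minimal sketch on sashelp.cars (the choice of variables is an assumption for illustration only): MPG_City and MPG_Highway are adjacent in that dataset, so the double-dash list names both as dependent variables, each regressed on Weight.

```sas
proc reg data=sashelp.cars plots=none;
   /* Two dependent variables, one regressor: one fit per dependent variable */
   model MPG_City -- MPG_Highway = Weight;
run;
quit;
```

Note that this covers many dependent variables against a single regressor, not many regressors against a single dependent variable.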

PG
data_null__
Jade | Level 19

@PGStats wrote:

Use a variable list. In proc reg you may specify

 

model a -- z = myVar;

 

to regress all variables in your dataset variable list from a to z against myVar.


I thought the OP said there are many independent (X) variables.  

PGStats
Opal | Level 21

Oops!

PG
PGStats
Opal | Level 21

You can automate this with call execute(), which generates one PROC REG step per independent variable:

 

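data _null_;
    length reg $200;
    /* Read a single row: only the variable names are needed */
    set sashelp.cars;
    /* All variables positioned from Invoice through Length (all numeric) */
    array x Invoice -- length;
    do i = 1 to dim(x);
        /* Build one PROC REG step per predictor; estimates go to out_<name> */
        reg = cats(
            "proc reg data=sashelp.cars plots=none outest=out_",
            vname(x{i}),
            "; model MSRP=",
            vname(x{i}),
            "; run; quit;" );
        /* Queue the generated step to run after this data step ends */
        call execute (reg);
    end;
    stop;
run;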

data est_all;
set out_: ;
run;

proc print data=est_all; run;

 

 

 

PG
PGStats
Opal | Level 21

Another way is to trick PROC REG into fitting every single-variable model by using best-subset selection with the subset size capped at one regressor (stop=1):

 


proc reg data=sashelp.cars outest=all_est;
model MSRP = Invoice -- length / selection=RSQUARE stop=1;
run;
quit;

proc print data=all_est; run;
PG
Reeza
Super User

https://stats.idre.ucla.edu/sas/seminars/sas-macros-introduction/

 

Here's a full write up on macros if you choose to go down that route.
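
For a sense of what the macro route looks like, here is a minimal sketch that mirrors the Stata foreach loop from earlier in the thread. The macro name reg_each, its parameters, and the mreg_ dataset prefix are all hypothetical names chosen for illustration, and sashelp.cars stands in for the real data.

```sas
/* Hypothetical helper: one PROC REG step per variable named in xlist */
%macro reg_each(data=, y=, xlist=);
   %local i x;
   %do i = 1 %to %sysfunc(countw(&xlist, %str( )));
      /* Pick the i-th space-separated variable name from the list */
      %let x = %scan(&xlist, &i, %str( ));
      proc reg data=&data plots=none outest=mreg_&x;
         model &y = &x;
      run;
      quit;
   %end;
%mend reg_each;

%reg_each(data=sashelp.cars, y=MSRP, xlist=EngineSize Horsepower Weight)

/* Stack the per-variable estimates into one table */
data all_macro_est;
   set mreg_: ;
run;

proc print data=all_macro_est; run;
```

Additional covariates or a BY statement could be added inside the PROC REG step if the regressions need to be adjusted or run by group.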
