Solved: Re: Linear Regression Program

saraphdnz · Posted 03-30-2020 05:56 AM

Hi there,

I am trying to estimates beta coefficients as in one file for different dependent variables (Fs) and one independent variable (EPU). I run the following code and want to generalize this program for all my dependent variables (F774 - F781) one by one. My data file is attached.

proc reg data=B_93 noprint outest=myests;
model f774=EPU;
run;

Please can you help me write a program that can give me estimates in a separate file for each regression.

Regards,

Sara

ballardw · Posted 03-30-2020 05:47 PM

If the variables of interest are the only ones that have a common base name you may be able to use a list like: F: note that there is a colon immediately following the F. This would attempt to use all variables that start with F in that role. Or given the picture you shared perhaps something like F1: F2: F3: <follow the obvious pattern>. This list would attempt to use all variables whose names start with F1 or F2 or F3 (etc).

Or if all of the variables you want (groups of them) are adjacent columns in the SAS data set you can use the two dash list builder: F774 -- F781 would use those variables that are adjacent even if there is a gap in the numbers after the F.

Another extremely viable approach in SAS is to transpose the data so that you have one variable which would be the original variable name, its value as a single variable. Then sort by the name and use BY group processing to get a regression for each level of the variable name. Here is a brief example with a small data set you should have access to, the Sashelp.class data set.

This example transforms the data so that there is age independent variable, sex just to show how to keep other variables for grouping in by statement later, variable that has the name of the original variable and value which has the value of each variable. There is one row in the output for each variable in array of variables of interest. The sorted and a regression run for the combinations of Sex and Variable.

I am not going to say this is a good model, just a way to build code.

View solution in original post

PaigeMiller · Posted 03-30-2020 06:43 AM

proc reg data=B_93 noprint outest=myests;
model f774-f781=EPU;
run;

All of your regression outputs are now stored in MYESTS. If you really want them in separate files, you can split that data set up (but really, don't bother, it isn't worth the effort). You've got all the results.

If you really need to extract the results for just F774, you can get them via a WHERE statement, such as, for example using PROC PRINT.

proc print data=MYESTS;
where _depvar_='F774';
run;

--
Paige Miller

saraphdnz · Posted 03-30-2020 04:57 PM

Hello Paige,

Thanks for the code.

But one thing which i forgot to mention is that my column numbers are not in sequence and I have around 600 columns in each worksheet. In that situation what will be the best option to perform this regression? Do i need to change the name of columns to make them in sequence?

Regards,

Sara

ballardw · Posted 03-30-2020 05:47 PM

If the variables of interest are the only ones that have a common base name you may be able to use a list like: F: note that there is a colon immediately following the F. This would attempt to use all variables that start with F in that role. Or given the picture you shared perhaps something like F1: F2: F3: <follow the obvious pattern>. This list would attempt to use all variables whose names start with F1 or F2 or F3 (etc).

Or if all of the variables you want (groups of them) are adjacent columns in the SAS data set you can use the two dash list builder: F774 -- F781 would use those variables that are adjacent even if there is a gap in the numbers after the F.

Another extremely viable approach in SAS is to transpose the data so that you have one variable which would be the original variable name, its value as a single variable. Then sort by the name and use BY group processing to get a regression for each level of the variable name. Here is a brief example with a small data set you should have access to, the Sashelp.class data set.

This example transforms the data so that there is age independent variable, sex just to show how to keep other variables for grouping in by statement later, variable that has the name of the original variable and value which has the value of each variable. There is one row in the output for each variable in array of variables of interest. The sorted and a regression run for the combinations of Sex and Variable.

I am not going to say this is a good model, just a way to build code.

saraphdnz · Posted 04-08-2020 12:05 AM

Hi there - thanks for the helpful suggestion to run multiple dependent variable linear regressions. I use the following code and getting estimates but I just want to get the intercept values and not all those extra information. please can you help me on this?

proc reg noprint data=Myfiles.dset1 outest=myfiles.regout1 edf ;
 model F1--F2 = EPU;
 run;

PaigeMiller · Posted 04-08-2020 07:34 AM

I'm afraid that SAS is going to give you all that information in the output data set. But if you just need the intercept, you can simply ignore the rest of the variables, or even drop the columns you don't need.

--
Paige Miller

PaigeMiller · Posted 03-30-2020 05:51 PM

If the column names are f1 to f600, the order doesn't matter. F1-F600 will work, as long as every one of the variables F1 through F600 are present in the data set.

If they are not named consecutively, and they have diverse names like first column and last_column, then something like this:

First_column -- last_column

But ...

why are you making us guess at the variable names? Can you just give us an example of the pattern, or of the first 10 actual variable names?

Now I'm skeptical of the entire procedure. Running 600 regressions doesn't really seem like a good idea to me, what are you going to do once you have all of them? It's hard for me to imagine how these 600 regressions will be useful.

--
Paige Miller

saraphdnz · Posted 03-30-2020 08:45 PM

Hello Paige,

Actually I am doing univariate portfolio analysis where I will sort equities returns based on their beta coefficient. I have 36 months rolling window and each month from Jan93 untill Dec19 I need the beta coefficient for each equity. Each month have different number of equities like for Jan93 there are 256 equities (in Feb93 its 255 equities and so on) and need beta for each equity so that I can sort the equities based on their betas to construct decile portfolios. I have to repeat this for all 361 months.

I have already created all 361 sheets for 36 month rolling window and each sheet have different equities and this is the reason I have different column name.

In total I have 5000 equities so F1 F2 --- F5000 but based on 36 months return rolling window, every month have different number of equities that fulfills the 36 months return requirement.

Do I need to do this rolling window thing in SAS to make this process much easier?

I am bit stuck here and appreciate if you you can help me on this.

Regards,

Sara

PaigeMiller · Posted 03-31-2020 06:31 AM

If you search the internet, people have posted actual SAS code to produce a "rolling-window regression" in SAS. There's a macro to do this, and also a version that doesn't involve macros. I don't think they account for the possibility of different number of variables/equities in each month.

It may be (as I think about it now) that you are going to need a rather sophisticated macro to handle the different number of variables/equities each month; or it may be that if you have all 5000 variables each month, with some of them containing all missing values for a given month would work, I don't know, you'd have to try it; or maybe there's easier solutions that I haven't thought of.

In any event, it seems like you are going to need to do some fairly sophisticated programming.

--
Paige Miller

saraphdnz · Posted 03-31-2020 05:01 PM

Thanks for the helpful suggestion.

Regards,

Sara

SAS Innovate 2025: Call for Content