Hi there,
I am trying to estimates beta coefficients as in one file for different dependent variables (Fs) and one independent variable (EPU). I run the following code and want to generalize this program for all my dependent variables (F774 - F781) one by one. My data file is attached.
proc reg data=B_93 noprint outest=myests;
model f774=EPU;
run;
Please can you help me write a program that can give me estimates in a separate file for each regression.
Regards,
Sara
If the variables of interest are the only ones that have a common base name you may be able to use a list like: F: note that there is a colon immediately following the F. This would attempt to use all variables that start with F in that role. Or given the picture you shared perhaps something like F1: F2: F3: <follow the obvious pattern>. This list would attempt to use all variables whose names start with F1 or F2 or F3 (etc).
Or if all of the variables you want (groups of them) are adjacent columns in the SAS data set you can use the two dash list builder: F774 -- F781 would use those variables that are adjacent even if there is a gap in the numbers after the F.
Another extremely viable approach in SAS is to transpose the data so that you have one variable which would be the original variable name, its value as a single variable. Then sort by the name and use BY group processing to get a regression for each level of the variable name. Here is a brief example with a small data set you should have access to, the Sashelp.class data set.
This example transforms the data so that there is age independent variable, sex just to show how to keep other variables for grouping in by statement later, variable that has the name of the original variable and value which has the value of each variable. There is one row in the output for each variable in array of variables of interest. The sorted and a regression run for the combinations of Sex and Variable.
I am not going to say this is a good model, just a way to build code.
proc reg data=B_93 noprint outest=myests;
model f774-f781=EPU;
run;
All of your regression outputs are now stored in MYESTS. If you really want them in separate files, you can split that data set up (but really, don't bother, it isn't worth the effort). You've got all the results.
If you really need to extract the results for just F774, you can get them via a WHERE statement, such as, for example using PROC PRINT.
proc print data=MYESTS;
where _depvar_='F774';
run;
Hello Paige,
Thanks for the code.
But one thing which i forgot to mention is that my column numbers are not in sequence and I have around 600 columns in each worksheet. In that situation what will be the best option to perform this regression? Do i need to change the name of columns to make them in sequence?
Regards,
Sara
If the variables of interest are the only ones that have a common base name you may be able to use a list like: F: note that there is a colon immediately following the F. This would attempt to use all variables that start with F in that role. Or given the picture you shared perhaps something like F1: F2: F3: <follow the obvious pattern>. This list would attempt to use all variables whose names start with F1 or F2 or F3 (etc).
Or if all of the variables you want (groups of them) are adjacent columns in the SAS data set you can use the two dash list builder: F774 -- F781 would use those variables that are adjacent even if there is a gap in the numbers after the F.
Another extremely viable approach in SAS is to transpose the data so that you have one variable which would be the original variable name, its value as a single variable. Then sort by the name and use BY group processing to get a regression for each level of the variable name. Here is a brief example with a small data set you should have access to, the Sashelp.class data set.
This example transforms the data so that there is age independent variable, sex just to show how to keep other variables for grouping in by statement later, variable that has the name of the original variable and value which has the value of each variable. There is one row in the output for each variable in array of variables of interest. The sorted and a regression run for the combinations of Sex and Variable.
I am not going to say this is a good model, just a way to build code.
Hi there - thanks for the helpful suggestion to run multiple dependent variable linear regressions. I use the following code and getting estimates but I just want to get the intercept values and not all those extra information. please can you help me on this?
proc reg noprint data=Myfiles.dset1 outest=myfiles.regout1 edf ;
model F1--F2 = EPU;
run;
I'm afraid that SAS is going to give you all that information in the output data set. But if you just need the intercept, you can simply ignore the rest of the variables, or even drop the columns you don't need.
If the column names are f1 to f600, the order doesn't matter. F1-F600 will work, as long as every one of the variables F1 through F600 are present in the data set.
If they are not named consecutively, and they have diverse names like first column and last_column, then something like this:
First_column -- last_column
But ...
why are you making us guess at the variable names? Can you just give us an example of the pattern, or of the first 10 actual variable names?
Now I'm skeptical of the entire procedure. Running 600 regressions doesn't really seem like a good idea to me, what are you going to do once you have all of them? It's hard for me to imagine how these 600 regressions will be useful.
Hello Paige,
Actually I am doing univariate portfolio analysis where I will sort equities returns based on their beta coefficient. I have 36 months rolling window and each month from Jan93 untill Dec19 I need the beta coefficient for each equity. Each month have different number of equities like for Jan93 there are 256 equities (in Feb93 its 255 equities and so on) and need beta for each equity so that I can sort the equities based on their betas to construct decile portfolios. I have to repeat this for all 361 months.
I have already created all 361 sheets for 36 month rolling window and each sheet have different equities and this is the reason I have different column name.
In total I have 5000 equities so F1 F2 --- F5000 but based on 36 months return rolling window, every month have different number of equities that fulfills the 36 months return requirement.
Do I need to do this rolling window thing in SAS to make this process much easier?
I am bit stuck here and appreciate if you you can help me on this.
Regards,
Sara
If you search the internet, people have posted actual SAS code to produce a "rolling-window regression" in SAS. There's a macro to do this, and also a version that doesn't involve macros. I don't think they account for the possibility of different number of variables/equities in each month.
It may be (as I think about it now) that you are going to need a rather sophisticated macro to handle the different number of variables/equities each month; or it may be that if you have all 5000 variables each month, with some of them containing all missing values for a given month would work, I don't know, you'd have to try it; or maybe there's easier solutions that I haven't thought of.
In any event, it seems like you are going to need to do some fairly sophisticated programming.
Thanks for the helpful suggestion.
Regards,
Sara
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.