I have 30 independent variables and 10 dependent variables on which I would like to perform linear regression. In addition, I may want to lag the independent variables and run regressions between all the independent and dependent variables.
indvar1, indvar2, indvar3, ..., indvar30
depvar1, depvar2, depvar3, ..., depvar10
I would want to regress each of depvar1 to depvar10 on each of indvar1 to indvar30 (i.e., 30 × 10 = at least 300 regressions, more once lags are included).
I would then like to output the regression statistics automatically to a dataset or Excel sheet for further analysis.
Other than writing and repeating the code, is there any smart way to do this automatically, so that I could then examine the output statistics offline on another day?
Assuming you have obeyed Zender's law 1 of macro writing and started with tested, working SAS code, you could wrap the process in a macro that takes a list of variables and defines an output destination for each.
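In case it helps, here is a minimal sketch of that macro approach. The dataset name MYDATA, the variable naming pattern indvar1–indvar30 / depvar1–depvar10, and the output file name are my assumptions, not something from your post — substitute your own:

```sas
/* Sketch only: mydata, the indvar/depvar naming pattern, and the
   output file name are placeholders. */
%macro run_all_regs(data=mydata, nind=30, ndep=10, out=all_stats);
    %local i j;
    ods select none;   /* suppress printed output; ODS OUTPUT
                          datasets are still created */
    %do i = 1 %to &nind;
        %do j = 1 %to &ndep;
            /* Capture the parameter-estimates table for this model */
            ods output ParameterEstimates=_pe;
            proc reg data=&data;
                model depvar&j = indvar&i;
            run;
            quit;
            /* Tag the results with the variable pair, then append */
            data _pe;
                length indvar depvar $32;
                set _pe;
                indvar = "indvar&i";
                depvar = "depvar&j";
            run;
            proc append base=&out data=_pe force;
            run;
        %end;
    %end;
    ods select all;
%mend run_all_regs;

%run_all_regs()

/* One dataset to review offline; XLSX export needs SAS/ACCESS to
   PC Files */
proc export data=all_stats outfile='regression_stats.xlsx'
            dbms=xlsx replace;
run;
```

The same pattern extends to FitStatistics or any other ODS table PROC REG produces, and to lagged versions of the variables once you have created them in a DATA step.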
However, things are not always as simple as they seem, and my limited grasp of advanced statistics is enough to tell me that there is a problem with the whole process as you describe it. It is reminiscent of the suspect in the basement of the Lubyanka who is being slapped across the face with a wet cloth and urged to confess to each crime in a long list. At some point, a confession will be made, but how can you know the confession is true after all this torture?
All you seem to have defined here is a series of questions, each of which will be tested and evaluated against a p-value. But a p-value cutoff of 0.05 means accepting a 1-in-20 chance of declaring a result significant when it arose by chance alone. You, however, throw 300 such tests against it, and must expect around 15 (300 × 0.05) "significant" results from chance alone. Tell me, which significant result is significant because it is significant, and which is significant because of the Lubyanka effect [patent pending]?
Isn't this a classic recipe for Type I errors — that is, false positives? My statistically much wiser friend describes this scenario as a failure to control the experimentwise error rate. I suggest that before you proceed down the macro route and give yourself a set of wallpaper results, you speak with someone who is far more knowledgeable in designing and executing experiments of this kind.
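To make that concrete: if the 300 raw p-values were collected into a dataset, PROC MULTTEST can adjust them for the number of tests. The dataset name here is a placeholder, and note that when p-values are read in via INPVALUES= the procedure expects the variable to be named raw_p:

```sas
/* Sketch: pvals is assumed to hold one observation per test, with
   the unadjusted p-value in a variable named raw_p. */
proc multtest inpvalues=pvals bonferroni fdr;
run;
```

Bonferroni controls the experimentwise error rate my friend mentioned; FDR controls the false discovery rate, which is less conservative when you expect some true effects among the 300 tests.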
I don't mean to be unkind, but this is a question for deep and meaningful statistician-speak, or perhaps for SAS Tech Support, who can point you to other procedures that handle multiple comparisons — but it isn't one for this forum.
David makes a good point about multiple comparisons. In a frequentist view of the world, this can lead to many false positives if you adhere to the cut-point approach. P-values can also be interpreted as a metric on the strength of evidence. Suffice it to say, use caution with lots of p-values and little data.
If you are looking just for p-values, and don't also need the regression coefficients, the correlation task provides the same p-values as univariable regression. The statistic carries different assumptions for interpretation, but the p-values are the same.
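In batch SAS that amounts to PROC CORR (which underlies the correlation task); the dataset and variable names below are placeholders. The VAR/WITH combination produces the full 30 × 10 grid of pairwise p-values in a single step, matching the 300 univariable regressions:

```sas
/* Sketch: mydata and the variable name patterns are placeholders. */
ods output PearsonCorr=corr_pvals;   /* capture r's and p-values */
proc corr data=mydata pearson;
    var  indvar1-indvar30;
    with depvar1-depvar10;
run;
```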