Performing regressions for multiple variables

12-01-2007 04:20 AM

Hi,

I have a number of independent (30) and dependent (10) variables on which I would like to perform linear regression. In addition, I may want to lag the independent variables and run regressions between all the independent and dependent variables.

For example,

indvar 1, indvar 2, indvar 3, ..., indvar 30

depvar 1, depvar 2, depvar 3, ..., depvar 10

I would want to regress indvar 1 against each of depvar 1 to depvar 10, and likewise for every other independent variable (i.e. at least 300 regressions would be performed).

I would then like to output the regression statistics automatically to a dataset or Excel sheet for further analysis.

Other than writing and repeating the code, is there a smart way to automate this so that I can examine the output statistics offline on another day?

12-02-2007 05:32 PM

Assuming you have obeyed Zender's law 1 of macro writing and started with tested and working SAS code, then you could wrap the process in a macro to which you pass a list of variables and define an output destination for each.
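As a language-neutral illustration of that looping idea (this is Python, not SAS macro code, and not anyone's tested production code), here is a minimal sketch that runs one simple regression per variable pair and collects the statistics as rows of CSV text that Excel can open. The helper names `simple_ols`, `regress_all_pairs` and `rows_to_csv`, the variable names, and the random data are all assumptions made up for the example.

```python
import csv
import io
import random
from itertools import product


def simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares; return key statistics."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return {"n": n, "intercept": intercept, "slope": slope, "r_squared": r_squared}


def regress_all_pairs(data, indvars, depvars):
    """Run one regression per (dependent, independent) pair; one result row each."""
    rows = []
    for dep, ind in product(depvars, indvars):
        stats = simple_ols(data[ind], data[dep])
        rows.append({"depvar": dep, "indvar": ind, **stats})
    return rows


def rows_to_csv(rows):
    """Serialise the result rows as CSV text (header line plus one line per row)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up data: 50 observations of 30 independent and 10 dependent variables.
rng = random.Random(0)
indvars = [f"indvar{i}" for i in range(1, 31)]
depvars = [f"depvar{i}" for i in range(1, 11)]
data = {name: [rng.gauss(0, 1) for _ in range(50)] for name in indvars + depvars}

results = regress_all_pairs(data, indvars, depvars)  # 10 x 30 = 300 rows
csv_text = rows_to_csv(results)
```

Keeping one row per regression, keyed by the pair of variable names, is what makes the results easy to sort and filter later; the same layout would apply whatever tool produces the rows.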

However, things are not always as simple as they seem, and my limited grasp of advanced statistics is enough to tell me that there is a problem with the whole process as you describe it. It is reminiscent of the suspect in the basement of the Lubyanka who is being slapped across the face with a wet cloth and urged to confess to each crime in a long list. At some point, a confession will be made, but how can you know the confession is true after all this torture?

All you seem to have defined here is a series of questions, each of which will be tested and evaluated against a p-value. But if your significance threshold is 0.05, then you are accepting a 1 in 20 chance that a test comes out significant when there is no real relationship at all. You, however, throw 300 such tests against it, and must expect around 15 significant results (300 × 0.05) purely by chance, even if none of the relationships is real. Tell me, which significant result is significant because it is significant, and which is significant because of the Lubyanka effect [patent pending]?

Isn't this a classic example of how to get type 1 errors, that is, false positives? My statistically much wiser friend refers to the remedy as controlling the experiment-wise error rate. I suggest that before you proceed down the macro route and give yourself a set of wallpaper results, you speak with someone who is far more knowledgeable in designing and executing experiments of this kind.
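To put rough numbers on that concern, here is a small Python sketch of the arithmetic. It assumes, for simplicity, that all 300 tests are independent and that no real relationship exists in any of them; the function names are made up for the example. The Bonferroni threshold shown is one common and conservative way to control the experiment-wise error rate, not the only one.

```python
def expected_false_positives(n_tests, alpha):
    """Expected count of significant results when every null hypothesis is true."""
    return n_tests * alpha


def familywise_error_rate(n_tests, alpha):
    """Chance of at least one false positive, assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha):
    """Per-test threshold that keeps the overall error rate at or below alpha."""
    return alpha / n_tests


n_tests, alpha = 300, 0.05
print("expected false positives:", round(expected_false_positives(n_tests, alpha), 2))
print("chance of at least one:  ", round(familywise_error_rate(n_tests, alpha), 6))
print("Bonferroni per-test cut: ", round(bonferroni_threshold(n_tests, alpha), 6))
```

Under those assumptions, about 15 false positives are expected, at least one is all but certain, and each individual test would need a p-value below roughly 0.00017 to count as significant.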

I don't mean to be unkind, but this is a question for deep and meaningful statistician speak, or perhaps for SAS Tech Support who can point you to other Procedures that handle multiple regressions, but it isn't for this forum.

Kind regards

David

12-04-2007 10:13 PM

David makes a good point about multiple comparisons. In a frequentist view of the world, this can lead to many false positives if you adhere to the cut-point approach. P-values can also be interpreted as a metric on the strength of evidence. Suffice it to say, use caution with lots of p-values and little data.

If you are looking just for p-values, and don't also need the regression coefficients, the correlation task provides the same p-values as univariable regression. It has different assumptions for interpretation of the statistic, but the p-values are the same.
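The reason the p-values agree can be sketched numerically: with one predictor, the t statistic for the Pearson correlation, r·sqrt((n − 2)/(1 − r²)), equals the t statistic for the regression slope, and both are referred to a t distribution with n − 2 degrees of freedom. The Python below is only an illustration of that identity on made-up numbers; the function names are assumptions for the example.

```python
import math


def _sums(x, y):
    """Return n, the two means, and the centred sums of squares and products."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return n, mean_x, mean_y, sxx, syy, sxy


def correlation_t(x, y):
    """t statistic for testing that the Pearson correlation is zero."""
    n, _, _, sxx, syy, sxy = _sums(x, y)
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / (1 - r * r))


def slope_t(x, y):
    """t statistic for testing that the simple-regression slope is zero."""
    n, mean_x, mean_y, sxx, _, sxy = _sums(x, y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope / std_err


# Made-up data with some scatter around an upward trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 1.5, 4.1, 3.2, 6.0, 4.9]

print(correlation_t(x, y), slope_t(x, y))  # the two values agree up to rounding
```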

Doc Muhlbaier

Duke

12-05-2007 03:21 PM

The answer to your question is yes.

SAS can run multiple regressions against a set of variables and can output the parameter statistics. One of the PROC REG-like procedures will step through all the combinations in search of a "best fit".

I would recommend that you read through the SAS/STAT documentation, since I haven't done this in a number of years.
