How to iterate procedures (probit)

03-23-2017 09:29 PM

Hello, I am working on my Master's thesis and have hit something of a snag. I want to run a very simple probit regression over a wide range of columns (several hundred), using the same model and the same dependent variable each time. I am working with real-time data, so each column represents the same series as estimated at a different date. I essentially want to test how well each date's estimate performs when used in a very simple probit regression. The columns are all in the same dataset, but I'm not sure how to automate the process efficiently, as doing it by hand would be extremely time-consuming.

So my question is: how would I create a macro that iterates over this worksheet and runs the probit regression on every variable/column in a given range? If a macro is not an efficient way to do this, what would be? I understand the basics of writing macros, SQL, iteration, etc., but I'm not sure how to put them all together to get what I need. Furthermore, how would I accomplish this if I wanted to run the same regression using 2-3 variables at a time?

Any help with this dilemma would be greatly appreciated!

Accepted Solutions

Solution

03-29-2017
10:53 AM

Posted in reply to vonkraush

03-23-2017 10:49 PM

The answer to your first question is to transpose the data and use BY groups in your regression instead of a macro loop, so that each original column becomes one BY group.
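A minimal sketch of the transpose-and-BY approach follows. The dataset and variable names (`wide`, `long`, `obs`, `y`, `x1`-`x3`, `x_index`) are hypothetical, and the data is simulated only to make the example self-contained. The thread does not say which procedure the poster uses for the probit; PROC LOGISTIC with `LINK=PROBIT` is used here as one option.

```sas
/* Hypothetical wide data: y is the 0/1 outcome, x1-x3 are the columns to test */
data wide;
    call streaminit(42);
    do obs = 1 to 200;
        x1 = rand('normal');
        x2 = rand('normal');
        x3 = rand('normal');
        y  = (x1 + rand('normal') > 0);
        output;
    end;
run;

/* Wide to long: one row per (observation, original column) */
proc transpose data=wide
               out=long(rename=(col1=x _name_=x_index));
    by obs y;
    var x1-x3;
run;

/* BY processing needs the data sorted by the BY variable */
proc sort data=long;
    by x_index obs;
run;

/* One probit fit per original column; estimates are collected in PE */
ods output ParameterEstimates=pe;
proc logistic data=long;
    by x_index;
    model y(event='1') = x / link=probit;
run;
```

Here `x_index` holds the name of the original column, so the `pe` dataset ends up with one set of estimates per column and can be compared or sorted afterwards.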

The answer to your second is to create a macro and then call it as desired. There was a question on here earlier this week about creating all possible combinations of two or three variables from a list (CALL ALLCOMB/LEXCOMB + CALL EXECUTE).

http://stats.idre.ucla.edu/sas/seminars/sas-macros-introduction/
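As a rough illustration of the CALL ALLCOMB plus CALL EXECUTE idea, the sketch below generates every pair from a short list of variable names and calls a macro once per pair. The macro name `fit_probit`, the dataset `wide`, and the variables `y` and `x1`-`x3` are all hypothetical, and PROC LOGISTIC with `LINK=PROBIT` is again an assumed choice of procedure.

```sas
/* Hypothetical data: y is the 0/1 outcome, x1-x3 are candidate regressors */
data wide;
    call streaminit(42);
    do obs = 1 to 200;
        x1 = rand('normal');
        x2 = rand('normal');
        x3 = rand('normal');
        y  = (x1 + x2 + rand('normal') > 0);
        output;
    end;
run;

/* Hypothetical macro: one probit fit for a given list of regressors */
%macro fit_probit(xvars);
    proc logistic data=wide;
        model y(event='1') = &xvars / link=probit;
    run;
%mend fit_probit;

/* Generate each pair of names and queue one macro call per pair */
data _null_;
    array v[3] $32 ('x1' 'x2' 'x3');
    k = 2;                              /* variables per regression */
    ncomb = comb(dim(v), k);            /* number of combinations */
    do j = 1 to ncomb;
        /* After the call, the first k elements hold the j-th combination */
        call allcomb(j, k, of v[*]);
        /* %NRSTR delays macro execution until the DATA step has finished */
        call execute(cats('%nrstr(%fit_probit)(', catx(' ', v[1], v[2]), ')'));
    end;
run;
```

For three variables at a time, `k` would be set to 3 and `v[3]` added to the `CATX` list.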

All Replies

Posted in reply to Reeza

03-24-2017 01:58 PM

Thanks for the follow-up. I am trying to use the BY statement as you suggested, but I'm a little confused about how it works, and I can't find a good example of sample code anywhere. What exactly do I need to make it work properly?

In the example provided, you include the following line of code:

```
PROC GLM DATA=sample;
BY Y_Index;
model y= x1 x2 x3;
run;
```

Sample is obviously just the dataset you want to run the regression over, but I need some clarification as to the nature of Y_Index. Is it just a dataset with one row ('y') which contains the name of every dependent variable I want to run the regression over?

ALSO: in this regression the dependent variable is constant, it's the explanatory variables that change. Could I still use BY in the same manner, but with an index of the X variables instead, something like:

```
PROC GLM DATA=sample;
BY X_Index;
model y= x;
run;
```

Posted in reply to vonkraush

03-24-2017 02:42 PM

I'm not following your question.

The first link has a worked example: the poster included their data in the question, and my code has the rest of the solution, so you can work through the exercise if desired.

Posted in reply to Reeza

03-24-2017 07:23 PM

Sorry for not explaining myself clearly! I went over that example and understand a few things better, but I am still confused on some points. I attempted to go through the exercise by myself to get a feel for BY, but ran into problems early on, mainly at this step:

```
*Create Returns and Squared Returns;
data data2;
set data;
vars(*) a1--a50;
array r(50);
do i=1 to dim(vars);
end;
drop i;
var(*) r1--r50;
array rsq(50);
do i=1 to dim(var);
end;
drop i;
run;
```

vars(*) and var(*) returned multiple errors, mostly relating to undeclared arrays. I tried toying around with this by explicitly declaring them as arrays, but the end result was still just 50 r and 50 rsq variables consisting entirely of blank values, which, based on later steps, doesn't seem correct either. What was the intent of this step, and why wouldn't it be working properly for me?

In case it is relevant: I am doing most of this in SAS Studio. I have access to regular SAS at my university, but I've mostly been working in Studio to test and develop code.
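For reference, two things in the step as pasted would explain these symptoms. The lines `vars(*) a1--a50;` and `var(*) r1--r50;` lack the ARRAY keyword, so `dim(vars)` refers to an array that was never declared. Both DO loops are also empty, so nothing is ever assigned to r1-r50 or rsq1-rsq50 and they stay missing. The loop bodies are not shown in the thread, so the sketch below fills them with an assumed definition (a simple return relative to the previous row, and its square), on a tiny hypothetical dataset with three series instead of fifty.

```sas
/* Hypothetical input: three price-like series a1-a3, one row per date */
data data;
    input a1 a2 a3;
    datalines;
100 50 20
110 55 19
121 44 19
;
run;

data data2;
    set data;
    array vars(*) a1-a3;           /* ARRAY keyword was missing in the pasted step */
    array r(3);                    /* creates r1-r3 */
    array rsq(3);                  /* creates rsq1-rsq3 */
    array prev(3) _temporary_;     /* previous row's values, retained across rows */
    do i = 1 to dim(vars);
        /* Assumed definition: simple return relative to the previous row */
        if _n_ > 1 then do;
            r(i)   = (vars(i) - prev(i)) / prev(i);
            rsq(i) = r(i)**2;
        end;
        prev(i) = vars(i);
    end;
    drop i;
run;
```

A temporary array is used for the previous values rather than calling `LAG` inside the loop, because each `LAG` call in the code keeps a single queue and would mix values from different series when executed repeatedly per row. The first row has no previous value, so its returns are left missing.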