Hello, I am working on my Master's thesis and have hit something of a snag. I want to run a very simple probit regression over a wide range of columns (several hundred), each using the same model and the same dependent variable. I am working with real-time data, so each column represents the same series estimated at a different date. I essentially want to test how well every single date performs when used in a very simple probit regression. The columns are all in the same dataset, but I'm not sure how to automate the process efficiently, as doing it by hand would be extremely time-consuming.
So my question is: how would I create a macro that iterates over this worksheet and runs the probit regression on every single variable/column in a given range? If a macro is not an efficient way to do this, what would be? I understand the basics of writing macros, of SQL, of iteration, etc., but I'm not sure how to put them all together to get what I need. Furthermore, how would I accomplish this if I wanted to do the same regression but using 2-3 variables at a time?
Any help with this dilemma would be greatly appreciated!
The answer to your first question is to transpose the data into a long format and use BY groups in your regression instead of looping over columns.
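As a rough sketch of the transpose-then-BY idea: the names below (dataset HAVE, row identifier OBS_ID, outcome Y, columns X1-X300) are placeholders rather than anything from the original data, and Y is assumed to be coded 0/1. PROC LOGISTIC with LINK=PROBIT is used here as one way to fit a probit model; it is not necessarily the procedure the poster had in mind.

```sas
/* Placeholder names: HAVE, OBS_ID, Y, X1-X300 are assumptions. */

/* PROC TRANSPOSE with a BY statement needs sorted input */
proc sort data=have;
    by obs_id y;
run;

/* Wide to long: one row per (observation, original column).
   X_INDEX holds the original column name, X holds its value. */
proc transpose data=have
               out=long(rename=(col1=x))
               name=x_index;
    by obs_id y;
    var x1-x300;
run;

/* BY-group processing needs the data sorted by the BY variable */
proc sort data=long;
    by x_index;
run;

/* Suppress printed output; ODS OUTPUT still captures the tables */
ods exclude all;
proc logistic data=long;
    by x_index;
    model y(event='1') = x / link=probit;
    ods output ParameterEstimates=est FitStatistics=fit;
run;
ods exclude none;
```

The EST and FIT datasets should then contain one block of rows per value of X_INDEX, so the several hundred fits can be compared in a single table.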
The answer to your second is to create a macro and then call it as needed. There was a question on here earlier this week about creating all possible combinations of 2 or 3 variables from a list (CALL ALLCOMB/LEXCOMB + CALL EXECUTE).
http://stats.idre.ucla.edu/sas/seminars/sas-macros-introduction/
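A minimal sketch of the macro-plus-CALL EXECUTE approach for pairs of variables follows. Everything named here is hypothetical: the macro PROBIT_FIT, the dataset HAVE, the 0/1 outcome Y, and the four predictors X1-X4. Plain nested loops are used to build the pairs instead of CALL ALLCOMB, purely to keep the example short.

```sas
/* Placeholder names: HAVE, Y, X1-X4 and %PROBIT_FIT are assumptions. */

/* Fit one probit model for the predictors passed in XVARS and
   save the fit statistics in a dataset named FIT_<label>. */
%macro probit_fit(xvars=, label=);
    proc logistic data=have;
        model y(event='1') = &xvars / link=probit;
        ods output FitStatistics=fit_&label;
    run;
%mend probit_fit;

/* Generate one macro call per unordered pair of predictors */
data _null_;
    array names[4] $32 _temporary_ ('x1' 'x2' 'x3' 'x4');
    do i = 1 to dim(names) - 1;
        do j = i + 1 to dim(names);
            /* %NRSTR delays macro execution until the DATA step ends */
            call execute(cats('%nrstr(%probit_fit)(xvars=',
                              catx(' ', names[i], names[j]),
                              ', label=', i, '_', j, ')'));
        end;
    end;
run;
```

With four names this queues six calls, for example %probit_fit(xvars=x1 x2, label=1_2). Adding a third nested loop would extend the same pattern to triples.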
Thanks for the follow-up. I am trying to use the BY statement like you said, but I'm a little confused about how it works, and I can't find a good example of sample code anywhere. What exactly do I need to make it work properly?
In the example provided, you include the following lines of code:
PROC GLM DATA=sample;
BY Y_Index;
model y= x1 x2 x3;
run;
Sample is obviously just the dataset you want to run the regression over, but I need some clarification as to the nature of Y_Index. Is it just a dataset with one row ('y') which contains the name of every dependent variable I want to run the regression over?
ALSO: in this regression the dependent variable stays the same; it's the explanatory variables that change. Could I still use BY in the same manner, but with an index of the X variables instead, something like:
PROC GLM DATA=sample;
BY X_Index;
model y= x;
run;
I'm not following your question.
The first link 🙂 has a worked example. They included their data in the question, and my code has the rest of the solution, so you could work through the exercise if desired.
Sorry for not explaining myself clearly! I went over that example and understand a few things better, but I am still confused on some points. I attempted to go through the exercise by myself to get a feel for 'BY', but ran into problems early on, mainly at this step:
*Create Returns and Squared Returns;
data data2;
set data;
vars(*) a1--a50;
array r(50);
do i=1 to dim(vars);
end;
drop i;
var(*) r1--r50;
array rsq(50);
do i=1 to dim(var);
end;
drop i;
run;
vars(*) and var(*) returned multiple errors, mostly relating to 'undeclared array' errors. I tried toying around with this by explicitly declaring them as arrays, but the end result was still just 50 r and 50 rsq variables consisting entirely of blank values, which, based on later steps, doesn't seem correct either. What was the intent of this step, and why wouldn't it be working properly for me?
In case it is relevant: I am doing most of this in SAS Studio. I have access to regular SAS at my university, but to test and develop code I've mostly been working in Studio.