Solved: Re: SAS Macro & Nested Do Loop to subset data - Page 2

Reeza · Posted 09-20-2023 03:10 PM

Each variable takes the value of 0 or 1.

allmdummy: a dummy variable that takes the value of 0 or 1.

Are these two true?

If so, you're looking for combinations of (0, 1, 0/1) versus 0,1, 0/1) to make all your combinations?

finans_sas · Posted 09-20-2023 04:23 PM

Thank you @Reeza for this question. Your statement is correct. The combinations will 0, 1, and 0&1 together versus 0, 1, and 0&1 together. I have 12 variables (corvar1 through corvar12), and each corvar takes the values of 0 and 1. And the other dummy, allmdummy, takes the values of 0 and 1 too. I want to be able to run the same regression for every single combination, which amounts to 108. All the possible combinations for the first variable will look like the following:

corvar1=0 and allmdummy=0,

corvar1=0 and allmdummy=1

corvar1=0 and allmdummy= 0 or 1

corvar1=1 and allmdummy=0,

corvar1=1 and allmdummy=1

corvar1=1 and allmdummy= 0 or 1

corvar1=0,1, allmdummy=0,

corvar1=0,1 allmdummy=1

corvar1=0,1 allmdummy= 0 or 1

Ideally, I will be able to create a unique sampleid for each combination and then run proc reg and proc means command for each combination (sampleid). I hope this helps. Please let me know if you have any follow-up questions.

Thank you so much!

finans_sas · Posted 09-20-2023 04:26 PM

Quick clarification: by "0 or 1" and "0,1" I meant that we include all the values where the variable is either 0 or 1, which by definition is all the observations for a given dummy variable.

PaigeMiller · Posted 09-20-2023 05:16 PM

@finans_sas wrote:

Thank you @Reeza for this question. Your statement is correct. The combinations will 0, 1, and 0&1 together versus 0, 1, and 0&1 together. I have 12 variables (corvar1 through corvar12), and each corvar takes the values of 0 and 1. And the other dummy, allmdummy, takes the values of 0 and 1 too. I want to be able to run the same regression for every single combination, which amounts to 108. All the possible combinations for the first variable will look like the following:

corvar1=0 and allmdummy=0,

corvar1=0 and allmdummy=1

corvar1=0 and allmdummy= 0 or 1

corvar1=1 and allmdummy=0,

corvar1=1 and allmdummy=1

corvar1=1 and allmdummy= 0 or 1

corvar1=0,1, allmdummy=0,

corvar1=0,1 allmdummy=1

corvar1=0,1 allmdummy= 0 or 1

Ideally, I will be able to create a unique sampleid for each combination and then run proc reg and proc means command for each combination (sampleid). I hope this helps. Please let me know if you have any follow-up questions.

Thank you so much!

This is a trivial change to the solution I posted earlier.

--
Paige Miller

Reeza · Posted 09-20-2023 05:25 PM

You're approaching inefficiently in my view.

You should transpose your data such that all of the variables you want to filter by are one variable with another variable holding the variable name.

Then I think you can run two or three regressions using a BY statement instead of macros or loops anywhere.

Your data should be something like this, with the transpose replicating your data for you. The overlaps are handled with the different multiple regressions.

VariableName VariableValue AllMDummy ........
ColVar1   0  0
ColVar2 1 0
...

Then run:

proc reg data=transposed;
by variableName VariableValue;
.....;
run;

proc reg data=transposed;
by variableName VariableValue ALLMDummy;
.....;
run;

proc reg data=originalData;
*NO BY statement, your 0/1 * 0/1 results in all included but use originalData;
run;

@finans_sas wrote:

Thank you @Reeza for this question. Your statement is correct. The combinations will 0, 1, and 0&1 together versus 0, 1, and 0&1 together. I have 12 variables (corvar1 through corvar12), and each corvar takes the values of 0 and 1. And the other dummy, allmdummy, takes the values of 0 and 1 too. I want to be able to run the same regression for every single combination, which amounts to 108. All the possible combinations for the first variable will look like the following:

corvar1=0 and allmdummy=0,

corvar1=0 and allmdummy=1

corvar1=0 and allmdummy= 0 or 1

corvar1=1 and allmdummy=0,

corvar1=1 and allmdummy=1

corvar1=1 and allmdummy= 0 or 1

corvar1=0,1, allmdummy=0,

corvar1=0,1 allmdummy=1

corvar1=0,1 allmdummy= 0 or 1

Ideally, I will be able to create a unique sampleid for each combination and then run proc reg and proc means command for each combination (sampleid). I hope this helps. Please let me know if you have any follow-up questions.

Thank you so much!

finans_sas · Posted 09-20-2023 08:15 PM

Thank you so much for sharing this other perspective, @Reeza . I need to think about how it will work with the data. The starting dataset has around 50 million rows of data (another one that I need to apply the same code to has about 250 million).

Quentin · Posted 09-20-2023 09:35 PM

@finans_sas wrote:

Thank you so much for sharing this other perspective, @Reeza . I need to think about how it will work with the data. The starting dataset has around 50 million rows of data (another one that I need to apply the same code to has about 250 million).

That's an exciting twist to the story... If you take one of the approaches where you expand the data to have one record per SampleID, with 108 different sampleIDs, do you have a sense of how much that will increase the number of rows? Each record can't be in all 108 samples, so it wouldn't be 50M*108, but could it be 50M*20?

That's getting big. I don't work with big data much, but I'd be inclined to try Tom's approach of just running 108 PROC REG steps (one for each subset), rather than expanding the data into billions of rows. And hopefully you've got enough memory...

PaigeMiller · Posted 09-21-2023 05:40 AM

@finans_sas wrote:

Thank you so much for sharing this other perspective, @Reeza . I need to think about how it will work with the data. The starting dataset has around 50 million rows of data (another one that I need to apply the same code to has about 250 million).

Hello, @finans_sas

When important pieces of information are not included in your problem statement, and then revealed to us later, this leads to a frustrating process where code presented prior to these disclosures may not be best. So far, you have not told us in the original problem statement that you are doing this "the results are robust to different ways of measuring the variable of interest"; that the values of the variables in the code are 0,1 variables and not continuous variables; and that 50 million records are in the data set. It would be better to state this in the first post on this problem. In addition, I want to know what other important pieces of information haven't yet been mentioned.

With regards to "the results are robust to different ways of measuring the variable of interest":

There are bootstrap and jackknife methods available in SAS which can perhaps achieve this without running all possible combinations of regressions here. There are also methods where you random sample, instead of trying every possible combination of everything.

In any event, I still think my solution which passes through the data set once, rather than passes through the data set 108 times is the way to go, and I feel even more convinced of this with the revelation that you have 50 million pieces of data in the data set. Any solution that uses macros and looping will be orders of magnitude slower.

--
Paige Miller

finans_sas · Posted 09-21-2023 04:19 PM

You are absolutely right, @PaigeMiller. Next time If I have another question, I will make sure to include all the important details. I sincerely apologize for this. Thank you so much for all your help and the code that your proposed. It worked like a charm.

Quentin · Posted 09-21-2023 04:27 PM

Just curious, when you expand your 50M row dataset, how many records do you end up with?

finans_sas · Posted 09-21-2023 04:42 PM

Hi @Quentin ! I actually ended up running the code three times (4 corvars at a time) since the computer kept running out of disk space. If I recall correctly, it was 20 billion or so for each run.

The 2025 SAS Hackathon has begun!

SAS Training: Just a Click Away