BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Reeza
Super User
Each variable takes the value of 0 or 1.

allmdummy: a dummy variable that takes the value of 0 or 1.

Are these two true?

If so, you're looking for combinations of (0, 1, 0/1) versus 0,1, 0/1) to make all your combinations?
finans_sas
Quartz | Level 8

Thank you @Reeza for this question. Your statement is correct. The combinations will 0, 1, and 0&1 together versus 0, 1, and 0&1 together. I have 12 variables (corvar1 through corvar12), and each corvar takes the values of 0 and 1. And the other dummy, allmdummy, takes the values of 0 and 1 too. I want to be able to run the same regression for every single combination, which amounts to 108. All the possible combinations for the first variable will look like the following:

corvar1=0 and allmdummy=0,  

corvar1=0 and  allmdummy=1

corvar1=0 and allmdummy= 0 or 1

 

corvar1=1 and allmdummy=0,

corvar1=1 and allmdummy=1

corvar1=1 and allmdummy= 0 or 1

 

corvar1=0,1, allmdummy=0,

corvar1=0,1 allmdummy=1

corvar1=0,1 allmdummy= 0 or 1

 

Ideally, I will be able to create a unique sampleid for each combination and then run proc reg and proc means command for each combination (sampleid). I hope this helps. Please let me know if you have any follow-up questions.

Thank you so much!

finans_sas
Quartz | Level 8
Quick clarification: by "0 or 1" and "0,1" I meant that we include all the values where the variable is either 0 or 1, which by definition is all the observations for a given dummy variable.
PaigeMiller
Diamond | Level 26

@finans_sas wrote:

Thank you @Reeza for this question. Your statement is correct. The combinations will 0, 1, and 0&1 together versus 0, 1, and 0&1 together. I have 12 variables (corvar1 through corvar12), and each corvar takes the values of 0 and 1. And the other dummy, allmdummy, takes the values of 0 and 1 too. I want to be able to run the same regression for every single combination, which amounts to 108. All the possible combinations for the first variable will look like the following:

corvar1=0 and allmdummy=0,  

corvar1=0 and  allmdummy=1

corvar1=0 and allmdummy= 0 or 1

 

corvar1=1 and allmdummy=0,

corvar1=1 and allmdummy=1

corvar1=1 and allmdummy= 0 or 1

 

corvar1=0,1, allmdummy=0,

corvar1=0,1 allmdummy=1

corvar1=0,1 allmdummy= 0 or 1

 

Ideally, I will be able to create a unique sampleid for each combination and then run proc reg and proc means command for each combination (sampleid). I hope this helps. Please let me know if you have any follow-up questions.

Thank you so much!


This is a trivial change to the solution I posted earlier.

--
Paige Miller
Reeza
Super User

You're approaching inefficiently in my view.

You should transpose your data such that all of the variables you want to filter by are one variable with another variable holding the variable name. 

Then I think you can run two or three regressions using a BY statement instead of macros or loops anywhere. 

 

Your data should be something like this, with the transpose replicating your data for you. The overlaps are handled with the different multiple regressions.

 

VariableName VariableValue AllMDummy ........
ColVar1   0  0
ColVar2 1 0
...

Then run:

proc reg data=transposed;
by variableName VariableValue;
.....;
run;

proc reg data=transposed;
by variableName VariableValue ALLMDummy;
.....;
run;

proc reg data=originalData;
*NO BY statement, your 0/1 * 0/1 results in all included but use originalData;
run;

@finans_sas wrote:

Thank you @Reeza for this question. Your statement is correct. The combinations will 0, 1, and 0&1 together versus 0, 1, and 0&1 together. I have 12 variables (corvar1 through corvar12), and each corvar takes the values of 0 and 1. And the other dummy, allmdummy, takes the values of 0 and 1 too. I want to be able to run the same regression for every single combination, which amounts to 108. All the possible combinations for the first variable will look like the following:

corvar1=0 and allmdummy=0,  

corvar1=0 and  allmdummy=1

corvar1=0 and allmdummy= 0 or 1

 

corvar1=1 and allmdummy=0,

corvar1=1 and allmdummy=1

corvar1=1 and allmdummy= 0 or 1

 

corvar1=0,1, allmdummy=0,

corvar1=0,1 allmdummy=1

corvar1=0,1 allmdummy= 0 or 1

 

Ideally, I will be able to create a unique sampleid for each combination and then run proc reg and proc means command for each combination (sampleid). I hope this helps. Please let me know if you have any follow-up questions.

Thank you so much!


 

finans_sas
Quartz | Level 8

Thank you so much for sharing this other perspective, @Reeza . I need to think about how it will work with the data. The starting dataset has around 50 million rows of data (another one that I need to apply the same code to has about 250 million). 

Quentin
Super User

@finans_sas wrote:

Thank you so much for sharing this other perspective, @Reeza . I need to think about how it will work with the data. The starting dataset has around 50 million rows of data (another one that I need to apply the same code to has about 250 million). 


That's an exciting twist to the story...  If you take one of the approaches where you expand the data to have one record per SampleID, with 108 different sampleIDs, do you have a sense of how much that will increase the number of rows? Each record can't be in all 108 samples, so it wouldn't be 50M*108, but could it be 50M*20?

 

That's getting big.  I don't work with big data much, but I'd be inclined to try Tom's approach of just running 108 PROC REG steps (one for each subset), rather than expanding the data into billions of rows.  And hopefully you've got enough memory...

BASUG is hosting free webinars Next up: Mark Keintz presenting History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies on May 8. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
PaigeMiller
Diamond | Level 26

@finans_sas wrote:

Thank you so much for sharing this other perspective, @Reeza . I need to think about how it will work with the data. The starting dataset has around 50 million rows of data (another one that I need to apply the same code to has about 250 million). 


Hello, @finans_sas

 

When important pieces of information are not included in your problem statement, and then revealed to us later, this leads to a frustrating process where code presented prior to these disclosures may not be best. So far, you have not told us in the original problem statement that you are doing this "the results are robust to different ways of measuring the variable of interest"; that the values of the variables in the code are 0,1 variables and not continuous variables; and that 50 million records are in the data set. It would be better to state this in the first post on this problem. In addition, I want to know what other important pieces of information haven't yet been mentioned.

 

With regards to "the results are robust to different ways of measuring the variable of interest":

 

There are bootstrap and jackknife methods available in SAS which can perhaps achieve this without running all possible combinations of regressions here. There are also methods where you random sample, instead of trying every possible combination of everything.

 

In any event, I still think my solution which passes through the data set once, rather than passes through the data set 108 times is the way to go, and I feel even more convinced of this with the revelation that you have 50 million pieces of data in the data set. Any solution that uses macros and looping will be orders of magnitude slower.

--
Paige Miller
finans_sas
Quartz | Level 8

You are absolutely right, @PaigeMiller.  Next time If I have another question, I will make sure to include all the important details. I sincerely apologize for this. Thank you so much for all your help and the code that your proposed. It worked like a charm.

Quentin
Super User

Just curious, when you expand your 50M row dataset, how many records do you end up with?

BASUG is hosting free webinars Next up: Mark Keintz presenting History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies on May 8. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
finans_sas
Quartz | Level 8

Hi @Quentin ! I actually ended up running the code three times (4 corvars at a time) since the computer kept running out of disk space. If I recall correctly, it was 20 billion or so for each run.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 25 replies
  • 1488 views
  • 24 likes
  • 5 in conversation