SquashingOtters
Fluorite | Level 6

Hello everyone, I'm a brand-new SAS user. I need to generate various datasets of sample size N that contain every possible combination of the values of X and Y.

Y is a binary variable (0, 1) and X can vary (it can be binary, or -1, 0, 1, or -2, -1, 0, 1, 2, etc., but always integers).

I need to simulate every possible dataset of size N in which every observation has an X and a Y value.

So if I have N=5, X=(0, 1), Y=(0, 1), my datasets will look like:

 

X Y    X Y    X Y    X Y
0 0    0 0    0 0    0 0
0 0    1 0    1 0    1 0
1 0    1 0    0 1    0 1
0 1    0 1    0 1    1 1
1 1    1 1    1 1    1 1

since order doesn't count.

 

P.S. It's not strictly necessary that the results are in different datasets, but the datasets must contain ALL the combinations of X and Y (in this case 1 1, 1 0, 0 1, 0 0).

I tried to find a function but I can't find anything...

Thanks, everyone.

11 REPLIES
SquashingOtters
Fluorite | Level 6

Hello everyone, I'm a brand-new SAS user. I need to generate various datasets of sample size N that contain every possible combination of the values of X and Y.

Y is a binary variable (0, 1) and X can vary (it can be binary, or -1, 0, 1, or -2, -1, 0, 1, 2, etc., but always integers).

I need to simulate every possible dataset of size N in which every observation has an X and a Y value.

So if I have N=5, X=(0, 1), Y=(0, 1), my datasets will look like:

X  Y  ID      X  Y  ID      X  Y  ID
1  0  1       0  0  1       1  1  1
1  1  2       0  1  2       1  0  2
1  1  3       1  0  3       0  0  3
0  1  4       1  1  4       1  1  4
0  0  5       0  1  5       0  1  5     etc.

 

And so on for every possible combination.

P.S. It's not strictly necessary that the results are in different datasets, and it's better if only the datasets with ALL the combinations of X and Y are kept (in this case 1 1, 1 0, 0 1, 0 0).

I tried to find a function but I can't find anything...

Thanks, everyone.

Reeza
Super User

You can use an SQL cross join here or a do loop. 

 

In the cross join, first make tables that have the list of values for each variable and then do a select *.

 

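/* One table per variable, listing its possible values */
data a;
    input X;
    cards;
0
1
;
run;

data b;
    do y = -1 to 1;
        output;
    end;
run;

/* Cross join: every row of a paired with every row of b */
proc sql;
    create table allcomb as
    select *
    from a, b;
quit;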

Or use nested do loops to get all combinations:

 

data want_option2;
do x=0 to 1;
	do y= -1 to 1;
		output;
	end;
end;
run;

You haven't really explained where the N sample comes in, though, so I guess I can help with that after you've clarified it.

 


@SquashingOtters wrote:

Hello everyone, I'm a brand-new SAS user. I have to study the convergence of the log-binomial model, and for that I need to generate various datasets of size N with every possible combination of the values of X and Y.

Y is a binary variable (0, 1) and X can vary (it can be binary, or -1, 0, 1, or -2, -1, 0, 1, 2, etc., but always integers), and I need to generate every possible dataset with N observations and combinations of X and Y.

EXAMPLE:

If I want N=4 observations and X and Y can only be 0 or 1, I could have only the following datasets:

0 0      0 0     0 0     0 0    1 0     1 0     1 0     1 0     0 1     0 1    0 1    0 1     1 1     1 1     1 1     1 1

0 0      0 1     1 0     1 1    0 0     0 1     1 0     1 1     0 0     0 1    1 0    1 1     0 0     0 1     1 0     1 1

I tried for a week but I couldn't find a function that could help me.

P.S. It's not strictly necessary that the results are in different datasets.

Thanks a lot and sorry for the bad English. I hope that my message is clear.


 

SquashingOtters
Fluorite | Level 6

Hi, I explained my problem badly in the first place, sorry...

N is the sample size that I want. For example, if I want to simulate a case-control study, N is the number of patients, and the combinations of X and Y are all the possible conditions of a patient (X = disease (1 yes, 0 no), Y = exposure (1 yes, 0 no)).

What I need is every dataset of size N that contains all the possible cases of the data (the order doesn't count), including all the possible combinations of X and Y (1 1, 1 0, 0 1, 0 0).

ballardw
Super User

If I understand what you want I think this may be the easiest way to go.

 

data example;
   do n=1 to 4;
      do y=0,1;
         do x= 0,1;
            output;
         end;
      end;
   end;
run;

The key to this is that you can specify individual values on a do loop.

 

If you want your x to be -1, -0.5, 0, 0.5 and 1, then place them in a comma-delimited list on the DO X= loop control.
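
As a small sketch of that comma-delimited form (the dataset name example_x is made up for illustration and is not from the thread), the loop might look like this:

/* Sketch only: explicit value list on the DO X= loop control. */
/* The dataset name example_x is hypothetical.                 */
```sas
data example_x;
    do x = -1, -0.5, 0, 0.5, 1;   /* listed values, not a range */
        do y = 0, 1;
            output;               /* one row per (x, y) pair */
        end;
    end;
run;
```

This should give one row for each of the 5 x 2 = 10 pairs of values.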

 

I think that you should keep the N value around as in my example data set.

PGStats
Opal | Level 21

If I understood correctly, you need the help of the CALL ALLCOMBI routine. For example, for x = (-1, 0, 1), y = (0, 1), N = 4:

 

data values;
do y = 0, 1;
    do x = -1, 0, 1;
        i + 1;
        output;
        end;
    end;
run;

%let N=4;

data comb;
if 0 then set values nobs=M;
array a{&N};
do k = 1 to comb(M, &N);
    call allcombi(M, &N, of a{*});
    do j = 1 to dim(a);
        i = a{j};
        output;
        end;
    end;
stop;
keep k i;
run;

proc sql;
create table combVal as
select
    k, x, y
from comb inner join values on comb.i=values.i
order by k, x, y;
quit;

proc print data=combVal noobs; run;
PG
PGStats
Opal | Level 21

So, given your example N=5 X=(0, 1) Y=(0, 1), how many datasets do you expect to get?

PG
SquashingOtters
Fluorite | Level 6

If I have N=5, X=(0, 1), Y=(0, 1), I expect to get 4 unique datasets, each containing every combination of X and Y:

X Y    X Y    X Y    X Y
0 0    0 0    0 0    0 0
0 0    1 0    1 0    1 0
1 0    1 0    0 1    0 1
0 1    0 1    0 1    1 1
1 1    1 1    1 1    1 1

since order doesn't count.

I know the number of distinct datasets from this formula:

N = given sample size

K = number of combinations of X and Y (0 0, 0 1, 1 0, 1 1 in this case)

(N-1)!/[(K-1)!(N-K)!]
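
As a check, this expression is the binomial coefficient that counts the ways to spread N observations over K combinations so that each combination appears at least once (the "stars and bars" count). Written out for the example with sample size 5 and 4 combinations (the LaTeX notation is mine, not from the thread):

```latex
\binom{N-1}{K-1} = \frac{(N-1)!}{(K-1)!\,(N-K)!},
\qquad
N = 5,\ K = 4:\quad \binom{4}{3} = \frac{4!}{3!\,1!} = 4
```

That agrees with the 4 datasets listed in the example.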

 

Reeza
Super User
Do none of the answers above answer that? I feel like they do at the moment. If not please explain in detail how they do not.
SquashingOtters
Fluorite | Level 6

Unfortunately no...

ALLCOMBI gives me an error when I use N > K, where K is the number of possible combinations of X and Y, because it computes the combinations of K elements taken N at a time.

The DO loops from ballardw and your code give all the possible combinations of X and Y, but what I need is every possible dataset that not only contains all the combinations of X and Y but also fills the remaining observations (which are further combinations of X and Y) up to a given sample size N.

Basically, every dataset/table is unique and should contain every combination of X and Y at least once, plus other observations that can take any combination of X and Y.

For example, if I have N=5, X=(0, 1), Y=(0, 1), the datasets that I should get are:

X Y    X Y    X Y    X Y
0 0    0 0    0 0    0 0
0 0    1 0    1 0    1 0
1 0    1 0    0 1    0 1
0 1    0 1    0 1    1 1
1 1    1 1    1 1    1 1

 

The order doesn't count so all that matters are the frequencies of the data.
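
Since only the frequencies matter, one possible way to sketch this for the 2x2 case is to enumerate the cell counts directly instead of the rows. The code below is a minimal sketch, not a method proposed by anyone in the thread: it assumes N=5 and binary X and Y, and the dataset names (freqs, want) and variable names (f00, f10, f01, f11, dataset_id) are made up for illustration. Each count starts at 1, so every combination appears at least once, and the last count is whatever is left over.

```sas
%let N = 5;   /* assumed sample size, as in the example */

/* One row per frequency table: fXY = number of rows with x=X, y=Y */
data freqs;
    do f00 = 1 to &N - 3;                 /* leave at least 1 for each other cell */
        do f10 = 1 to &N - 2 - f00;
            do f01 = 1 to &N - 1 - f00 - f10;
                f11 = &N - f00 - f10 - f01;   /* remainder, always >= 1 here */
                dataset_id + 1;               /* running id of the table */
                output;
            end;
        end;
    end;
run;

/* Expand each frequency table into N rows of (x, y) */
data want;
    set freqs;
    array f{4} f00 f10 f01 f11;
    array xs{4} _temporary_ (0 1 0 1);    /* x value of each cell */
    array ys{4} _temporary_ (0 0 1 1);    /* y value of each cell */
    do c = 1 to 4;
        x = xs{c};
        y = ys{c};
        do rep = 1 to f{c};
            output;
        end;
    end;
    keep dataset_id x y;
run;
```

With N=5 the loops should produce 4 frequency tables, so want should hold 4 datasets of 5 rows each, identified by dataset_id. An X with more than two values would need one count (and one loop) per extra combination, so this sketch does not generalize as written.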

 

Other datasets like:

X Y
0 0
0 0
0 0
1 0
0 1

have sample size N, but they don't contain all the possible combinations of X and Y.

I hope I have explained it better...

Thank you for your interest.

Reeza
Super User
Did you check your post from yesterday? You have not replied or marked it as solved and this question seems identical.
https://communities.sas.com/t5/New-SAS-User/Generate-every-possible-dataset-of-N-sample-for-every-po...


Discussion stats
  • 11 replies
  • 3814 views
  • 2 likes
  • 4 in conversation