Hello everyone, I'm a brand-new SAS user. I need to generate datasets of sample size N that contain every possible combination of the values of X and Y.
Y is a binary variable (0, 1) and X can vary (it can be binary, or -1, 0, 1, or -2, -1, 0, 1, 2, etc., but always integers).
I need to simulate every possible dataset of size N in which every observation has an X and a Y value.
So if I have N=5, X=(0, 1), Y=(0, 1), my datasets will look like:
X Y   X Y   X Y   X Y
0 0   0 0   0 0   0 0
0 0   1 0   1 0   1 0
1 0   1 0   0 1   0 1
0 1   0 1   0 1   1 1
1 1   1 1   1 1   1 1
since order doesn't matter.
P.S. It's not strictly necessary that the results are in different datasets, and each dataset must contain ALL the combinations of X and Y (in this case 1 1, 1 0, 0 1, 0 0).
I tried to find a function but I can't find anything...
Thanks, everyone
Hello everyone, I'm a brand-new SAS user. I need to generate datasets of sample size N that contain every possible combination of the values of X and Y.
Y is a binary variable (0, 1) and X can vary (it can be binary, or -1, 0, 1, or -2, -1, 0, 1, 2, etc., but always integers).
I need to simulate every possible dataset of size N in which every observation has an X and a Y value.
So if I have N=5, X=(0, 1), Y=(0, 1), my datasets will look like:
X Y ID   X Y ID   X Y ID
1 0 1    0 0 1    1 1 1
1 1 2    0 1 2    1 0 2
1 1 3    1 0 3    0 0 3
0 1 4    1 1 4    1 1 4
0 0 5    0 1 5    0 1 5    etc.
And so on, with every possible combination.
P.S. It's not strictly necessary that the results are in different datasets, and it's better if only the datasets that contain ALL the combinations of X and Y (in this case 1 1, 1 0, 0 1, 0 0) are kept.
I tried to find a function but I can't find anything...
Thanks, everyone
You can use an SQL cross join here or a do loop.
In the cross join, first make tables that have the list of values for each variable and then do a select *.
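/* Lookup table with the values of X */
data a;
  input X;
  cards;
0
1
;
run;

/* Lookup table with the values of Y */
data b;
  do y = -1 to 1;
    output;
  end;
run;

/* Cross join: every row of A is paired with every row of B */
proc sql;
  create table allcomb as
  select *
  from a, b;
quit;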
Or use nested do loops to get all combinations:
data want_option2;
  do x = 0 to 1;
    do y = -1 to 1;
      output;
    end;
  end;
run;
You haven't really explained where the sample size N comes in, though, so I can help with that after you've clarified it.
@SquashingOtters wrote:
Hello everyone, I'm a brand-new SAS user. I have to study the convergence of the log-binomial model, and for that I need to generate datasets of size N with every possible combination of the values of X and Y.
Y is a binary variable (0, 1) and X can vary (it can be binary, or -1, 0, 1, or -2, -1, 0, 1, 2, etc., but always integers), and I need to generate every possible dataset with N observations and combinations of X and Y.
EXAMPLE:
If I want N=4 observations and X and Y can only be 0 or 1, I could have only the following datasets:
0 0 0 0 0 0 0 0 1 0 1 0 1 0 1 0 0 1 0 1 0 1 0 1 1 1 1 1 1 1 1 1
0 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1
I have tried for a week but I can't find a function that could help me.
P.S. It's not strictly necessary that the results are in different datasets.
Thanks a lot and sorry for the bad English; I hope that my message is clear.
Hi, I explained my problem badly in the first place, sorry...
N is the sample size that I want. For example, if I want to simulate a case-control study, N is the number of patients, and the combinations of X and Y are all the possible conditions of a patient (X = disease (1 yes, 0 no), Y = exposure (1 yes, 0 no)).
What I need is every dataset of size N that contains all the possible cases of the data (the order doesn't matter), including all the possible combinations of X and Y (1 1, 1 0, 0 1, 0 0).
If I understand what you want, I think this may be the easiest way to go.
data example;
  do n = 1 to 4;
    do y = 0, 1;
      do x = 0, 1;
        output;
      end;
    end;
  end;
run;
The key to this is that you can specify individual values on a DO loop.
If you want your x to be -1, -0.5, 0, 0.5 and 1, then place them in a comma-delimited list on the DO X= loop control.
I think that you should keep the N value around, as in my example data set.
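As a minimal sketch of the value-list form of the DO statement described above (the dataset name `grid` and the particular X values are only illustrative assumptions, not something from the thread):

```sas
/* Value-list DO loops: each loop variable takes exactly the listed values. */
data grid;
  do y = 0, 1;                /* binary outcome */
    do x = -2, -1, 0, 1, 2;   /* any explicit list of values works here */
      output;                 /* one row per (x, y) pair: 2 * 5 = 10 rows */
    end;
  end;
run;
```

This only builds the grid of distinct (x, y) pairs; it does not by itself produce datasets of a given size N.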
Hi, I explained my problem badly in the first place, sorry... N is the sample size that I want. For example, if I want to simulate a case-control study, N is the number of patients, and the combinations of X and Y are all the possible conditions of a patient (X = disease (1 yes, 0 no), Y = exposure (1 yes, 0 no)). What I need is every dataset of size N that contains all the possible cases of the data (the order doesn't matter), including all the possible combinations of X and Y (1 1, 1 0, 0 1, 0 0)
If I understood correctly, you need the ALLCOMBI routine. For example, for x = (-1, 0, 1), y = (0, 1), N = 4:
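/* All distinct (x, y) pairs, numbered i = 1, 2, ... */
data values;
  do y = 0, 1;
    do x = -1, 0, 1;
      i + 1;
      output;
    end;
  end;
run;

%let N=4;

/* Every choice of &N distinct pairs out of the M available ones */
data comb;
  if 0 then set values nobs=M;   /* read M at compile time only */
  array a{&N};
  do k = 1 to comb(M, &N);
    call allcombi(M, &N, of a{*});
    do j = 1 to dim(a);
      i = a{j};
      output;
    end;
  end;
  stop;
  keep k i;
run;

/* Translate the pair indices back into x and y values */
proc sql;
  create table combVal as
  select
    k, x, y
  from comb inner join values on comb.i=values.i
  order by k, x, y;
quit;

proc print data=combVal noobs; run;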
So, given your example N=5 X=(0, 1) Y=(0, 1), how many datasets do you expect to get?
If I have N=5, X=(0, 1), Y=(0, 1), I expect to get 4 unique datasets, each containing every combination of X and Y:
X Y   X Y   X Y   X Y
0 0   0 0   0 0   0 0
0 0   1 0   1 0   1 0
1 0   1 0   0 1   0 1
0 1   0 1   0 1   1 1
1 1   1 1   1 1   1 1
since order doesn't matter.
I know the number of different unique datasets from this formula:
N = given sample size
K = number of combinations of X and Y (0 0, 0 1, 1 0, 1 1 in this case)
(N-1)! / [(K-1)! (N-K)!]
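The formula above is the binomial coefficient "N-1 choose K-1", so it can be checked with the COMB function. A small sketch for the N=5, K=4 example (the variable name `n_datasets` is just an illustrative choice):

```sas
data _null_;
  N = 5;   /* sample size from the example */
  K = 4;   /* number of (X, Y) combinations */
  n_datasets = comb(N - 1, K - 1);   /* same as (N-1)!/[(K-1)!(N-K)!] */
  put n_datasets=;                   /* writes n_datasets=4 to the log */
run;
```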
Unfortunately no...
ALLCOMBI gives me an error when N > K, where K is the number of possible combinations of X and Y, because it computes the combinations of K elements taken N at a time.
The DO loops from ballardw and your code give all the possible combinations of X and Y, but what I need is every possible dataset that not only contains all the combinations of X and Y but is also filled up to a given sample size N with further observations (which are again combinations of X and Y).
Basically, every dataset/table is unique and should contain every combination of X and Y at least once, plus other observations that can take any combination of X and Y.
For example, if I have N=5, X=(0, 1), Y=(0, 1), the datasets that I should get are:
X Y   X Y   X Y   X Y
0 0   0 0   0 0   0 0
0 0   1 0   1 0   1 0
1 0   1 0   0 1   0 1
0 1   0 1   0 1   1 1
1 1   1 1   1 1   1 1
The order doesn't matter, so all that matters are the frequencies of each combination.
Other datasets, like:
X Y
0 0
0 0
0 0
1 0
0 1
have sample size N, but they don't contain all the possible combinations of X and Y.
I hope I have explained it better...
Thank you for the interesting
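One way to read the requirement above is that each dataset is fully described by four cell counts, one per (X, Y) combination, each at least 1 and summing to N. The following is only a sketch under that assumption, hard-coded for the binary X and binary Y case; the names `freq`, `want`, `dataset_id` and `f00` to `f11` are illustrative and do not come from the thread.

```sas
%let N = 5;   /* sample size (value taken from the example) */

/* One row per dataset: f00, f01, f10, f11 are the counts of the   */
/* (X, Y) cells (0,0), (0,1), (1,0), (1,1). Every count is >= 1    */
/* and the four counts sum to &N.                                  */
data freq;
  do f00 = 1 to &N - 3;
    do f01 = 1 to &N - f00 - 2;
      do f10 = 1 to &N - f00 - f01 - 1;
        f11 = &N - f00 - f01 - f10;   /* remainder, at least 1 here */
        dataset_id + 1;               /* running dataset number */
        output;
      end;
    end;
  end;
run;

/* Expand each row of counts into &N observations of (x, y). */
data want;
  set freq;
  array f{4} f00 f01 f10 f11;
  array xs{4} _temporary_ (0 0 1 1);
  array ys{4} _temporary_ (0 1 0 1);
  do c = 1 to 4;
    x = xs{c};
    y = ys{c};
    do rep = 1 to f{c};
      output;
    end;
  end;
  keep dataset_id x y;
run;
```

With N=5 the first step produces 4 rows, which agrees with the count given by the formula earlier in the thread, and `want` stacks the 4 datasets in long form, identified by `dataset_id`. A larger set of X values would need more cells and more nested loops, or a different enumeration strategy.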