## assigning random binary variables to data set in a predetermined mixture

Hello all,

Suppose I have a data set with 3 variables and I want to create a 4th variable that is assigned a random binary value (0/1) with a predetermined percentage of ones and zeros. For example, if I wanted a 50/50 mix of 0/1:

x1 x2 x3 x4

a a a 0

a a a 0

a a a 1

a a a 1

a a a 0

a a a 1

Or, if I wanted a 25/75 mix (6 rows would need 1.5 zeros, so here I just round up to 2):

x1 x2 x3 x4

a a a 0

a a a 1

a a a 1

a a a 1

a a a 0

a a a 1

Is there any way to do this? One way I can think of is to create a data set with the 0/1 values and just combine the two data sets:

data new; set old; set zero_ones; run;

Accepted Solutions
Solution
‎09-02-2011 09:52 AM

## Re: assigning random binary variables to data set in a predetermined mixture

The other methods in this thread will not give you exactly that binary proportion, only something very close to it.

This is what I use to get almost exactly the desired proportion for 0/1.  The idea is to track the actual proportion generated so far and to "wiggle" the probability to compensate.

%let prob1 = 0.2345;

%let size  = 100000;

data want;

if _n_ = 1 then do;

retain _need1 _remain;

_need1 = &prob1 * &size;    *  number of 1 needed, replace with a number if you like;

_remain = &size;            *  remaining number of records;

end;

do _i = 1 to &size;

x4 = ( ranuni( 12345 ) <= ( _need1 / _remain ) );

output;

if x4 then _need1 = _need1 - 1;     * a 1 was assigned, so one fewer 1 is needed;

_remain = _remain - 1;              * have 1 less record left;

end;

drop _remain _need1 _i;

run;

proc means;

run;

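This generates exactly the desired number of ones whenever &prob1 * &size is a whole number. If the product has a fractional part (including one introduced by floating-point arithmetic), the count lands on one of the two neighboring integers, so wrap the product in round() if you need to pin it down. If you have an input dataset, try some variation of this: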

%let prob1 = 0.2345;

data want;

set have nobs=_tot;

if _n_ = 1 then do;

retain _need1 _remain;

_need1 = &prob1 * _tot;         *  number of 1 records needed, replace with a number if you like;

_remain = _tot;                 *  remaining number of records;

end;

x4 = ( ranuni( 12345 ) <= ( _need1 / _remain ) );

if x4 then _need1 = _need1 - 1;     * a 1 was assigned, so one fewer 1 is needed;

_remain = _remain - 1;               * have 1 less record left;

drop _remain _need1;

run;

Often this idea is used in sampling as well to get exactly the desired proportion.  A bit of work can extend it to stratified sampling.  Just have to keep more counters and adjust appropriately.
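
The sampling connection can also be sketched with PROC SURVEYSELECT. This is a different route from the counter logic above, not the poster's method. It assumes SAS/STAT is licensed, that the input data set is named have, and that exactly 25 ones are wanted (an illustrative number that must not exceed the number of rows in have):

```sas
proc surveyselect data=have out=want
                  method=srs        /* simple random sampling without replacement */
                  sampsize=25       /* assumed target: exactly 25 rows flagged as 1 */
                  seed=12345
                  outall;           /* keep every row and add a 0/1 variable named Selected */
run;

/* rename the indicator to match the variable name used in this thread */
data want;
  set want( rename=( Selected = x4 ) );
run;
```

With OUTALL the output keeps all input rows, so the 0/1 mixture is fixed by the requested sample size rather than by chance.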

All Replies

## assigning random binary variables to data set in a predetermined mixture

I think you want the tabled distribution.  See documentation for details.

x = RAND('TABLE',p1,p2, ...)


## assigning random binary variables to data set in a predetermined mixture

You could use a modulo operation.  It isn't 'random', but it assigns an evenly distributed set of binary values.  It is repeatable, and the values are spread evenly across the sort order of the data set.

x4=mod(_n_,2); /* outputs values of 0 or 1 based on record number */
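
The same trick extends to other simple ratios by changing the divisor. A minimal sketch, assuming the input data set is named have as elsewhere in this thread; the 1-in-4 ratio is only an illustration:

```sas
data want;
  set have;
  /* every 4th record gets a 1, so roughly 25% ones and 75% zeros */
  x4 = ( mod( _n_, 4 ) = 0 );
run;
```

As with the 50/50 version, the pattern is deterministic, and the split is only approximate when the row count is not a multiple of the divisor.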


## assigning random binary variables to data set in a predetermined mixture

You don't need to create a separate dataset and merge; one data step is enough.

For really simple 0/1 proportions this works well:

data want;

set have;

x4 = ( uniform( 12345 ) >= 0.5   );     * 50/50 of 0 and 1;

/* alternative, use instead of the line above: x4 = ( uniform( 12345 ) >= 0.75 ); gives 75/25 of 0 and 1 */

run;

For more complicated value distributions, use data_null_'s method. Note that RAND('TABLE') returns 1, 2, ... rather than 0/1, so subtract 1 if you need 0/1 values:

data want;

set have;

call streaminit( 12345 );                          * only the very first call matters;

x4 = rand( 'TABLE', 0.25, 0.75 );             * 25% 1,  75% 2;

run;
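
For a plain 0/1 variable, the Bernoulli distribution of the RAND function returns 0/1 directly. A minimal sketch, assuming the same have data set and an illustrative 75% share of ones:

```sas
data want;
  set have;
  if _n_ = 1 then call streaminit( 12345 );   /* seed the RAND stream once */
  x4 = rand( 'BERNOULLI', 0.75 );             /* 1 with probability 0.75, otherwise 0 */
run;
```

Each row is drawn independently, so the observed share of ones will be close to 75% but is not guaranteed to match it exactly.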


## Re: assigning random binary variables to data set in a predetermined mixture

Thanks everyone for the suggestions. One thing I noticed: since rand('TABLE') and ranuni are really random draws, I'm not guaranteed to get the predetermined mixture every time I run the code. I only realized that after posting this topic. I came up with this code to get close to the predetermined mixture (if an exact split is not possible, I just round the numbers). What do you think?

Is there a more efficient way to write the do loop? Perhaps the two loops could be combined: from i=1 to round(&n*&p0) do one thing, then from i=round(&n*&p0)+1 to &n do the other.

%macro zeroones(n=,p0=,seed=);

data zeroones;

do i=1 to round(&n*&p0);

x=0; y=ranuni(&seed); output;

end;

do i=1 to round(&n*(1-&p0));

x=1; y=ranuni(&seed); output;

end;

run;

proc sort data=zeroones out=zeroones(drop=i y); by y; run;

%mend zeroones;

%zeroones(n=20,p0=.36,seed=12489);

proc freq data=zeroones; tables x; run;
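
One way to combine the two loops is a single loop over all n rows that switches from 0 to 1 once the rounded number of zeros is reached. This is a sketch, not a tested replacement; the macro name zeroones2 and the helper variable _n0 are made up for illustration:

```sas
%macro zeroones2(n=, p0=, seed=);   /* zeroones2 is a hypothetical name */
  data zeroones;
    _n0 = round( &n * &p0 );        /* number of zeros, rounded once */
    do i = 1 to &n;
      x = ( i > _n0 );              /* first _n0 rows get 0, the rest get 1 */
      y = ranuni( &seed );          /* random sort key */
      output;
    end;
    drop _n0;
  run;

  /* shuffle the rows by the random key, then discard the helper variables */
  proc sort data=zeroones out=zeroones(drop=i y);
    by y;
  run;
%mend zeroones2;

%zeroones2(n=20, p0=.36, seed=12489);

proc freq data=zeroones;
  tables x;
run;
```

Rounding only once also keeps the total at exactly n rows. With two separately rounded loops, a case such as n=5 and p0=.5 rounds 2.5 up twice and produces 6 rows.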


## assigning random binary variables to data set in a predetermined mixture

%macro zeroones(want= ,have=0 ,flag= );

if &have>0 then do;

array accum&sysindex(1) _temporary_ (0);

accum&sysindex(1) + &want;

if accum&sysindex(1) >= &have then

do;

accum&sysindex(1) = accum&sysindex(1) - &have;

&flag=1;

end;

else &flag=0;

end;

%mend;

data want;

do i=1 to 100;

%zeroones(want=25,have=100,flag=x4);

output;

end;

run;

🔒 This topic is solved and locked.