Help using Base SAS procedures

assigning random binary variables to data set in a predetermined mixture

Frequent Contributor
Posts: 75

hello all-

suppose i have a data set with 3 variables and i want to make up a 4th variable that is assigned a random binary number (0/1) with a predetermined percentage of ones and zeros. for example, if i wanted a 50/50 mix of 0/1:

x1 x2 x3 x4
a a a 0
a a a 0
a a a 1
a a a 1
a a a 0
a a a 1

or if i wanted a 25/75 mix (here i just round up the 1.5 zeros):

x1 x2 x3 x4
a a a 0
a a a 1
a a a 1
a a a 1
a a a 0
a a a 1

is there any possible way to do this? one way i can think of is to create a dataset with the 0/1 and just merge two data sets:

data new; set old; set zero_ones; run;


Accepted Solution
09-02-2011 09:52 AM
Frequent Contributor
Posts: 104

The methods suggested in the other replies (comparing a uniform random number to a cutoff, or RAND('TABLE')) will not give you exactly that binary proportion, only something very close to it.

This is what I use to hit the desired proportion of 0/1 almost exactly.  The idea is to track how many 1s are still needed and how many records remain, and to "wiggle" the probability of a 1 at each record to compensate.

%let prob1 = 0.2345;
%let size  = 100000;

data want;
    retain _need1 _remain;
    if _n_ = 1 then do;
        _need1  = round( &prob1 * &size );   * whole number of 1s needed, replace with a number if you like;
        _remain = &size;                     * number of records remaining;
    end;
    do _i = 1 to &size;
        x4 = ( ranuni( 12345 ) <= ( _need1 / _remain ) );
        output;
        if x4 then _need1 = _need1 - 1;      * a 1 was assigned, so one fewer 1 is needed;
        _remain = _remain - 1;               * one fewer record left;
    end;
    drop _remain _need1 _i;
run;

proc means data=want;
run;

This generates exactly the desired proportion, almost without fail.  If you have an input dataset, try some variation of this:

%let prob1 = 0.2345;

data want;
    set have nobs=_tot;
    retain _need1 _remain;
    if _n_ = 1 then do;
        _need1  = round( &prob1 * _tot );    * whole number of 1 records needed, replace with a number if you like;
        _remain = _tot;                      * number of records remaining;
    end;
    x4 = ( ranuni( 12345 ) <= ( _need1 / _remain ) );
    if x4 then _need1 = _need1 - 1;          * a 1 was assigned, so one fewer 1 is needed;
    _remain = _remain - 1;                   * one fewer record left;
    drop _remain _need1;
run;

Often this idea is used in sampling as well to get exactly the desired proportion.  A bit of work can extend it to stratified sampling.  Just have to keep more counters and adjust appropriately.
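
As a rough illustration of the stratified case, here is a minimal sketch that keeps one pair of counters per stratum. It assumes the input is already sorted by a grouping variable; the data set `have`, the variables `stratum` and `id`, and the 25% target are all made up for the example and are not from the original post. It reads each stratum twice (once to count, once to assign), which is only one of several ways to do this.

```sas
%let prob1 = 0.25;

/* hypothetical input: two strata of 8 records each, already sorted by stratum */
data have;
    do stratum = 'A', 'B';
        do id = 1 to 8;
            output;
        end;
    end;
run;

data want;
    /* first pass over the current stratum: count its records */
    do _tot = 1 by 1 until ( last.stratum );
        set have;
        by stratum;
    end;
    _need1  = round( &prob1 * _tot );    /* 1s needed in this stratum */
    _remain = _tot;                      /* records left in this stratum */
    /* second pass: re-read the same records and assign x4 */
    do _i = 1 to _tot;
        set have;
        x4 = ( ranuni( 12345 ) <= ( _need1 / _remain ) );
        if x4 then _need1 = _need1 - 1;
        _remain = _remain - 1;
        output;
    end;
    drop _tot _need1 _remain _i;
run;

proc freq data=want;
    tables stratum * x4;
run;
```

With 8 records per stratum and a 25% target, each stratum should end up with round(0.25 * 8) = 2 ones; the cross-tabulation is there to check that.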


All Replies
Respected Advisor
Posts: 3,799

I think you want the tabled distribution.  See documentation for details.

x = RAND('TABLE',p1,p2, ...)

Trusted Advisor
Posts: 1,301

You could use a modulo operation.  It isn't 'random', but it would give an evenly distributed binary set.  It is repeatable, and the values are spread evenly across the sort order of the data set...

x4=mod(_n_,2); /* outputs values of 0 or 1 based on record number */
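
The same modulo idea can be stretched to other simple ratios. The sketch below is only an illustration: the 8-record data set `have` and the variable `id` are hypothetical, and the 25/75 split comes out exact only when the record count is a multiple of 4.

```sas
/* hypothetical 8-record input */
data have;
    do id = 1 to 8;
        output;
    end;
run;

data want;
    set have;
    x4 = ( mod( _n_, 4 ) ne 0 );   /* 0 on every 4th record, 1 on the others */
run;

proc freq data=want;
    tables x4;
run;
```

Like the 50/50 version, this is deterministic: the zeros always land on records 4, 8, and so on, so it only looks random if the input order is itself arbitrary.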

Frequent Contributor
Posts: 104

You don't need to create a separate dataset and merge; one data step is enough.

For really simple 0/1 proportions this works well:

data want;
     set have;
     x4 = ( uniform( 12345 ) >= 0.5 );      * 50/50 of 0 and 1;
     /* or, for 75/25 of 0 and 1, use this line instead:
     x4 = ( uniform( 12345 ) >= 0.75 );
     */
run;

For more complicated value distributions, use data_null_'s method:

data want;
     set have;
     call streaminit( 12345 );                * only the very first call matters;
     x4 = rand( 'TABLE', 0.25, 0.75 );        * 25% 1,  75% 2;
run;
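
Since RAND('TABLE', ...) returns the category numbers 1, 2, ... rather than 0/1, one way to get a binary flag is to subtract 1. The following is a small sketch of that, with a hypothetical 1000-record data set `have` created just for the example. As with any independent random draw, the observed split will be close to 25/75 but is not guaranteed to match it exactly.

```sas
/* hypothetical input */
data have;
    do id = 1 to 1000;
        output;
    end;
run;

data want;
    set have;
    if _n_ = 1 then call streaminit( 12345 );
    x4 = rand( 'TABLE', 0.25, 0.75 ) - 1;    /* category 1 -> 0 (25%), category 2 -> 1 (75%) */
    /* for the two-value case, rand( 'BERNOULLI', 0.75 ) is an alternative that returns 0/1 directly */
run;

proc freq data=want;
    tables x4;
run;
```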

Frequent Contributor
Posts: 75

thanks everyone for the suggestions. one thing i noticed: since rand('table') and ranuni draw each value at random, i'm not guaranteed to get exactly the predetermined mixture every time i run the code. i only realized that after posting this topic. i came up with this code to get me close to the predetermined mixture (if a perfect split is not possible i just round the numbers). what do you guys think?

is there a more efficient way to write the do loop? perhaps combining both do loops, say from i=1 to round(&n*&p0) do blah, then from i=round(&n*&p0)+1 to &n do blah

%macro zeroones(n=,p0=,seed=);
data zeroones;
    do i=1 to round(&n*&p0);
        x=0; y=ranuni(&seed); output;
    end;
    do i=1 to round(&n*(1-&p0));
        x=1; y=ranuni(&seed); output;
    end;
run;
proc sort data=zeroones out=zeroones(drop=i y); by y; run;
%mend zeroones;

%zeroones(n=20,p0=.36,seed=12489);

proc freq data=zeroones; tables x; run;
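
One possible way to fold the two loops into one, and then attach the shuffled flags to the original data as in the `set old; set zero_ones;` idea from the first post, is sketched below. The data set `old` here is a hypothetical 20-record stand-in, and the flag is named `x4` to match the question. A single loop from 1 to n also sidesteps a rounding quirk of the two-loop version: round(n*p0) + round(n*(1-p0)) is not always n (for n=5 and p0=.5 it is 3 + 3 = 6).

```sas
%let p0   = 0.36;     /* share of zeros */
%let seed = 12489;

/* hypothetical stand-in for the original data */
data old;
    do id = 1 to 20;
        x1 = 'a'; x2 = 'a'; x3 = 'a';
        output;
    end;
run;

data zeroones;
    if 0 then set old nobs=n;            /* never executes; only picks up the record count */
    do i = 1 to n;
        x4 = ( i > round( n * &p0 ) );   /* first round(n*p0) rows get 0, the rest get 1 */
        y  = ranuni( &seed );            /* random sort key */
        output;
    end;
    stop;
    keep x4 y;
run;

proc sort data=zeroones;
    by y;                                /* shuffle the flags */
run;

data new;
    set old;
    set zeroones( keep=x4 );             /* one-to-one: row k of old gets row k of zeroones */
run;

proc freq data=new;
    tables x4;
run;
```

With 20 records and p0 = .36 this should give round(7.2) = 7 zeros and 13 ones, in random order.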

Trusted Advisor
Posts: 1,301

%macro zeroones(want= ,have=0 ,flag= );
    if &have>0 then do;
        /* running accumulator, unique per macro call and retained across iterations */
        array accum&sysindex(1) _temporary_ (0);
        accum&sysindex(1) + &want;
        if accum&sysindex(1) >= &have then
        do;
            accum&sysindex(1) = accum&sysindex(1) - &have;
            &flag=1;
        end;
        else &flag=0;
    end;
%mend;

data want;
    do i=1 to 100;
        %zeroones(want=25,have=100,flag=x4);
        output;
    end;
run;
