trekvana
Calcite | Level 5

hello all-

Suppose I have a data set with 3 variables and I want to create a 4th variable that is assigned a random binary number (0/1) with a predetermined percentage of ones and zeros. For example, if I wanted a 50/50 mix of 0/1:

x1 x2 x3 x4
a  a  a  0
a  a  a  0
a  a  a  1
a  a  a  1
a  a  a  0
a  a  a  1

Or if I wanted a 25/75 mix (25% of 6 rows is 1.5 zeros, which I just round up to 2):

x1 x2 x3 x4
a  a  a  0
a  a  a  1
a  a  a  1
a  a  a  1
a  a  a  0
a  a  a  1

Is there any way to do this? One way I can think of is to create a second data set holding the 0/1 values and just combine the two data sets:

data new; set old; set zero_ones; run;

ACCEPTED SOLUTION
DLing
Obsidian | Level 7

The RAND and RANUNI methods suggested in the other replies will not give you exactly that binary proportion, only something very close to it.

This is what I use to hit the desired proportion of 0/1 almost exactly. The idea is to track how many 1s are still needed and how many records remain, and to "wiggle" the probability of a 1 at each record to compensate.

%let prob1 = 0.2345;
%let size  = 100000;

data want;
    retain _need1 _remain;
    if _n_ = 1 then do;
        _need1  = &prob1 * &size;   * number of 1s needed, replace with a number if you like;
        _remain = &size;            * number of records remaining;
    end;
    do _i = 1 to &size;
        x4 = ( ranuni( 12345 ) <= ( _need1 / _remain ) );
        output;
        if x4 then _need1 = _need1 - 1;   * a 1 was assigned, so one fewer 1 is needed;
        _remain = _remain - 1;            * one fewer record left;
    end;
    drop _remain _need1 _i;
run;

proc means;
run;

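This generates exactly the desired number of 1s whenever _need1 starts out as a whole number. If it does not, the count lands on one of the two neighboring integers, so wrap the product in ROUND() if you want to pin the target down. If you have an input dataset, try some variation of this: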

%let prob1 = 0.2345;

data want;
     set have nobs=_tot;
     retain _need1 _remain;
     if _n_ = 1 then do;
          _need1  = &prob1 * _tot;        * number of 1 records needed, replace with a number if you like;
          _remain = _tot;                 * number of records remaining;
     end;
     x4 = ( ranuni( 12345 ) <= ( _need1 / _remain ) );
     if x4 then _need1 = _need1 - 1;      * a 1 was assigned, so one fewer 1 is needed;
     _remain = _remain - 1;               * one fewer record left;
     drop _remain _need1;
run;

The same idea is often used in sampling to draw exactly the desired proportion. With a bit of work it extends to stratified sampling: you just have to keep more counters and adjust them appropriately.
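As a rough sketch of the stratified variant, one option is to read each BY group twice in the same data step: once to count its records and once to assign the flag with a fresh pair of counters. The dataset HAVE, the stratum variable GROUP, and the variable ID below are made up for illustration, and RAND('UNIFORM') is swapped in for RANUNI. Treat this as one possible arrangement under those assumptions, not the only way to keep the counters.

```sas
%let prob1 = 0.25;   /* desired share of 1s within every stratum */

/* Hypothetical input: two strata, 8 and 12 records, already sorted by GROUP */
data have;
    do group = 'A', 'B';
        do id = 1 to ifn(group = 'A', 8, 12);
            output;
        end;
    end;
run;

data want;
    call streaminit(12345);           /* only the very first call matters */

    /* Pass 1 over the BY group: count its records */
    _tot = 0;
    do until (last.group);
        set have;
        by group;
        _tot = _tot + 1;
    end;

    _need1  = round(&prob1 * _tot);   /* whole number of 1s for this stratum */
    _remain = _tot;

    /* Pass 2 over the same BY group: assign the flag and update the counters */
    do until (last.group);
        set have;
        by group;
        x4 = ( rand('UNIFORM') <= _need1 / _remain );
        if x4 then _need1 = _need1 - 1;
        _remain = _remain - 1;
        output;
    end;

    drop _tot _need1 _remain;
run;

proc freq data=want;
    tables group * x4;
run;
```

Because _need1 is rounded to a whole number per stratum, the PROC FREQ table should show 2 ones among the 8 records of group A and 3 ones among the 12 records of group B, whatever the seed.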


6 REPLIES
data_null__
Jade | Level 19

I think you want the tabled distribution.  See documentation for details.

x = RAND('TABLE',p1,p2, ...)

FriedEgg
SAS Employee

You could use a modulo operation.  It isn't 'random', but it would give an evenly distributed binary set.  It is repeatable and distributable across the sort of the set...

x4=mod(_n_,2); /* outputs values of 0 or 1 based on record number */

DLing
Obsidian | Level 7

You don't need to create a separate dataset and merge; one data step is enough.

For really simple 0/1 proportions this works well (keep whichever of the two assignments you need):

data want;
     set have;
     x4 = ( uniform( 12345 ) >= 0.5 );          * 50/50 of 0 and 1;
     /* x4 = ( uniform( 12345 ) >= 0.75 ); */   * or: 75/25 of 0 and 1;
run;

For more complicated value distributions, use data_null__'s method:

data want;
     set have;
     call streaminit( 12345 );                     * only the very first call matters;
     x4 = rand( 'TABLE', 0.25, 0.75 );             * 25% 1,  75% 2;
run;
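Note that RAND('TABLE', ...) returns the category numbers 1, 2, ..., not 0/1. If a 0/1 variable is what you are after, a small sketch along these lines may be closer to the original question. It uses the Bernoulli distribution of the RAND function; the dataset HAVE and its variable ID are placeholders invented for the example.

```sas
/* Hypothetical input dataset, only here so the step below can run */
data have;
    do id = 1 to 1000;
        output;
    end;
run;

data want;
    set have;
    call streaminit(12345);            /* only the very first call matters */
    x4 = rand('BERNOULLI', 0.75);      /* 1 with probability 0.75, otherwise 0 */
    /* same distribution via the table method: rand('TABLE', 0.25, 0.75) - 1 */
run;

proc freq data=want;
    tables x4;
run;
```

As with the other purely random methods, the observed share of 1s will only be close to 75%, not exactly 75%.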

trekvana
Calcite | Level 5

Thanks everyone for the suggestions. One thing I noticed: since the rand('TABLE') and ranuni functions really are random, I'm not guaranteed to get the predetermined mixture every time I run the code. I only realized that after posting this topic. I came up with the code below to get as close as possible to the predetermined mixture (if a perfect split is not possible I just round the numbers). What do you guys think?

Is there a more efficient way to write the do loops, perhaps combining them into one, e.g. from i=1 to round(&n*&p0) do blah, then from i=round(&n*&p0)+1 to &n do blah?

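%macro zeroones(n=,p0=,seed=);
data zeroones;
    /* round(n*p0) zeros, each with a random sort key y */
    do i=1 to round(&n*&p0);
        x=0; y=ranuni(&seed); output;
    end;
    /* the remaining records are ones; n - round(n*p0) keeps the total at exactly n */
    do i=1 to &n - round(&n*&p0);
        x=1; y=ranuni(&seed); output;
    end;
run;

/* shuffle the zeros and ones by sorting on the random key */
proc sort data=zeroones out=zeroones(drop=i y); by y; run;
%mend zeroones;

%zeroones(n=20,p0=.36,seed=12489);

proc freq data=zeroones; tables x; run;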
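On the question of combining the two loops: one possible sketch is a single loop in which the flag is a comparison of the loop index with the number of zeros wanted. The example also pairs the shuffled flags with the original data by position, as in the first post. The dataset OLD with 6 rows, the output name NEW, and the macro variable values are assumptions chosen to mirror the 25/75 example from the question.

```sas
%let n    = 6;       /* number of records in the hypothetical dataset OLD */
%let p0   = 0.25;    /* desired share of zeros */
%let seed = 12489;

/* Hypothetical input with three variables, as in the question */
data old;
    do _k = 1 to &n;
        x1 = 'a'; x2 = 'a'; x3 = 'a';
        output;
    end;
    drop _k;
run;

/* One loop: the first round(n*p0) records get 0, all later ones get 1 */
data zeroones;
    do i = 1 to &n;
        x4 = (i > round(&n * &p0));
        y  = ranuni(&seed);          /* random sort key */
        output;
    end;
run;

/* Shuffle by the random key, keeping only the flag */
proc sort data=zeroones out=zeroones(keep=x4);
    by y;
run;

/* One-to-one merge (no BY statement): rows are paired by position */
data new;
    merge old zeroones;
run;

proc freq data=new;
    tables x4;
run;
```

With n=6 and p0=0.25, round(1.5) is 2, so NEW holds exactly 2 zeros and 4 ones in random order. MERGE without a BY statement reads until the longer dataset is exhausted, whereas two SET statements stop as soon as the shorter one runs out, so it is worth making sure both datasets have the same number of rows.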

FriedEgg
SAS Employee

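%macro zeroones(want= ,have=0 ,flag= );
    if &have>0 then do;
        /* running total, retained across iterations; &sysindex keeps the name unique per call */
        array accum&sysindex(1) _temporary_ (0);
        accum&sysindex(1) + &want;
        /* every time the total reaches &have, emit a 1 and carry the remainder */
        if accum&sysindex(1) >= &have then do;
            accum&sysindex(1) = accum&sysindex(1) - &have;
            &flag=1;
        end;
        else &flag=0;
    end;
%mend;

data want;
    do i=1 to 100;
        %zeroones(want=25,have=100,flag=x4);
        output;
    end;
run;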
