- assigning random binary variables to data set in a...


09-01-2011 04:13 PM

hello all-

suppose i have a data set with 3 variables and i want to make up a 4th variable that is assigned a random binary number (0/1) with a predetermined percentage of ones and zeros. for example if i wanted a 50/50 mix of 0/1:

x1 x2 x3 x4

a a a 0

a a a 0

a a a 1

a a a 1

a a a 0

a a a 1

or if i wanted a 25/75 mix (here i just round up the 1.5 zeros):

x1 x2 x3 x4

a a a 0

a a a 1

a a a 1

a a a 1

a a a 0

a a a 1

is there any possible way to do this? one way i can think of is to create a dataset with the 0/1 and just merge two data sets:

data new; set old; set zero_ones; run;


All Replies


Posted in reply to trekvana

09-01-2011 04:17 PM

I think you want the tabled distribution. See documentation for details.

x = RAND('TABLE',p1,p2, ...)


Posted in reply to trekvana

09-01-2011 04:34 PM

You could use a modulo operation. It isn't 'random', but it would give an evenly distributed binary set. It is repeatable and distributed evenly across the sort order of the set...

    x4 = mod(_n_, 2); /* outputs values of 0 or 1 based on record number */


Posted in reply to trekvana

09-01-2011 04:40 PM

Don't need to create a separate dataset and merge, one data step is enough.

For really simple 0/1 proportions this works well:

    data want;
        set have;
        * use one of the two assignments below, not both;
        x4 = ( uniform( 12345 ) >= 0.5 ); * 50/50 of 0 and 1;
        * x4 = ( uniform( 12345 ) >= 0.75 ); * 75/25 of 0 and 1;
    run;

For more complicated value distributions, use data_null_'s method:

    data want;
        set have;
        call streaminit( 12345 ); * only the very first call matters;
        x4 = rand( 'TABLE', 0.25, 0.75 ) - 1; * TABLE returns 1 or 2, so subtract 1 to get 25% 0 and 75% 1;
    run;
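RAND('TABLE') returns category numbers starting at 1, so when the goal is a plain 0/1 flag the Bernoulli distribution of the same RAND function may be a closer fit. This is a different technique from the one posted above, shown only as a minimal sketch; it assumes the same input data set have, and like the other random draws in this thread it gives the target mix only approximately.

```sas
data want;
    set have;
    call streaminit( 12345 );          /* seed the RAND stream once */
    x4 = rand( 'BERNOULLI', 0.75 );    /* 1 with probability 0.75, otherwise 0 */
run;

proc freq data=want;                   /* check the realized 0/1 mix */
    tables x4;
run;
```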


Posted in reply to trekvana

09-01-2011 05:00 PM

thanks everyone for the suggestions. one thing i noticed: since the rand('TABLE') and ranuni functions are really random, I'm not guaranteed that predetermined mixture every time I run the code. I just realized that after posting this topic. I came up with this code to get me close to the predetermined mixture (if a perfect separation is not possible I just round the numbers). what do you guys think?

is there a more efficient way to write the do loop? perhaps combining both do loops, say from i=1 to round(&n*&p0) do blah, then from i=round(&n*&p0)+1 to &n do blah

    %macro zeroones(n=,p0=,seed=);
    data zeroones;
        do i=1 to round(&n*&p0);
            x=0; y=ranuni(&seed); output;
        end;
        /* the remaining rows are ones, so the total is always n even when both halves would round up */
        do i=1 to &n - round(&n*&p0);
            x=1; y=ranuni(&seed); output;
        end;
    run;
    proc sort data=zeroones out=zeroones(drop=i y); by y; run;
    %mend zeroones;

    %zeroones(n=20,p0=.36,seed=12489);

    proc freq data=zeroones; tables x; run;
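On the question of combining the two do loops: one possibility is a single loop over all n rows, with the 0/1 value decided by comparing the loop index to the rounded number of zeros. This is only a sketch under the same n, p0 and seed parameters; the macro name zeroones_single and the helper variable _n0 are made up for illustration.

```sas
%macro zeroones_single(n=, p0=, seed=);
data zeroones;
    _n0 = round(&n * &p0);        /* number of zeros, rounded once */
    do i = 1 to &n;
        x = (i > _n0);            /* first _n0 rows get 0, the rest get 1 */
        y = ranuni(&seed);        /* random sort key used for shuffling */
        output;
    end;
    drop _n0;
run;

/* shuffle the rows by the random key, then discard the helpers */
proc sort data=zeroones out=zeroones(drop=i y);
    by y;
run;
%mend zeroones_single;

%zeroones_single(n=20, p0=.36, seed=12489);

proc freq data=zeroones;
    tables x;
run;
```

With n=20 and p0=.36 the rounded count is 7 zeros, leaving 13 ones, and the total stays at n because the count is rounded only once.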

Solution


Posted in reply to trekvana

09-02-2011 09:52 AM

The methods listed above will not give you **exactly** that binary proportion, but very close to it.

This is what I use to get almost exactly the desired proportion for 0/1. The idea is to track the actual proportion generated so far and to "wiggle" the probability to compensate.

    %let prob1 = 0.2345;
    %let size = 100000;

    data want;
        if _n_ = 1 then do;
            retain _need1 _remain;
            _need1 = &prob1 * &size; * number of 1 needed, replace with a number if you like;
            _remain = &size; * remaining number of records;
        end;
        do _i = 1 to &size;
            x4 = ( ranuni( 12345 ) <= ( _need1 / _remain ) );
            output;
            if x4 then _need1 = _need1 - 1; * if is 1, need 1 less 1;
            _remain = _remain - 1; * have 1 less record left;
        end;
        drop _remain _need1 _i;
    run;

    proc means;
    run;

This generates exactly the desired proportion, almost without fail, as long as &prob1 * &size works out to a whole number of 1s. If you have an input dataset, try some variation of this:

    %let prob1 = 0.2345;

    data want;
        set have nobs=_tot;
        if _n_ = 1 then do;
            retain _need1 _remain;
            _need1 = &prob1 * _tot; * number of 1 records needed, replace with a number if you like;
            _remain = _tot; * remaining number of records;
        end;
        x4 = ( ranuni( 12345 ) <= ( _need1 / _remain ) );
        if x4 then _need1 = _need1 - 1; * if is 1, need 1 less 1;
        _remain = _remain - 1; * have 1 less record left;
        drop _remain _need1;
    run;

Often this idea is used in sampling as well to get exactly the desired proportion. A bit of work can extend it to stratified sampling. Just have to keep more counters and adjust appropriately.
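As a rough illustration of the "more counters" idea, the sketch below resets the two counters at the start of each stratum. Everything about the setup is an assumption made for the example: the stratum variable grp, the toy data set have, the helper data set _counts, and the choice to round the per-stratum target to a whole number. It also assumes have has no variable already named count.

```sas
%let prob1 = 0.25;

/* toy input: two strata of 8 rows each (hypothetical data) */
data have;
    do grp = 'A', 'B';
        do id = 1 to 8;
            output;
        end;
    end;
run;

proc sort data=have;
    by grp;
run;

/* one row per stratum with its size in COUNT */
proc freq data=have noprint;
    tables grp / out=_counts(keep=grp count);
run;

data want;
    merge have _counts;
    by grp;
    retain _need1 _remain;
    if first.grp then do;
        _need1  = round(&prob1 * count);  /* 1s still needed in this stratum */
        _remain = count;                  /* rows still left in this stratum */
    end;
    x4 = ( ranuni( 12345 ) <= ( _need1 / _remain ) );
    if x4 then _need1 = _need1 - 1;
    _remain = _remain - 1;
    drop _need1 _remain count;
run;

proc freq data=want;
    tables grp * x4;
run;
```

With 8 rows per stratum and prob1 = 0.25, each stratum should end up with 2 ones and 6 zeros.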


Posted in reply to trekvana

09-01-2011 05:34 PM

    %macro zeroones(want= ,have=0 ,flag= );
        if &have>0 then do;
            array accum&sysindex(1) _temporary_ (0);
            accum&sysindex(1) + &want;
            if accum&sysindex(1) >= &have then
            do;
                accum&sysindex(1) = accum&sysindex(1) - &have;
                &flag=1;
            end;
            else &flag=0;
        end;
    %mend;

    data want;
        do i=1 to 100;
            %zeroones(want=25,have=100,flag=x4);
            output;
        end;
    run;