Hello all,
Suppose I have a data set with 3 variables and I want to create a 4th variable that is assigned a random binary value (0/1) with a predetermined percentage of ones and zeros. For example, if I wanted a 50/50 mix of 0/1:
x1 x2 x3 x4
a a a 0
a a a 0
a a a 1
a a a 1
a a a 0
a a a 1
Or if I wanted a 25/75 mix (here I just round the 1.5 zeros up to 2):
x1 x2 x3 x4
a a a 0
a a a 1
a a a 1
a a a 1
a a a 0
a a a 1
Is there any way to do this? One way I can think of is to create a data set with the 0/1 values and combine the two data sets:
data new; set old; set zero_ones; run;
I think you want the table distribution of the RAND function. See the documentation for details.
x = RAND('TABLE',p1,p2, ...)
You could use a modulo operation. It isn't random, but it would give an evenly distributed binary set. It is repeatable, and the values are spread evenly across the sort order of the data set...
x4 = mod(_n_, 2); /* 0 or 1, alternating with the record number */
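As a rough sketch of how the modulo idea could be stretched beyond 50/50 (the data set name have and the 25/75 split are assumptions for illustration only), every 4th record gets a 0 and all the others get a 1:

```sas
data want;
  set have;
  /* deterministic, not random: records 4, 8, 12, ... get 0, all others get 1 */
  x4 = ( mod( _n_, 4 ) ne 0 );
run;
```

This only gives the exact 25/75 split when the number of records is a multiple of 4; otherwise it is off by a fraction of one cycle.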
You don't need to create a separate data set and merge; one data step is enough.
For really simple 0/1 proportions this works well:
data want;
set have;
* use only one of the two assignments below (a second one would overwrite the first);
x4 = ( uniform( 12345 ) >= 0.5 ); * 50/50 of 0 and 1;
x4 = ( uniform( 12345 ) >= 0.75 ); * 75/25 of 0 and 1;
run;
For more complicated value distributions, use data_null_'s method. Note that RAND('TABLE') returns the values 1, 2, ..., so subtract 1 if you need 0/1:
data want;
set have;
call streaminit( 12345 ); * only the very first call matters;
x4 = rand( 'TABLE', 0.25, 0.75 ); * 25% 1, 75% 2;
run;
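For the plain 0/1 case, a possible alternative (not what was suggested above, just a different distribution of the same RAND function) is the Bernoulli distribution, which returns 0/1 directly. A minimal sketch, assuming the same have data set and a 25/75 mix:

```sas
data want;
  set have;
  call streaminit( 12345 );          /* only the first call has any effect */
  x4 = rand( 'BERNOULLI', 0.75 );    /* 1 with probability 0.75, otherwise 0 */
run;
```

Like the other random approaches, this only gives the 25/75 mix on average, not in every run.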
Thanks everyone for the suggestions. One thing I noticed: since the rand('table') and ranuni functions are truly random, I'm not guaranteed to get the predetermined mixture every time I run the code. I only realized that after posting this topic. I came up with the code below to get me close to the predetermined mixture (if an exact split is not possible, I just round the numbers). What do you guys think?
Is there a more efficient way to write the DO loop, perhaps combining both loops into one: from i=1 to round(&n*&p0) do blah, then from i=round(&n*&p0)+1 to round(&n*(1-&p0)) do blah?
%macro zeroones(n=,p0=,seed=);
data zeroones;
do i=1 to round(&n*&p0);
x=0; y=ranuni(&seed); output;
end;
do i=1 to &n - round(&n*&p0); /* the rest are ones, so the total is always n even when both halves would round up */
x=1; y=ranuni(&seed); output;
end;
run;
proc sort data=zeroones out=zeroones(drop=i y); by y; run;
%mend zeroones;
%zeroones(n=20,p0=.36,seed=12489);
proc freq data=zeroones; tables x; run;
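On the question of combining the two loops, one possible sketch is a single loop from 1 to n that sets x by comparing the counter with the number of zeros. The macro name zeroones2 and the helper variable n0 are made up here so as not to clash with the macro above:

```sas
%macro zeroones2(n=, p0=, seed=);
  data zeroones;
    n0 = round(&n * &p0);      /* number of zeros, rounded if not a whole number */
    do i = 1 to &n;
      x = ( i > n0 );          /* first n0 records get 0, the remaining ones get 1 */
      y = ranuni(&seed);       /* random sort key */
      output;
    end;
    drop n0;
  run;
  /* shuffle the records by the random key, then drop the helpers */
  proc sort data=zeroones out=zeroones(drop=i y); by y; run;
%mend zeroones2;

%zeroones2(n=20, p0=.36, seed=12489);
proc freq data=zeroones; tables x; run;
```

With n=20 and p0=.36 this should give 7 zeros and 13 ones, since round(7.2) is 7.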
The methods listed above will not give you exactly that proportion of 0/1, only something very close to it.
This is what I use to get almost exactly the desired proportion for 0/1. The idea is to track how many 1s are still needed and how many records remain, and to "wiggle" the probability at each record to compensate.
%let prob1 = 0.2345;
%let size = 100000;
data want;
if _n_ = 1 then do;
retain _need1 _remain;
_need1 = round( &prob1 * &size ); * number of 1s needed, rounded to a whole number (replace with a fixed count if you like);
_remain = &size; * remaining number of records;
end;
do _i = 1 to &size;
x4 = ( ranuni( 12345 ) <= ( _need1 / _remain ) );
output;
if x4 then _need1 = _need1 - 1; * a 1 was generated, so one fewer 1 is needed;
_remain = _remain - 1; * one fewer record remains;
end;
drop _remain _need1 _i;
run;
proc means;
run;
This lands within one record of the desired proportion, and in practice exactly on it when the required number of 1s is a whole number. If you have an input data set, try some variation of this:
%let prob1 = 0.2345;
data want;
set have nobs=_tot;
if _n_ = 1 then do;
retain _need1 _remain;
_need1 = round( &prob1 * _tot ); * number of 1 records needed, rounded to a whole number (replace with a fixed count if you like);
_remain = _tot; * remaining number of records;
end;
x4 = ( ranuni( 12345 ) <= ( _need1 / _remain ) );
if x4 then _need1 = _need1 - 1; * a 1 was generated, so one fewer 1 is needed;
_remain = _remain - 1; * one fewer record remains;
drop _remain _need1;
run;
The same idea is often used in sampling to get exactly the desired proportion. With a bit of work it extends to stratified sampling: you just have to keep a set of counters per stratum and adjust them appropriately.
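A rough sketch of what the stratified variation might look like, under some assumptions that are not in the original: the input data set is called have, it has a grouping variable named stratum, and it has no existing variable named count. The per-stratum record counts come from PROC FREQ, and the counters are reset at the start of each stratum:

```sas
%let prob1 = 0.2345;

/* BY-group processing needs the data sorted by the (hypothetical) stratum variable */
proc sort data=have out=have_sorted; by stratum; run;

/* one row per stratum with its record count in the variable COUNT */
proc freq data=have_sorted noprint;
  tables stratum / out=_counts(keep=stratum count);
run;

data want;
  merge have_sorted _counts;
  by stratum;
  retain _need1 _remain;
  if first.stratum then do;
    _need1  = round( &prob1 * count );   /* 1s needed in this stratum */
    _remain = count;                     /* records left in this stratum */
  end;
  x4 = ( ranuni( 12345 ) <= ( _need1 / _remain ) );
  if x4 then _need1 = _need1 - 1;
  _remain = _remain - 1;
  drop _need1 _remain count;
run;
```

A different proportion per stratum could be handled the same way by carrying the target on the _counts data set instead of using one macro variable.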
%macro zeroones(want= ,have=0 ,flag= );
if &have>0 then do;
array accum&sysindex(1) _temporary_ (0);
accum&sysindex(1) + &want;
if accum&sysindex(1) >= &have then
do;
accum&sysindex(1) = accum&sysindex(1) - &have;
&flag=1;
end;
else &flag=0;
end;
%mend;
data want;
do i=1 to 100;
%zeroones(want=25,have=100,flag=x4);
output;
end;
run;