Hi,
I am not sure whether this is the right forum for this question.
I have a dataset like the following:
data have;
object=1; place='A'; prob=0.6;output;
object=1; place='B'; prob=0.3;output;
object=1; place='C'; prob=0.1;output;
object=2; place='A'; prob=0.2;output;
object=2; place='D'; prob=0.4;output;
object=2; place='E'; prob=0.3;output;
object=2; place='F'; prob=0.1;output;
run;
That is, object 1 is in place A with probability 0.6, in place B with probability 0.3, and in place C with probability 0.1.
I want to randomly assign each object to exactly one place according to these probabilities. The output dataset should look like this:
data want;
object=1; place='A'; prob=0.6; in=1;output;
object=1; place='B'; prob=0.3; in=0;output;
object=1; place='C'; prob=0.1; in=0;output;
object=2; place='A'; prob=0.2; in=0;output;
object=2; place='D'; prob=0.4; in=0;output;
object=2; place='E'; prob=0.3; in=1;output;
object=2; place='F'; prob=0.1; in=0;output;
run;
In this example, object 1 was assigned to place A and object 2 to place E. The assignment should be random.
The real dataset is quite big; each object may have a long list of places, and the list of places differs from object to object.
Any advice is much appreciated and, if possible, I would also like to compare different kinds of solutions: data steps, IML, OR(?)...
Thank you very much in advance.
This seems like a job for the RAND function with tabled distribution.
data have;
object=1; place='A'; prob=0.6;output;
object=1; place='B'; prob=0.3;output;
object=1; place='C'; prob=0.1;output;
object=2; place='A'; prob=0.2;output;
object=2; place='D'; prob=0.4;output;
object=2; place='E'; prob=0.3;output;
object=2; place='F'; prob=0.1;output;
run;
proc transpose data=have out=have_t;
by object;
var prob;
run;
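data want;
  merge have have_t;   /* COL1, COL2, ... hold the probabilities of each object */
  retain z;
  by object;
  if first.object then do;
    seq=0;
    z=rand('tabled',of col:);   /* index of the selected place, drawn once per object */
  end;
  seq+1;                        /* position of the current place within the object */
  if seq=z then in=1;
  else in=0;
  keep object place prob in;
run;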
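For comparison, here is a minimal sketch of a single-pass variant that skips the transpose, which might be convenient when the place lists are long. It is not the method posted above: it swaps in one uniform draw per object and picks the first place whose cumulative probability exceeds that draw. The dataset name want_cdf, the seed, and the helper variables _u, _cum and _done are made up for illustration, and the sketch assumes HAVE is sorted by object and that prob sums to 1 within each object.

```sas
data want_cdf(drop=_:);
  call streaminit(20240101);   /* arbitrary seed (assumption), only for reproducibility */
  _u    = rand('uniform');     /* one draw per object: each DATA step iteration reads one BY group */
  _cum  = 0;                   /* running cumulative probability within the object */
  _done = 0;                   /* set to 1 once the object has been assigned */
  do until(last.object);
    set have;
    by object;
    _cum = _cum + prob;
    /* select the first place whose cumulative probability exceeds the draw; */
    /* fall back to the last place in case rounding leaves the total just below 1 */
    in = (not _done) and (_u < _cum or last.object);
    if in then _done = 1;
    output;
  end;
run;
```

Running PROC FREQ on a stacked set of repeated runs (with different seeds) would be one way to check that the selection frequencies approach prob.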
Hi @ciro,
@ciro wrote:
(...) if possible, I would also like to compare different kinds of solutions: data steps, IML, OR(?)...
I don't have a SAS/IML or SAS/OR license, so here's another data step solution:
data want(drop=_:);
  call streaminit(27182818);
  do until(last.object);
    set have;
    by object;
    if ~_a then do; /* i.e., if the object has not been assigned yet */
      in=rand('bern',fuzz(prob/(1-coalesce(_cp,0))));
      _cp=sum(_cp,prob); /* "cumulative probability" */
      _a=in;
    end;
    else in=0;
    output;
  end;
run;
The FUZZ function serves as a safety measure against rounding errors (up to 1E-12) -- assuming that there are no cases with 0<prob<=1E-12.
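As a sanity check on the data step above, the following short derivation sketches why the sequential Bernoulli draws give each place its stated probability. The symbols p_k (the prob of the k-th place of an object) and c_k (the cumulative probability, i.e. _cp after processing place k) are notation introduced here, and the argument assumes the probabilities of each object sum to 1. Place k is selected only if all earlier draws failed and the k-th draw succeeds, and the product telescopes:

```latex
\[
P(\text{place } k)
  = \frac{p_k}{1-c_{k-1}} \prod_{j=1}^{k-1}\left(1-\frac{p_j}{1-c_{j-1}}\right)
  = \frac{p_k}{1-c_{k-1}} \prod_{j=1}^{k-1}\frac{1-c_j}{1-c_{j-1}}
  = \frac{p_k}{1-c_{k-1}}\,(1-c_{k-1})
  = p_k,
\qquad c_k=\sum_{j\le k} p_j,\quad c_0=0 .
\]
```

For the last place of an object the ratio p_k/(1-c_{k-1}) equals 1 in exact arithmetic, so an object that is still unassigned is always assigned there. In floating point the computed ratio can land slightly off 1, which is presumably where FUZZ helps.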
"the assignation should be random."
Should it be random according to the variable 'prob', or random with equal probability?
data have;
object=1; place='A'; prob=0.6;output;
object=1; place='B'; prob=0.3;output;
object=1; place='C'; prob=0.1;output;
object=2; place='A'; prob=0.2;output;
object=2; place='D'; prob=0.4;output;
object=2; place='E'; prob=0.3;output;
object=2; place='F'; prob=0.1;output;
run;
proc surveyselect data=have sampsize=1 seed=123 outrandom out=temp;
strata object;
run;
data want;
merge have temp(keep=object place in=inb);
by object place;
in=inb;
run;
OK. How about this one?
data have;
object=1; place='A'; prob=0.6;output;
object=1; place='B'; prob=0.3;output;
object=1; place='C'; prob=0.1;output;
object=2; place='A'; prob=0.2;output;
object=2; place='D'; prob=0.4;output;
object=2; place='E'; prob=0.3;output;
object=2; place='F'; prob=0.1;output;
run;
proc surveyselect data=have sampsize=1 seed=123 method=pps out=temp;
strata object;
size prob;
run;
data want;
merge have temp(keep=object place in=inb);
by object place;
in=inb;
run;
Thank you guys. All solutions seem to work fine!