Hi,
I am not sure whether this is the right forum for this question.
I have a dataset like the following:
data have;
object=1; place='A'; prob=0.6;output;
object=1; place='B'; prob=0.3;output;
object=1; place='C'; prob=0.1;output;
object=2; place='A'; prob=0.2;output;
object=2; place='D'; prob=0.4;output;
object=2; place='E'; prob=0.3;output;
object=2; place='F'; prob=0.1;output;
run;
That is, object 1 is in place A with probability 0.6, in place B with probability 0.3, and in place C with probability 0.1.
I want to randomly assign each object to exactly one place according to these probabilities. The output dataset should look like this:
data want;
object=1; place='A'; prob=0.6; in=1;output;
object=1; place='B'; prob=0.3; in=0;output;
object=1; place='C'; prob=0.1; in=0;output;
object=2; place='A'; prob=0.2; in=0;output;
object=2; place='D'; prob=0.4; in=0;output;
object=2; place='E'; prob=0.3; in=1;output;
object=2; place='F'; prob=0.1; in=0;output;
run;
In this example, object 1 was assigned to place A and object 2 to place E. The assignment should be random.
The real dataset is quite big: each object may have a long list of places, and the list of places differs from object to object.
Any advice is much appreciated. If possible, I would also like to compare different kinds of solutions: data steps, IML, OR(?)...
Thank you very much in advance.
This seems like a job for the RAND function with the 'TABLE' distribution.
data have;
object=1; place='A'; prob=0.6;output;
object=1; place='B'; prob=0.3;output;
object=1; place='C'; prob=0.1;output;
object=2; place='A'; prob=0.2;output;
object=2; place='D'; prob=0.4;output;
object=2; place='E'; prob=0.3;output;
object=2; place='F'; prob=0.1;output;
run;
proc transpose data=have out=have_t;
by object;
var prob;
run;
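data want;
  merge have have_t; /* attach each object's probabilities as COL1, COL2, ... */
  by object;
  retain z;
  if first.object then do;
    seq=0;
    z=rand('tabled',of col:); /* draw the index of the assigned place once per object */
  end;
  seq+1; /* position of the current place within the object */
  if seq=z then in=1;
  else in=0;
  keep object place prob in;
run;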
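For comparison, here is a minimal sketch of a single-pass alternative that skips the transpose: draw one uniform number per object and flag the first place whose cumulative probability reaches it (the inverse-CDF method). This is not the approach above, just another way to get the same distribution. It assumes HAVE is sorted by object and that the probabilities of each object sum to 1; the dataset name want_cdf, the helper variables _u, _cum and _done, and the seed are made up for illustration.

```sas
data want_cdf(drop=_:);
  if _n_=1 then call streaminit(12345); /* arbitrary seed (assumption), for reproducibility */
  set have;
  by object;
  retain _u _cum _done;
  if first.object then do;
    _u=rand('uniform'); /* one draw per object */
    _cum=0;             /* cumulative probability so far */
    _done=0;            /* 1 once the object has been assigned */
  end;
  _cum=_cum+prob;
  /* flag the first place whose cumulative probability reaches _u;
     the last place is a fallback in case rounding keeps _cum below _u */
  in=(not _done and (_u<=_cum or last.object));
  if in then _done=1;
run;
```

Because only one pass over HAVE is needed and no wide COL1-COLn table is built, this may be worth trying when objects have long lists of places.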
Hi @ciro,
@ciro wrote:
(...) if possible, I would also like to compare different kinds of solutions: data steps, iml, OR(?)...
I don't have a SAS/IML or SAS/OR license, so here's another data step solution:
data want(drop=_:);
  call streaminit(27182818);
  do until(last.object);
    set have;
    by object;
    if ~_a then do; /* i.e., if the object has not been assigned yet */
      in=rand('bern',fuzz(prob/(1-coalesce(_cp,0))));
      _cp=sum(_cp,prob); /* "cumulative probability" */
      _a=in;
    end;
    else in=0;
    output;
  end;
run;
The FUZZ function serves as a safety measure against rounding errors (up to 1E-12) -- assuming that there are no cases with 0<prob<=1E-12.
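To see why the sequential Bernoulli draws give each place its stated probability, the following sketch computes, for every row, the conditional probability passed to RAND and the resulting overall probability. It is only a check, not part of the solution; the dataset name check and the variables cond, uncond, _cp and _rem are made up for illustration, and HAVE is assumed to be sorted by object.

```sas
data check(drop=_:);
  set have;
  by object;
  retain _cp _rem;
  if first.object then do;
    _cp=0;  /* cumulative probability of the earlier places */
    _rem=1; /* probability that the object is still unassigned */
  end;
  cond=prob/(1-_cp);  /* parameter of the Bernoulli draw for this place */
  uncond=_rem*cond;   /* overall probability of ending up in this place */
  _rem=_rem*(1-cond);
  _cp=_cp+prob;
run;
```

For object 1, cond comes out as 0.6, 0.75 and 1 (up to floating-point rounding), and uncond as 0.6, 0.3 and 0.1, which matches prob. The last place always has a conditional probability of about 1, which is where the FUZZ call matters.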
"the assignation should be random."
Should the assignment follow the probabilities in the variable 'prob', or should every place be equally likely? The following selects one place per object with equal probability:
data have;
object=1; place='A'; prob=0.6;output;
object=1; place='B'; prob=0.3;output;
object=1; place='C'; prob=0.1;output;
object=2; place='A'; prob=0.2;output;
object=2; place='D'; prob=0.4;output;
object=2; place='E'; prob=0.3;output;
object=2; place='F'; prob=0.1;output;
run;
proc surveyselect data=have sampsize=1 seed=123 outrandom out=temp;
strata object;
run;
data want;
merge have temp(keep=object place in=inb);
by object place;
in=inb;
run;
OK. How about this one? It selects one place per object with probability proportional to prob (METHOD=PPS with SIZE prob).
data have;
object=1; place='A'; prob=0.6;output;
object=1; place='B'; prob=0.3;output;
object=1; place='C'; prob=0.1;output;
object=2; place='A'; prob=0.2;output;
object=2; place='D'; prob=0.4;output;
object=2; place='E'; prob=0.3;output;
object=2; place='F'; prob=0.1;output;
run;
proc surveyselect data=have sampsize=1 seed=123 method=pps out=temp;
strata object;
size prob;
run;
data want;
merge have temp(keep=object place in=inb);
by object place;
in=inb;
run;
Thank you guys. All solutions seem to work fine!