Hi All,
I have a data set with one patient per row and the number of times that patient has seen a doctor in the last 6 months.
Date Patient visit
28Jul2018 AA 2
27Nov2018 BB 1
19Aug2018 BB 4
28Jul2018 AA 2
27Nov2018 GG 4
19Aug2018 CC 2
I have to randomly select patients until their visit counts add up to 10.
Please help.
Thank you!
Should the total reach exactly 10, or 10 or above?
Exactly 10.
And do you want to do this with PROC SURVEYSELECT, or can it be in a data step?
Yes, it can be a data step. It doesn't have to be PROC SURVEYSELECT. Thanks!
Does this work for you?
data have;
input Date:date9. Patient $ visit;
format Date date9.;
datalines;
28Jul2018 AA 2
27Nov2018 BB 1
19Aug2018 BB 4
28Jul2018 AA 2
27Nov2018 GG 4
19Aug2018 CC 2
28Jul2018 AA 2
27Nov2018 BB 1
19Aug2018 BB 4
28Jul2018 AA 2
27Nov2018 GG 4
19Aug2018 CC 2
;
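data temp;
   set have;
   c+1;   /* sequential row number, used as the hash key */
run;
data want(keep=Date Patient Visit);
   if 0 then set temp nobs=nobs;   /* define the variables and get the row count without reading data */
   declare hash h(dataset:'temp');
   h.definekey('c');
   h.definedata(all:'Y');
   h.definedone();
   sum=0;
   do until (sum = 10);
      pick=ceil(rand('Uniform')*nobs);   /* random row number between 1 and nobs */
      /* accept the row only if it is still in the hash and does not push the total past 10 */
      if (h.find(key:pick)=0) & ((10-sum) ge visit) then do;
         sum+visit;
         output;
         rc=h.remove(key:pick);   /* remove the row so it cannot be picked twice */
      end;
   end;
   stop;   /* the SET statement never executes, so end the step explicitly */
run;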
Anytime, glad to help 🙂
One more question: I want to set a seed so that I get the same result every time, in case I need to replicate my results in the future. How can I do that?
No problem. Simply put
call streaminit(123);
in the data step, before the first call to rand().
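To illustrate the placement, here is a small stand-alone sketch. The seed 123 comes from the reply above; the row count of 12 is only an assumption that matches the sample data, and the step writes to the log rather than creating a data set.

```sas
data _null_;
   call streaminit(123);   /* seed the generator once, before any RAND call */
   nobs = 12;              /* assumed row count, matching the sample data */
   do i = 1 to 5;
      pick = ceil(rand('Uniform') * nobs);   /* same draw logic as in the want step */
      put pick=;           /* the sequence repeats on every run with the same seed */
   end;
run;
```

In the want step the call would go near the top, for example right after the if 0 then set statement, so that it runs before the loop draws its first pick.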
@PeterClemmensen Thanks for helping. One more question: suppose the visit variable only takes the values 2 or 3. At some point the sum reaches 9, and since there is no 1 to pick to make it 10 (the next value has to be 2 or 3), the step goes into an endless loop. Can you suggest how to fix that?
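One possible way around the endless loop, offered as a sketch rather than a definitive fix: cap the number of draws per attempt, and if the total gets stuck below 10, throw the attempt away and start over. The names sel, it, target, maxdraws, maxruns, attempt, draw and total are made up for this illustration, the two caps are arbitrary assumptions, and the temp data set with its key c is the one built earlier in the thread. Rows are buffered in a second hash and written out only once an attempt reaches exactly 10.

```sas
data want(keep=Date Patient Visit);
   if 0 then set temp nobs=nobs;      /* define variables, get the row count */
   call streaminit(123);              /* optional: reproducible picks */

   declare hash h(dataset:'temp');    /* all rows, keyed by row number c */
   h.definekey('c');
   h.definedata(all:'Y');
   h.definedone();

   declare hash sel(ordered:'a');     /* row numbers chosen in the current attempt */
   sel.definekey('c');
   sel.definedata('c');
   sel.definedone();
   declare hiter it('sel');

   target   = 10;
   maxdraws = 1000;   /* assumed cap on draws within one attempt */
   maxruns  = 100;    /* assumed cap on restarts */

   do attempt = 1 to maxruns until (total = target);
      rc = sel.clear();               /* discard a dead-end attempt */
      total = 0;
      do draw = 1 to maxdraws until (total = target);
         c = ceil(rand('Uniform') * nobs);
         /* skip rows already chosen; h.find() loads visit for row c */
         if sel.check() ne 0 and h.find() = 0 then do;
            if total + visit le target then do;
               total = total + visit;
               rc = sel.add();
            end;
         end;
      end;
   end;

   if total = target then do;
      rc = it.first();                /* write out the buffered rows */
      do while (rc = 0);
         if h.find() = 0 then output;
         rc = it.next();
      end;
   end;
   else put 'WARNING: no selection summing to ' target 'was found within the limits.';
   stop;
run;
```

If no combination of rows can add up to 10 at all (for example, if every visit value were 4), the step ends with the warning and an empty want data set instead of looping forever.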