First of all, huge thanks to PG who helped with my prior question re random sampling post earlier.
What makes my question even more complicated is that the same patient may visit mulitple doctors, so my task is to truly RANDOMLY select a unique patient, meaning, one patient can only be RANDOMLY selected ONCE each run, even if the patient visited multiple doctors. That being said, an individual patient can only be selected ONCE RANDOMLY regardless whomever s/he saw. Here is the example, we need to include up to 3 patients for each doctor when the job is run. Assume the same patients saw both Rodger and Jackson (raw data below),
1) if entry related to patient Ted is first selected, any transactions Ted had should not be selected again for Rodger or Jackson
2) if 3 patients were selected for Rodger, they can’t be selected again for Jackson, meaning Jackson’s 3 patients have to be other patients than 3 selected for Rodger (although they also saw Jackson, they can’t be selected again as Rodger has been RANDOMLY selected if that is the case) and if Jackson did not see any other patients (let's also assume this is true in this example), the result for the first week's run would be as follow and the second week would be null if there is no new patients were seen by Rodger and Jackson:
Raw Data:
Rodger Jack 15/MAR/2015 E401C
Rodger Jack 15/MAY/2015 C985C
Rodger Ann 04/APR/2016 A215A
Rodger Ted 26/JUN/2015 C425C
Rodger Ted 23/JUL/2015 C872C
Rodger Ted 18/Jul/2015 C532C
Rodger Ted 06/Aug/2015 C231C
Rodger William 18/Apr/2016 C425C
Rodger Donald 16/May/2016 C005C
Jackson Jack 15/Feb/2015 E423A
Jackson Jack 15/FEB/2015 C985B
Jackson Ann 04/Jun/2016 A215D
Jackson Ted 26/Jan/2015 C425X
Jackson Ted 23/FEB/2015 C872F
Jackson Ted 18/OCT/2015 C530Y
Jackson Ted 06/Aug/2015 C231C
Jackson William 18/Jan/2016 C430C
Jackson Donald 29/May/2016 C104D
Result from first run:
Rodger Jack 15/MAY/2015 C985C
Rodger Ann 04/APR/2016 A215A
Rodger Ted 26/JUN/2015 C425C
Jackson William 18/Jan/2016 C430C
Jackson Donald 29/May/2016 C104D
As PROC SURVEYSELECT is not available in the HOST SAS environment, my question is how to use data step or macro or SQL to randomly select one transaction and once that transaction is selected, any transactions associated with that patient should be excluded from the population to be selected for the next selection. I have been stuck with this and could not move on.
Any further help from anyone to tackle this challenge will be highly appreciated.
Thanks in advance.
Rodger
Perfect! Thanks Ksharp. This is exactly what I am expecting to have for now. Again thanks go to everyone who shared their insight and expertise as well.
Have a great day!
Rodger
Perhaps this will get you started:
proc sort data = claims; by doc pat; run; data sample; do until(last.doc); set claims; by doc; pats + 1; /* number of pats in by group */ end; offset = ranuni(17) * pats; /* choose a random number */ wanted = sum( running, offset ); /* pointer to desired record */ set claims point = wanted;
running + pats; /* keep track of where you are */ output; run;
One of the issues that is easy to overlook: what does random selection entail?
If one patient has 10 visits, and another has 2 visits, should they have an equal chance of being selected? Or should the first patient be 5 times more likely to be selected than the second? Or doesn't it make a difference?
Thanks both sh0e and Astounding.
For now, I will not consider the weight of the patient's visit frequency, but yes, this will be considered in the future. So given the equal chance of being selected, I would like to have all the unique patients to be included but with randomly selected entry. Once this is done, I also need to randomly select up to certain number of patient for each doctor in which case any doctor who has less than the cutoff number will be all included and those who have more than the cutoff numbers will be again randomly selected. I have been challenged with the unique patients' selection RANDOMLY (duplicate patient always exists).
Thanks for sharing your knowledge and expertise. Any further illustration would be helpful and appreciated.
Rodger
It is more like Hash Table thing.
data have;
input doctor $ patient $ date : $20. x $;
cards;
Rodger Jack 15/MAR/2015 E401C
Rodger Jack 15/MAY/2015 C985C
Rodger Ann 04/APR/2016 A215A
Rodger Ted 26/JUN/2015 C425C
Rodger Ted 23/JUL/2015 C872C
Rodger Ted 18/Jul/2015 C532C
Rodger Ted 06/Aug/2015 C231C
Rodger William 18/Apr/2016 C425C
Rodger Donald 16/May/2016 C005C
Jackson Jack 15/Feb/2015 E423A
Jackson Jack 15/FEB/2015 C985B
Jackson Ann 04/Jun/2016 A215D
Jackson Ted 26/Jan/2015 C425X
Jackson Ted 23/FEB/2015 C872F
Jackson Ted 18/OCT/2015 C530Y
Jackson Ted 06/Aug/2015 C231C
Jackson William 18/Jan/2016 C430C
Jackson Donald 29/May/2016 C104D
;
run;
proc freq data=have noprint;
table patient/out=patient(keep=patient);
run;
data want;
if _n_=1 then do;
if 0 then set patient;
declare hash h(dataset:'patient');
h.definekey('patient');
h.definedone();
end;
set have;
by doctor notsorted;
if first.doctor then count=0;
if count lt 3 and h.check()=0 then do;
count+1;h.remove();output;
end;
drop count;
run;
If you want randomly select doctor and patient , try add two more group variables :
data have;
input doctor $ patient $ date : $20. x $;
cards;
Rodger Jack 15/MAR/2015 E401C
Rodger Jack 15/MAY/2015 C985C
Rodger Ann 04/APR/2016 A215A
Rodger Ted 26/JUN/2015 C425C
Rodger Ted 23/JUL/2015 C872C
Rodger Ted 18/Jul/2015 C532C
Rodger Ted 06/Aug/2015 C231C
Rodger William 18/Apr/2016 C425C
Rodger Donald 16/May/2016 C005C
Jackson Jack 15/Feb/2015 E423A
Jackson Jack 15/FEB/2015 C985B
Jackson Ann 04/Jun/2016 A215D
Jackson Ted 26/Jan/2015 C425X
Jackson Ted 23/FEB/2015 C872F
Jackson Ted 18/OCT/2015 C530Y
Jackson Ted 06/Aug/2015 C231C
Jackson William 18/Jan/2016 C430C
Jackson Donald 29/May/2016 C104D
;
run;
proc freq data=have noprint;
table patient/out=patient(keep=patient);
run;
proc sort data=have;by doctor patient;run;
data have;
set have;
by doctor patient;
retain group1 group2;
call streaminit(123456789);
if first.doctor then group1=rand('uniform');
if first.patient then group2=rand('uniform');
run;
proc sort data=have;by group1 group2;run;
data want;
if _n_=1 then do;
if 0 then set patient;
declare hash h(dataset:'patient');
h.definekey('patient');
h.definedone();
end;
set have;
by group1;
if first.group1 then count=0;
if count lt 3 and h.check()=0 then do;
count+1;h.remove();output;
end;
drop count group1 group2;
run;
Perfect! Thanks Ksharp. This is exactly what I am expecting to have for now. Again thanks go to everyone who shared their insight and expertise as well.
Have a great day!
Rodger
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.