Solved: Randomly Select an Entry and then Remove Any Transactions Associated w...

rodgerzhang_moh · Posted 06-29-2016 02:12 PM

First of all, huge thanks to PG who helped with my prior question re random sampling post earlier.

What makes my question even more complicated is that the same patient may visit mulitple doctors, so my task is to truly RANDOMLY select a unique patient, meaning, one patient can only be RANDOMLY selected ONCE each run, even if the patient visited multiple doctors. That being said, an individual patient can only be selected ONCE RANDOMLY regardless whomever s/he saw. Here is the example, we need to include up to 3 patients for each doctor when the job is run. Assume the same patients saw both Rodger and Jackson (raw data below),

1) if entry related to patient Ted is first selected, any transactions Ted had should not be selected again for Rodger or Jackson

2) if 3 patients were selected for Rodger, they can’t be selected again for Jackson, meaning Jackson’s 3 patients have to be other patients than 3 selected for Rodger (although they also saw Jackson, they can’t be selected again as Rodger has been RANDOMLY selected if that is the case) and if Jackson did not see any other patients (let's also assume this is true in this example), the result for the first week's run would be as follow and the second week would be null if there is no new patients were seen by Rodger and Jackson:

Raw Data:

Rodger Jack 15/MAR/2015 E401C

Rodger Jack 15/MAY/2015 C985C

Rodger Ann 04/APR/2016 A215A

Rodger Ted 26/JUN/2015 C425C

Rodger Ted 23/JUL/2015 C872C

Rodger Ted 18/Jul/2015 C532C

Rodger Ted 06/Aug/2015 C231C

Rodger William 18/Apr/2016 C425C

Rodger Donald 16/May/2016 C005C

Jackson Jack 15/Feb/2015 E423A

Jackson Jack 15/FEB/2015 C985B

Jackson Ann 04/Jun/2016 A215D

Jackson Ted 26/Jan/2015 C425X

Jackson Ted 23/FEB/2015 C872F

Jackson Ted 18/OCT/2015 C530Y

Jackson Ted 06/Aug/2015 C231C

Jackson William 18/Jan/2016 C430C

Jackson Donald 29/May/2016 C104D

Result from first run:

Rodger Jack 15/MAY/2015 C985C

Rodger Ann 04/APR/2016 A215A

Rodger Ted 26/JUN/2015 C425C

Jackson William 18/Jan/2016 C430C

Jackson Donald 29/May/2016 C104D

As PROC SURVEYSELECT is not available in the HOST SAS environment, my question is how to use data step or macro or SQL to randomly select one transaction and once that transaction is selected, any transactions associated with that patient should be excluded from the population to be selected for the next selection. I have been stuck with this and could not move on.

Any further help from anyone to tackle this challenge will be highly appreciated.

Thanks in advance.

Rodger

rodgerzhang_moh · Posted 06-30-2016 07:40 AM

Perfect! Thanks Ksharp. This is exactly what I am expecting to have for now. Again thanks go to everyone who shared their insight and expertise as well.

Have a great day!

Rodger

View solution in original post

sh0e · Posted 06-29-2016 02:26 PM

Perhaps this will get you started:

proc sort data = claims;
  by doc pat;
run;

data sample;
  do until(last.doc);
     set claims;
     by doc;
     pats + 1; /* number of pats in by group */
  end;
  offset = ranuni(17) * pats; /* choose a random number */
  wanted = sum( running,  offset ); /* pointer to desired record */
  set claims point = wanted;
  running + pats; /* keep track of where you are */
  output;
run;

Astounding · Posted 06-29-2016 02:33 PM

One of the issues that is easy to overlook: what does random selection entail?

If one patient has 10 visits, and another has 2 visits, should they have an equal chance of being selected? Or should the first patient be 5 times more likely to be selected than the second? Or doesn't it make a difference?

rodgerzhang_moh · Posted 06-29-2016 02:58 PM

Thanks both sh0e and Astounding.

For now, I will not consider the weight of the patient's visit frequency, but yes, this will be considered in the future. So given the equal chance of being selected, I would like to have all the unique patients to be included but with randomly selected entry. Once this is done, I also need to randomly select up to certain number of patient for each doctor in which case any doctor who has less than the cutoff number will be all included and those who have more than the cutoff numbers will be again randomly selected. I have been challenged with the unique patients' selection RANDOMLY (duplicate patient always exists).

Thanks for sharing your knowledge and expertise. Any further illustration would be helpful and appreciated.

Rodger

Ksharp · Posted 06-30-2016 04:02 AM

It is more like Hash Table thing.

data have;
input doctor $ patient $ date : $20. x $;
cards;
Rodger Jack 15/MAR/2015 E401C
Rodger Jack 15/MAY/2015 C985C
Rodger Ann 04/APR/2016 A215A
Rodger Ted 26/JUN/2015 C425C
Rodger Ted 23/JUL/2015 C872C
Rodger Ted 18/Jul/2015 C532C
Rodger Ted 06/Aug/2015 C231C
Rodger William 18/Apr/2016 C425C
Rodger Donald 16/May/2016 C005C
Jackson Jack 15/Feb/2015 E423A
Jackson Jack 15/FEB/2015 C985B
Jackson Ann 04/Jun/2016 A215D
Jackson Ted 26/Jan/2015 C425X
Jackson Ted 23/FEB/2015 C872F
Jackson Ted 18/OCT/2015 C530Y
Jackson Ted 06/Aug/2015 C231C
Jackson William 18/Jan/2016 C430C
Jackson Donald 29/May/2016 C104D
;
run;
proc freq data=have noprint;
table patient/out=patient(keep=patient);
run;
data want;
 if _n_=1 then do;
  if 0 then set patient;
  declare hash h(dataset:'patient');
  h.definekey('patient');
  h.definedone();
 end;
set have;
by doctor notsorted;
if first.doctor then count=0;
if count lt 3 and h.check()=0 then do;
 count+1;h.remove();output;
end;
drop count;
run;

Ksharp · Posted 06-30-2016 04:21 AM

If you want randomly select doctor and patient , try add two more group variables :

data have;
input doctor $ patient $ date : $20. x $;
cards;
Rodger Jack 15/MAR/2015 E401C
Rodger Jack 15/MAY/2015 C985C
Rodger Ann 04/APR/2016 A215A
Rodger Ted 26/JUN/2015 C425C
Rodger Ted 23/JUL/2015 C872C
Rodger Ted 18/Jul/2015 C532C
Rodger Ted 06/Aug/2015 C231C
Rodger William 18/Apr/2016 C425C
Rodger Donald 16/May/2016 C005C
Jackson Jack 15/Feb/2015 E423A
Jackson Jack 15/FEB/2015 C985B
Jackson Ann 04/Jun/2016 A215D
Jackson Ted 26/Jan/2015 C425X
Jackson Ted 23/FEB/2015 C872F
Jackson Ted 18/OCT/2015 C530Y
Jackson Ted 06/Aug/2015 C231C
Jackson William 18/Jan/2016 C430C
Jackson Donald 29/May/2016 C104D
;
run;
proc freq data=have noprint;
table patient/out=patient(keep=patient);
run;
proc sort data=have;by doctor patient;run;
data have;
 set have;
 by doctor patient;
 retain group1 group2;
 call streaminit(123456789);
 if first.doctor then group1=rand('uniform');
 if first.patient then group2=rand('uniform');
run;
proc sort data=have;by group1 group2;run;


data want;
 if _n_=1 then do;
  if 0 then set patient;
  declare hash h(dataset:'patient');
  h.definekey('patient');
  h.definedone();
 end;
set have;
by group1;
if first.group1 then count=0;
if count lt 3 and h.check()=0 then do;
 count+1;h.remove();output;
end;
drop count group1 group2;
run;

rodgerzhang_moh · Posted 06-30-2016 07:40 AM

Perfect! Thanks Ksharp. This is exactly what I am expecting to have for now. Again thanks go to everyone who shared their insight and expertise as well.

Have a great day!

Rodger

Randomly Select an Entry and then Remove Any Transactions Associated with that ID Selected

Re: Randomly Select an Entry and then Remove Any Transactions Associated with that ID Selected

Re: Randomly Select an Entry and then Remove Any Transactions Associated with that ID Selected

Re: Randomly Select an Entry and then Remove Any Transactions Associated with that ID Selected

Re: Randomly Select an Entry and then Remove Any Transactions Associated with that ID Selected

Re: Randomly Select an Entry and then Remove Any Transactions Associated with that ID Selected

Re: Randomly Select an Entry and then Remove Any Transactions Associated with that ID Selected

Re: Randomly Select an Entry and then Remove Any Transactions Associated with that ID Selected

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away