Hi there,
Have large data set containing physicians and patients and their visits on different days. Would like to select RANDOMLY a unique date for one patient within 6 months period but up to 20 patients can be selected each run on a weekly basis. It's much easier to select the first or last record of each patient, but i had challenge to TRULY RANDOMLY select a unique record for a patient. Detail can be seen as attached file. Any help from any one will be highly appreciated.
Thanks.
Rodger
This should meet your requirements, as far as I understand them. Your example data isn't extensive enough to test every possibility. You should do some more testing.
data test;
input Doctor $ Patient $ Date :date11. Service_Type $;
format date yymmdd10.;
datalines;
Rodger Jack 15/MAR/2015 E401C
Rodger Jack 15/MAY/2015 C985C
Rodger Ann 04/APR/2016 A215A
Rodger Ted 26/JUN/2015 C425C
Rodger Ted 23/JUL/2015 C872C
Rodger Ted 18/Jul/2015 C532C
Rodger Ted 06/Aug/2015 C231C
Rodger William 18/Apr/2016 C425C
Rodger Donald 16/May/2016 C005C
Isabel James 12/Jan/2016 E401C
Isabel James 12/Jan/2016 E400C
Hanny Katz 12/Mar/2016 S512A
Thomas Youssef 15/MAR/2015 E401C
Thomas Youssef 15/MAY/2015 C985C
Thomas Anna 04/APR/2016 A215A
Thomas Jake 26/JUN/2015 C425C
Thomas Jake 23/JUL/2015 C872C
Thomas Jake 18/Jul/2015 C532C
Thomas Jake 06/Aug/2015 C231C
Thomas Tom 18/Apr/2016 C425C
Thomas Cruz 16/May/2016 C005C
Rita Bill 12/Jan/2016 E401C
Iris Hancock 12/Jan/2016 E400C
Star Katz 12/Mar/2016 S512A
Thomas Tommy 18/Apr/2016 C425C
Thomas Rita 16/May/2016 C005C
Rita Billy 12/Jan/2016 E401C
Iris Han 12/Jan/2016 E400C
Star Han 12/Mar/2016 S512A
;
/* Add week number and random number ta data */
data testw;
call streaminit(356343);
set test;
week = intnx("WEEK", date, 0);
format week weeku5.;
rnd = rand("UNIFORM");
run;
/* Keep only three records chosen randomly for each doctor, every week */
proc sort data=testw; by week doctor rnd; run;
data testw3;
do i = 1 by 1 until(last.doctor);
set testw; by week doctor;
if i <= 3 then output;
end;
drop i;
run;
/* Choose 20 random records for every week, only from patients who haven't
been chosen for at least 6 months */
proc sort data=testw3; by week rnd; run;
data want;
if 0 then set testw3;
/* Keep last chosen date for every patient in a hash */
if _n_ = 1 then do;
declare hash past(hashexp:10);
rc = past.defineKey("Patient");
rc = past.defineData("lastDate");
rc = past.defineDone();
call missing(lastDate);
end;
i = 0;
do until(i = 20 or last.week);
set testw3; by week;
/* Check that patient hasn't been chosen for 6 months */
if i <= 20 then do;
rc = past.find(key: Patient);
if rc ne 0 then do;
past.add(key:Patient, data:date);
output;
i + 1;
end;
else
if intck("MONTH", lastDate, date, "CONTINUOUS") >= 6 then do;
rc = past.add(key:Patient, data:date);
output;
i + 1;
end;
end;
end;
drop i lastDate rc rnd;
run;
proc sort data=want; by week Doctor Patient; run;
proc print data=want noobs; by week; id week; run;
This should meet your requirements, as far as I understand them. Your example data isn't extensive enough to test every possibility. You should do some more testing.
data test;
input Doctor $ Patient $ Date :date11. Service_Type $;
format date yymmdd10.;
datalines;
Rodger Jack 15/MAR/2015 E401C
Rodger Jack 15/MAY/2015 C985C
Rodger Ann 04/APR/2016 A215A
Rodger Ted 26/JUN/2015 C425C
Rodger Ted 23/JUL/2015 C872C
Rodger Ted 18/Jul/2015 C532C
Rodger Ted 06/Aug/2015 C231C
Rodger William 18/Apr/2016 C425C
Rodger Donald 16/May/2016 C005C
Isabel James 12/Jan/2016 E401C
Isabel James 12/Jan/2016 E400C
Hanny Katz 12/Mar/2016 S512A
Thomas Youssef 15/MAR/2015 E401C
Thomas Youssef 15/MAY/2015 C985C
Thomas Anna 04/APR/2016 A215A
Thomas Jake 26/JUN/2015 C425C
Thomas Jake 23/JUL/2015 C872C
Thomas Jake 18/Jul/2015 C532C
Thomas Jake 06/Aug/2015 C231C
Thomas Tom 18/Apr/2016 C425C
Thomas Cruz 16/May/2016 C005C
Rita Bill 12/Jan/2016 E401C
Iris Hancock 12/Jan/2016 E400C
Star Katz 12/Mar/2016 S512A
Thomas Tommy 18/Apr/2016 C425C
Thomas Rita 16/May/2016 C005C
Rita Billy 12/Jan/2016 E401C
Iris Han 12/Jan/2016 E400C
Star Han 12/Mar/2016 S512A
;
/* Add week number and random number ta data */
data testw;
call streaminit(356343);
set test;
week = intnx("WEEK", date, 0);
format week weeku5.;
rnd = rand("UNIFORM");
run;
/* Keep only three records chosen randomly for each doctor, every week */
proc sort data=testw; by week doctor rnd; run;
data testw3;
do i = 1 by 1 until(last.doctor);
set testw; by week doctor;
if i <= 3 then output;
end;
drop i;
run;
/* Choose 20 random records for every week, only from patients who haven't
been chosen for at least 6 months */
proc sort data=testw3; by week rnd; run;
data want;
if 0 then set testw3;
/* Keep last chosen date for every patient in a hash */
if _n_ = 1 then do;
declare hash past(hashexp:10);
rc = past.defineKey("Patient");
rc = past.defineData("lastDate");
rc = past.defineDone();
call missing(lastDate);
end;
i = 0;
do until(i = 20 or last.week);
set testw3; by week;
/* Check that patient hasn't been chosen for 6 months */
if i <= 20 then do;
rc = past.find(key: Patient);
if rc ne 0 then do;
past.add(key:Patient, data:date);
output;
i + 1;
end;
else
if intck("MONTH", lastDate, date, "CONTINUOUS") >= 6 then do;
rc = past.add(key:Patient, data:date);
output;
i + 1;
end;
end;
end;
drop i lastDate rc rnd;
run;
proc sort data=want; by week Doctor Patient; run;
proc print data=want noobs; by week; id week; run;
Thanks PGStats for your timely help. I will definitely do more test.
Again apprecaite your time and help.
Have a great day!
Rodger
Hi PGStats,
After reviewing, I just wanted to clarify that I actually don't need to select randomly the 3 patients per week derived from the service date for each doctor. In stead, the task will be a weekly run, each week, when the job is run, 3 unique patients from all the patients who might have mutiple transactions for each doctor will be selected RANDOMLY based on all the populations regardless of the service date from database where new transactions will keep coming and being appended to the database since the last week's run.However, any patients selected in the prior weeks' run should be excldued unless it's after 6 months since it was first selected.Take doctor Rodger for example:
Rodger Jack 15/MAR/2015 E401C
Rodger Jack 15/MAY/2015 C985C
Rodger Ann 04/APR/2016 A215A
Rodger Ted 26/JUN/2015 C425C
Rodger Ted 23/JUL/2015 C872C
Rodger Ted 18/Jul/2015 C532C
Rodger Ted 06/Aug/2015 C231C
Rodger William 18/Apr/2016 C425C
Rodger Donald 16/May/2016 C005C
Patient Ted had 4 transactions/visits and Jack 2 times, when the job is run in the first week and if Jack and Ted are RANDOMLY selected, only one entry each will be included, so the first week's result would be one entry from Ted, one from Jack plus another one which should be either Ann or William or Donald.
And then one week later, when the job is run, and if there is no new transactions the doctor Rodger has (meaning Rodger's patient pool remains the same), the result can only be the remaining two patients who were not selected from the first week.
Hope I made myself clearer this time. Sorry for the confusion.
Really appreciated your help and have a great one!
Rodger
So you will need another table to keep previous selections (patient name and selection date) ?
That's what I think, I would have to keep a table containing the selected result to be checked and excluded from the future selectoin. But my challenge still exists 1) uniquely RANDOMLY select 3 patients each run for a doctot and 2) exactly certain number of sample size. Two conditions have to be met. Particularly the first one, I am not able to select uniquely 3 patients but 3 records which could be the same patient. Also I tried different scenario, I am not able to have exactly the 10 or 20 sample size after 3 patients per doctor.
Sorry to bother, but any insight on this would be highly appreciqted.
Thanks.
Rodger
Apart from the 6 months constraint, you can achieve the sampling quite simply with cluster sampling:
data test;
input Doctor $ Patient $ Date :date11. Service_Type $;
format date yymmdd10.;
datalines;
Rodger Jack 15/MAR/2015 E401C
Rodger Jack 15/MAY/2015 C985C
Rodger Ann 04/APR/2016 A215A
Rodger Ted 26/JUN/2015 C425C
Rodger Ted 23/JUL/2015 C872C
Rodger Ted 18/Jul/2015 C532C
Rodger Ted 06/Aug/2015 C231C
Rodger William 18/Apr/2016 C425C
Rodger Donald 16/May/2016 C005C
PG Jack 15/MAR/2015 E401C
PG Jack 15/MAY/2015 C985C
PG Ann 04/APR/2016 A215A
PG Ted 26/JUN/2015 C425C
PG Ted 23/JUL/2015 C872C
PG Ted 18/Jul/2015 C532C
PG Ted 06/Aug/2015 C231C
PG William 18/Apr/2016 C425C
PG Donald 16/May/2016 C005C
;
proc sort data=test; by doctor patient date; run;
/* Select 3 patients per doctor */
proc surveyselect data=test sampsize=3
out=weeklyPatients(drop=DetectionProb SamplingWeight);
strata doctor;
samplingunit patient / presorted;
run;
/* Select 1 encounter per patient */
proc surveyselect data=weeklyPatients
out=weeklyEncounters sampsize=1;
strata doctor patient;
run;
proc print; run;
You would have to exclude less than 6-months-selected patients prior to this sampling to implement the 6-months constraint.
... and if you need a certain total sample size (say 3), add a final selection step
proc surveyselect
data=weeklyEncounters(drop=SamplingWeight)
out=weeklySample sampsize=3;
run;
proc print; run;
Thanks PG. I forgot to mention that the constraint I have is that this has to be done on mainframe SAS where PROC SURVEYSELECT is not available due to some reason, so I'll have to use macro and data step to accomplish this task. Is there any relatively easy way to do this other than SURVEYSELECT?
Thanks.
Rodger
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.