I have a clinic data with variables patient ID, Date of appointment and Physician ID. I want to find out the cases where a specific patient came back for a follow up appointment within a week of his first appointment. If there was no further appointment for that particular patient, its a success. If the second appointment was after a week, its again a success. But, if the first appointment was followed up with another appointment within a week, it's a failure. I tried doing it with a do loop, but I'm sure there is a better way of doing it with PROC SQL. I'm attaching the data set which is already sorted with patient id.
Million thanks in advance!
I don't know why you insist pursuing proc sql solution, since you haven't shown your datastep code, I can't really comment. In my following test, data step is way faster (only use 1/50th of time) than my Proc sql (of course, there will be many ways of doing it using Proc sql); data step will mark the second record, while proc sql will mark the first one.
1 /*PROC SQL, SLOWWOWOW*/
2 proc sql;
3 create table want_SQL as
4 select *, CASE WHEN EXISTS (select APPOINTMENT_DATE from test1 where PATIENT_ID =
4 ! A.PATIENT_ID
5
6 AND APPOINTMENT_DATE between A.APPOINTMENT_DATE+1 and
6 ! a.APPOINTMENT_DATE+7)
7
8 THEN 'FAILURE' ELSE 'SUCCESS' END AS FLAG
9 FROM TEST1 A
10 ;
NOTE: Table WORK.WANT_SQL created, with 3260 rows and 4 columns.
11 QUIT;
NOTE: PROCEDURE SQL used (Total process time):
real time 1.01 seconds
cpu time 0.99 seconds
12
13 /*DATA STEP: FAST*/
14 PROC SORT DATA=TEST1 OUT=TEST2;
15 BY patient_id APPOINTMENT_DATE;
16 RUN;
NOTE: There were 3260 observations read from the data set WORK.TEST1.
NOTE: The data set WORK.TEST2 has 3260 observations and 3 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
17
18 DATA WANT_DS;
19 DO UNTIL (LAST.PATIENT_ID);
20 SET TEST2;
21 LENGTH FLAG $ 10;
22 BY patient_id APPOINTMENT_DATE;
23 FLAG=IFC(FIRST.PATIENT_ID OR DIF(APPOINTMENT_DATE)>6,'SUCCESS','FAILURE');
24 OUTPUT;
25 END;
26 RUN;
NOTE: There were 3260 observations read from the data set WORK.TEST2.
NOTE: The data set WORK.WANT_DS has 3260 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
Haikuo
Thanks for your reply. I tried both your methods and while the Proc Sql one gave me errors and didnt run, the data step method worked and gave me a table which is not fully correct. I mean I need failure cases where the follow up appointment for a specific patient was within a week, but your result shows failure as only those cases where the follow up appointment or the next appointment was on the same day. Any other appointment was considered a success. For instance, 733 and 734 observation for the same patient is just 5 days apart, but it still calls it success. I'm not sure if there is some slight issue with your code or you just didnt understand my question correctly. The code I used earlier is below and it gave back all such cases where there was follow up appointment. Success is 1 and failure is 0.
proc sort data=vtest.test;
by PATIENT_ID APPOINTMENT_DATE;
run;
data vtest.q2;
retain adate i;
length followsuc 6;
set vtest.test;
by PATIENT_ID;
if first.patient_id then do;
i=1;
adate=APPOINTMENT_DATE;
end;
else do;
i=i+1;
if i=2 then do;
if APPOINTMENT_DATE - adate <=604800 then followsuc=0;
else followsuc=1;
output;
end;
end;
run;
devsas,
The below code should have fixed the last issue with the day 5, as it is specified for dates between 0 and 7.
data test1a;
set desktop.test1;
apt_date = datepart(appointment_date);
format apt_date mmddyy10.;
keep patient_id apt_date;
run;
proc sort data = test1a;
by patient_id apt_date;
run;
data test1a;
set test1a;
count + 1;
by patient_id;
if first.patient_id then count = 1;
run;
data test1b;
set test1a;
where count <=2; /*This is done because there are a couple of people with more than 2 visits and the point of the exercise is only after their first visit, so eliminate all of the others*/
run;
data test1c;
set test1b;
by patient_id apt_date;
retain adate;
if first.patient_id then do;
adate = apt_date;
end;
if 0<datdif(adate,apt_date,'act/act')<=7 then do;
flag = 0;
output;
end;
else do;
flag = 1;
output;
end;
drop count adate;
run;
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.