Hi guys,
suppose to have the following dataset:
data DB;
input ID :$20. Admission :date09. Discharge :date09. Range1 :date09. Range2 :date09.;
format Admission date9. Discharge date9. Range1 date9. Range2 date9.;
cards;
0001 13JAN2017 25JAN2017 30DEC2016 24FEB2017
0001 22FEB2017 07MAR2017 08FEB2017 06APR2017
0001 27APR2017 16MAY2017 13APR2017 15JUN2017
0001 30JAN2019 04MAR2019 16JAN2019 03APR2019
0002 28DEC2014 03JAN2015 14DEC2014 02FEB2015
0002 03MAR2015 12MAR2015 20FEB2015 02APR2015
;
and a second dataset:
data DB1;
input ID :$20. Visit_date :date09. Visit;
format Visit_date date9. ;
cards;
0001 26JAN2017 1
0001 10FEB2017 1
0002 22FEB2015 4
;
Is there a way to obtain the following?
data DB2;
input ID :$20. Admission :date09. Discharge :date09. Range1 :date09. Range2 :date09. Visit;
format Admission date9. Discharge date9. Range1 date9. Range2 date9.;
cards;
0001 13JAN2017 25JAN2017 30DEC2016 24FEB2017 1
0001 22FEB2017 07MAR2017 08FEB2017 06APR2017 1
0001 27APR2017 16MAY2017 13APR2017 15JUN2017 0
0001 30JAN2019 04MAR2019 16JAN2019 03APR2019 0
0002 28DEC2014 03JAN2015 14DEC2014 02FEB2015 0
0002 03MAR2015 12MAR2015 20FEB2015 02APR2015 4
;
In other words, if the visit_date (DB1) occurs in the range Range1-Range2 of DB for each patient (ID) then it will be reported in DB to obtain DB2.
Logic: 1) check if the visit_date in DB1 occurs between Range1 and Range2 of DB for each ID
2) if the visit_date in DB1 occurs between Range1 and Range2 in DB for each ID, then create DB2. DB2 is as DB + the visit value ("visit" variable of DB1) where it occurs (i.e., at the corresponding range)
Thank you in advance
OK, I am also pretty confused by this - I think what you're asking for is quite simple, but I could be mistaken. One thing I don't understand -- is the value of 'Visit' in DB1 supposed to be a *count* of visits (e.g., 4 visits occurred on this date)? I was initially thinking it was more like a visit identifier (e.g., a person's 1st visit, 2nd visit, etc.), but then I see you have Visit=1 for two separate dates for person 0001. In any case, seems like this is just a simple SQL join:
proc sql;
create table want as
select a.*, max(0, b.visit) as Visit
from
DB A
left join
DB1 B
on a.patient=b.patient and a.range1<=b.visitdate<=a.range2
order by a.patient, a.admission;
quit;
Depending on what the variable 'Visit' means (see question above) it may make more sense to set Visit to missing in DB2 when there is no such visit in the corresponding range. In that case, you'd just remove the ' else 0' part above.
What happens if there is more than one visitdate in a given range?
In other words, if the visit (DB1) occurs in the range Range1-Range2 of DB for each patient (ID) then it will be reported in DB to obtain DB2.
I don't understand what this means. Can you walk us through the logic, step-by-step?
I edited the DB1 data a bit. I assume you want Visit_date to have the date9 format (not Visit).
Here is how I interpret your logic: For each observation in DB you want to evaluate each Visit_date in DB1 . If Visit_date is within the range of Range1 and Range2 for any observation in DB1, set Visit = 1. If not, set Visit = 0.
data DB;
input ID :$20. Admission :date09. Discharge :date09. Range1 :date09. Range2 :date09.;
format Admission date9. Discharge date9. Range1 date9. Range2 date9.;
cards;
0001 13JAN2017 25JAN2017 30DEC2016 24FEB2017
0001 22FEB2017 07MAR2017 08FEB2017 06APR2017
0001 27APR2017 16MAY2017 13APR2017 15JUN2017
0001 30JAN2019 04MAR2019 16JAN2019 03APR2019
0002 28DEC2014 03JAN2015 14DEC2014 02FEB2015
0002 03MAR2015 12MAR2015 20FEB2015 02APR2015
;
data DB1;
input ID :$20. Visit_date :date09.;
format Visit_date date09.;
cards;
0001 26JAN2017
0001 10FEB2017
0002 22FEB2015
;
data DB2(drop = Visit_date);
if _N_ = 1 then do;
dcl hash h(dataset : 'DB1', multidata : 'Y');
h.definekey('ID');
h.definedata('Visit_date');
h.definedone();
end;
set DB;
Visit_date = .;
Visit = 0;
do while (h.do_over() = 0);
if Range1 <= Visit_date <= Range2 then Visit = 1;
end;
run;
Result:
ID Admission Discharge Range1 Range2 Visit 0001 13JAN2017 25JAN2017 30DEC2016 24FEB2017 1 0001 22FEB2017 07MAR2017 08FEB2017 06APR2017 1 0001 27APR2017 16MAY2017 13APR2017 15JUN2017 0 0001 30JAN2019 04MAR2019 16JAN2019 03APR2019 0 0002 28DEC2014 03JAN2015 14DEC2014 02FEB2015 0 0002 03MAR2015 12MAR2015 20FEB2015 02APR2015 1
OK, I am also pretty confused by this - I think what you're asking for is quite simple, but I could be mistaken. One thing I don't understand -- is the value of 'Visit' in DB1 supposed to be a *count* of visits (e.g., 4 visits occurred on this date)? I was initially thinking it was more like a visit identifier (e.g., a person's 1st visit, 2nd visit, etc.), but then I see you have Visit=1 for two separate dates for person 0001. In any case, seems like this is just a simple SQL join:
proc sql;
create table want as
select a.*, max(0, b.visit) as Visit
from
DB A
left join
DB1 B
on a.patient=b.patient and a.range1<=b.visitdate<=a.range2
order by a.patient, a.admission;
quit;
Depending on what the variable 'Visit' means (see question above) it may make more sense to set Visit to missing in DB2 when there is no such visit in the corresponding range. In that case, you'd just remove the ' else 0' part above.
What happens if there is more than one visitdate in a given range?
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.