Solved: Add a variable value after comparing dates and ranges

NewUsrStat · Posted 01-07-2025 04:58 AM

Hi guys,

suppose to have the following dataset:

data DB;
  input ID :$20. Admission :date09. Discharge :date09. Range1 :date09. Range2 :date09.; 
  format Admission date9. Discharge date9. Range1 date9. Range2 date9.;
cards;
0001 13JAN2017 25JAN2017 30DEC2016 24FEB2017 
0001 22FEB2017 07MAR2017 08FEB2017 06APR2017
0001 27APR2017 16MAY2017 13APR2017 15JUN2017
0001 30JAN2019 04MAR2019 16JAN2019 03APR2019
0002 28DEC2014 03JAN2015 14DEC2014 02FEB2015
0002 03MAR2015 12MAR2015 20FEB2015 02APR2015
;

and a second dataset:


data DB1;
  input ID :$20. Visit_date :date09. Visit; 
  format Visit_date date9. ;
cards;
0001 26JAN2017 1 
0001 10FEB2017 1 
0002 22FEB2015 4
;

Is there a way to obtain the following?

data DB2;
  input ID :$20. Admission :date09. Discharge :date09. Range1 :date09. Range2 :date09. Visit; 
  format Admission date9. Discharge date9. Range1 date9. Range2 date9.;
cards;
0001 13JAN2017 25JAN2017 30DEC2016 24FEB2017  1
0001 22FEB2017 07MAR2017 08FEB2017 06APR2017  1
0001 27APR2017 16MAY2017 13APR2017 15JUN2017  0
0001 30JAN2019 04MAR2019 16JAN2019 03APR2019  0
0002 28DEC2014 03JAN2015 14DEC2014 02FEB2015  0
0002 03MAR2015 12MAR2015 20FEB2015 02APR2015  4
;

In other words, if the visit_date (DB1) occurs in the range Range1-Range2 of DB for each patient (ID) then it will be reported in DB to obtain DB2.

Logic: 1) check if the visit_date in DB1 occurs between Range1 and Range2 of DB for each ID

2) if the visit_date in DB1 occurs between Range1 and Range2 in DB for each ID, then create DB2. DB2 is as DB + the visit value ("visit" variable of DB1) where it occurs (i.e., at the corresponding range)

Thank you in advance

quickbluefish · Posted 01-07-2025 08:41 AM

OK, I am also pretty confused by this - I think what you're asking for is quite simple, but I could be mistaken. One thing I don't understand -- is the value of 'Visit' in DB1 supposed to be a *count* of visits (e.g., 4 visits occurred on this date)? I was initially thinking it was more like a visit identifier (e.g., a person's 1st visit, 2nd visit, etc.), but then I see you have Visit=1 for two separate dates for person 0001. In any case, seems like this is just a simple SQL join:

proc sql;
create table want as
select a.*, max(0, b.visit) as Visit
from 
    DB A
    left join
    DB1 B
    on a.patient=b.patient and a.range1<=b.visitdate<=a.range2
order by a.patient, a.admission;
quit;

Depending on what the variable 'Visit' means (see question above) it may make more sense to set Visit to missing in DB2 when there is no such visit in the corresponding range. In that case, you'd just remove the ' else 0' part above.

What happens if there is more than one visitdate in a given range?

View solution in original post

PaigeMiller · Posted 01-07-2025 05:00 AM

In other words, if the visit (DB1) occurs in the range Range1-Range2 of DB for each patient (ID) then it will be reported in DB to obtain DB2.

I don't understand what this means. Can you walk us through the logic, step-by-step?

--
Paige Miller

NewUsrStat · Posted 01-07-2025 05:20 AM

Just edited

PeterClemmensen · Posted 01-07-2025 06:07 AM

I edited the DB1 data a bit. I assume you want Visit_date to have the date9 format (not Visit).

Here is how I interpret your logic: For each observation in DB you want to evaluate each Visit_date in DB1 . If Visit_date is within the range of Range1 and Range2 for any observation in DB1, set Visit = 1. If not, set Visit = 0.

data DB;
  input ID :$20. Admission :date09. Discharge :date09. Range1 :date09. Range2 :date09.; 
  format Admission date9. Discharge date9. Range1 date9. Range2 date9.;
cards;
0001 13JAN2017 25JAN2017 30DEC2016 24FEB2017 
0001 22FEB2017 07MAR2017 08FEB2017 06APR2017
0001 27APR2017 16MAY2017 13APR2017 15JUN2017
0001 30JAN2019 04MAR2019 16JAN2019 03APR2019
0002 28DEC2014 03JAN2015 14DEC2014 02FEB2015
0002 03MAR2015 12MAR2015 20FEB2015 02APR2015
;

data DB1;
  input ID :$20. Visit_date :date09.; 
  format Visit_date date09.;
cards;
0001 26JAN2017 
0001 10FEB2017 
0002 22FEB2015 
;

data DB2(drop = Visit_date);

   if _N_ = 1 then do;
      dcl hash h(dataset : 'DB1', multidata : 'Y');
      h.definekey('ID');
      h.definedata('Visit_date');
      h.definedone();
   end;

   set DB;

   Visit_date = .;
   Visit = 0;

   do while (h.do_over() = 0);
      if Range1 <= Visit_date <= Range2 then Visit = 1;
   end;

run;

Result:

ID    Admission  Discharge  Range1     Range2     Visit
0001  13JAN2017  25JAN2017  30DEC2016  24FEB2017  1
0001  22FEB2017  07MAR2017  08FEB2017  06APR2017  1
0001  27APR2017  16MAY2017  13APR2017  15JUN2017  0
0001  30JAN2019  04MAR2019  16JAN2019  03APR2019  0
0002  28DEC2014  03JAN2015  14DEC2014  02FEB2015  0
0002  03MAR2015  12MAR2015  20FEB2015  02APR2015  1

The DATA to DATA Step Macro
Blog: SASnrd

NewUsrStat · Posted 01-07-2025 06:35 AM

Thank you very much for your help. It is not exactly what I need. First "for any observation in DB1" is not completely true because it should run ID per ID. DB1 reports visits ID per ID. 2) "set Visit = 1. If not, set Visit = 0" not True. Instead, set the value Visit takes in DB1. It could take 2 or a value different from 1. I will edit my post to make it more clear.

quickbluefish · Posted 01-07-2025 08:41 AM

OK, I am also pretty confused by this - I think what you're asking for is quite simple, but I could be mistaken. One thing I don't understand -- is the value of 'Visit' in DB1 supposed to be a *count* of visits (e.g., 4 visits occurred on this date)? I was initially thinking it was more like a visit identifier (e.g., a person's 1st visit, 2nd visit, etc.), but then I see you have Visit=1 for two separate dates for person 0001. In any case, seems like this is just a simple SQL join:

proc sql;
create table want as
select a.*, max(0, b.visit) as Visit
from 
    DB A
    left join
    DB1 B
    on a.patient=b.patient and a.range1<=b.visitdate<=a.range2
order by a.patient, a.admission;
quit;

Depending on what the variable 'Visit' means (see question above) it may make more sense to set Visit to missing in DB2 when there is no such visit in the corresponding range. In that case, you'd just remove the ' else 0' part above.

What happens if there is more than one visitdate in a given range?

quickbluefish · Posted 01-07-2025 08:51 AM

^^ just edited the above SQL - the join was wrong.

NewUsrStat · Posted 01-07-2025 08:53 AM

Thank you very much for your comment and help. 1) "One thing I don't understand -- is the value of 'Visit' in DB1 supposed to be a *count* of visits (e.g., 4 visits occurred on this date)? " Yes it is. Only a count nothing more. 2) "What happens if there is more than one visit_date in a given range?" No cases like this in the data.

Add a variable value after comparing dates and ranges

Re: Add a variable value after comparing dates and ranges

Re: Add a variable value after comparing dates and ranges

Re: Add a variable value after comparing dates and ranges

Re: Add a variable value after comparing dates and ranges

Re: Add a variable value after comparing dates and ranges

Re: Add a variable value after comparing dates and ranges

Re: Add a variable value after comparing dates and ranges

Re: Add a variable value after comparing dates and ranges

Registration is open