BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
NewUsrStat
Lapis Lazuli | Level 10

Hi guys, 

suppose to have the following dataset: 

 

data DB;
  input ID :$20. Admission :date09. Discharge :date09. Range1 :date09. Range2 :date09.; 
  format Admission date9. Discharge date9. Range1 date9. Range2 date9.;
cards;
0001 13JAN2017 25JAN2017 30DEC2016 24FEB2017 
0001 22FEB2017 07MAR2017 08FEB2017 06APR2017
0001 27APR2017 16MAY2017 13APR2017 15JUN2017
0001 30JAN2019 04MAR2019 16JAN2019 03APR2019
0002 28DEC2014 03JAN2015 14DEC2014 02FEB2015
0002 03MAR2015 12MAR2015 20FEB2015 02APR2015
;

and a second dataset: 

 


data DB1;
  input ID :$20. Visit_date :date09. Visit; 
  format Visit_date date9. ;
cards;
0001 26JAN2017 1 
0001 10FEB2017 1 
0002 22FEB2015 4
;

Is there a way to obtain the following? 

 

data DB2;
  input ID :$20. Admission :date09. Discharge :date09. Range1 :date09. Range2 :date09. Visit; 
  format Admission date9. Discharge date9. Range1 date9. Range2 date9.;
cards;
0001 13JAN2017 25JAN2017 30DEC2016 24FEB2017  1
0001 22FEB2017 07MAR2017 08FEB2017 06APR2017  1
0001 27APR2017 16MAY2017 13APR2017 15JUN2017  0
0001 30JAN2019 04MAR2019 16JAN2019 03APR2019  0
0002 28DEC2014 03JAN2015 14DEC2014 02FEB2015  0
0002 03MAR2015 12MAR2015 20FEB2015 02APR2015  4
;

In other words, if the visit_date (DB1) occurs in the range Range1-Range2 of DB for each patient (ID) then it will be reported in DB to obtain DB2. 

 

Logic: 1) check if the visit_date in DB1 occurs between Range1 and Range2 of DB for each ID

            2) if the visit_date in DB1 occurs between  Range1 and Range2 in DB for each ID, then create DB2. DB2 is as DB + the visit value ("visit" variable of DB1) where it occurs (i.e., at the corresponding range)

 

 

Thank you in advance

1 ACCEPTED SOLUTION

Accepted Solutions
quickbluefish
Barite | Level 11

OK, I am also pretty confused by this - I think what you're asking for is quite simple, but I could be mistaken.  One thing I don't understand -- is the value of 'Visit' in DB1 supposed to be a *count* of visits (e.g., 4 visits occurred on this date)?  I was initially thinking it was more like a visit identifier (e.g., a person's 1st visit, 2nd visit, etc.), but then I see you have Visit=1 for two separate dates for person 0001.  In any case, seems like this is just a simple SQL join:

proc sql;
create table want as
select a.*, max(0, b.visit) as Visit
from 
    DB A
    left join
    DB1 B
    on a.patient=b.patient and a.range1<=b.visitdate<=a.range2
order by a.patient, a.admission;
quit;


Depending on what the variable 'Visit' means (see question above) it may make more sense to set Visit to missing in DB2 when there is no such visit in the corresponding range.  In that case, you'd just remove the ' else 0' part above.  

 

What happens if there is more than one visitdate in a given range?  

View solution in original post

7 REPLIES 7
PaigeMiller
Diamond | Level 26

In other words, if the visit (DB1) occurs in the range Range1-Range2 of DB for each patient (ID) then it will be reported in DB to obtain DB2.

 

I don't understand what this means. Can you walk us through the logic, step-by-step?

--
Paige Miller
PeterClemmensen
Tourmaline | Level 20

I edited the DB1 data a bit. I assume you want Visit_date to have the date9 format (not Visit). 

 

Here is how I interpret your logic: For each observation in DB you want to evaluate each Visit_date in DB1 . If Visit_date is within the range of Range1 and Range2 for any observation in DB1, set Visit = 1. If not, set Visit = 0.

 

data DB;
  input ID :$20. Admission :date09. Discharge :date09. Range1 :date09. Range2 :date09.; 
  format Admission date9. Discharge date9. Range1 date9. Range2 date9.;
cards;
0001 13JAN2017 25JAN2017 30DEC2016 24FEB2017 
0001 22FEB2017 07MAR2017 08FEB2017 06APR2017
0001 27APR2017 16MAY2017 13APR2017 15JUN2017
0001 30JAN2019 04MAR2019 16JAN2019 03APR2019
0002 28DEC2014 03JAN2015 14DEC2014 02FEB2015
0002 03MAR2015 12MAR2015 20FEB2015 02APR2015
;

data DB1;
  input ID :$20. Visit_date :date09.; 
  format Visit_date date09.;
cards;
0001 26JAN2017 
0001 10FEB2017 
0002 22FEB2015 
;

data DB2(drop = Visit_date);

   if _N_ = 1 then do;
      dcl hash h(dataset : 'DB1', multidata : 'Y');
      h.definekey('ID');
      h.definedata('Visit_date');
      h.definedone();
   end;

   set DB;

   Visit_date = .;
   Visit = 0;

   do while (h.do_over() = 0);
      if Range1 <= Visit_date <= Range2 then Visit = 1;
   end;

run;

 

Result:

 

ID    Admission  Discharge  Range1     Range2     Visit
0001  13JAN2017  25JAN2017  30DEC2016  24FEB2017  1
0001  22FEB2017  07MAR2017  08FEB2017  06APR2017  1
0001  27APR2017  16MAY2017  13APR2017  15JUN2017  0
0001  30JAN2019  04MAR2019  16JAN2019  03APR2019  0
0002  28DEC2014  03JAN2015  14DEC2014  02FEB2015  0
0002  03MAR2015  12MAR2015  20FEB2015  02APR2015  1

 

NewUsrStat
Lapis Lazuli | Level 10
Thank you very much for your help. It is not exactly what I need. First "for any observation in DB1" is not completely true because it should run ID per ID. DB1 reports visits ID per ID. 2) "set Visit = 1. If not, set Visit = 0" not True. Instead, set the value Visit takes in DB1. It could take 2 or a value different from 1. I will edit my post to make it more clear.
quickbluefish
Barite | Level 11

OK, I am also pretty confused by this - I think what you're asking for is quite simple, but I could be mistaken.  One thing I don't understand -- is the value of 'Visit' in DB1 supposed to be a *count* of visits (e.g., 4 visits occurred on this date)?  I was initially thinking it was more like a visit identifier (e.g., a person's 1st visit, 2nd visit, etc.), but then I see you have Visit=1 for two separate dates for person 0001.  In any case, seems like this is just a simple SQL join:

proc sql;
create table want as
select a.*, max(0, b.visit) as Visit
from 
    DB A
    left join
    DB1 B
    on a.patient=b.patient and a.range1<=b.visitdate<=a.range2
order by a.patient, a.admission;
quit;


Depending on what the variable 'Visit' means (see question above) it may make more sense to set Visit to missing in DB2 when there is no such visit in the corresponding range.  In that case, you'd just remove the ' else 0' part above.  

 

What happens if there is more than one visitdate in a given range?  

quickbluefish
Barite | Level 11
^^ just edited the above SQL - the join was wrong.
NewUsrStat
Lapis Lazuli | Level 10
Thank you very much for your comment and help. 1) "One thing I don't understand -- is the value of 'Visit' in DB1 supposed to be a *count* of visits (e.g., 4 visits occurred on this date)? " Yes it is. Only a count nothing more. 2) "What happens if there is more than one visit_date in a given range?" No cases like this in the data.

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 2349 views
  • 1 like
  • 4 in conversation