Merging datasets by date range

Accepted Solution Solved
Reply
Contributor
Posts: 60
Accepted Solution

Merging datasets by date range

I have 2 datasets: 1 contains inpatient claims data and the other contains professional claims data.

 

I want to see how many professional claims there were for an inpatient visit - so I'd like to merge by date range and include a professional claim if it occurred anytime between the first and last date of service of the inpatient stay.

 

My data looks something like this for one patient:

 

Inpatient Dataset

first_datelast_dateinpatient_flag
01NOV1411NOV141
22NOV1524NOV151
01NOV1601NOV161

 

Professional Dataset

first_datelast_dateprofessional_flag
01NOV1401NOV141
03NOV1403NOV141
04NOV1404NOV141

 

 

And I'd like the final dataset to look something like:

 

first_datelast_dateprof_first_dateprof_last_dateinpatient_flagprofessional_sum
01NOV1411NOV1401NOV1401NOV1411
01NOV1411NOV1403NOV1403NOV1411
01NOV1411NOV1404NOV1404NOV1411

 


Which I'd probably roll up later to look like:

 

 

first_datelast_dateinpatient_flagprofessional_sum
01NOV1411NOV1413
22NOV1524NOV1510
01NOV1601NOV1610

 

Is there a way to do this?

I've tried merging by month & year and then using an if statement to compare the days but that doesn't seem to be working.

 

Thanks for any help!

 

 


Accepted Solutions
Solution
‎12-29-2016 02:19 PM
Super Contributor
Posts: 251

Re: Merging datasets by date range

This will work, but it won't scale. It's cartersian joinish.

 

data inpatient(index=(dates=(first_date last_date)));
infile cards;
attrib first_date last_date length=4 informat=date9. format=ddmmyy10.;
attrib inpatient_flag length=3;
input first_date
      last_date;
retain inpatient_flag 1;
cards;
01nov14 11nov14
22nov15 24nov15
01nov16 01nov16
;
run;

data professional(index=(dates=(first_date last_date)));
infile cards;
attrib first_date last_date length=4 informat=date9. format=ddmmyy10.;
attrib professional_flag length=3;
input first_date
      last_date;
retain professional_flag 1;
cards;
01nov14 01nov14
03nov14 03nov14
04nov14 04nov14
;
run;
          
proc sql magic=103;
create table joined as 
   select inpat.first_date as first_date,
          inpat.last_date as last_date,
          sum(coalesce(prof.professional_flag, 0)) as professional_sum
     from inpatient as inpat
     left join professional as prof
       on prof.first_date between inpat.first_date and inpat.last_date 
      and prof.last_date between inpat.first_date and inpat.last_date
    group by inpat.first_date,
             inpat.last_date;
quit;

I'll have a play with generating some more random data and see how it works. If the professional dataset's last_date is always the same as the first, that would help.

View solution in original post


All Replies
Super Contributor
Posts: 251

Re: Merging datasets by date range

How big are your datasets - especially the inpatient one? There are a couple of approaches I would try (hash tables, sql join), but if the tables are huge, it might be counter-productive.

Contributor
Posts: 60

Re: Merging datasets by date range

The datasets have already been trimmed, so about 50,000 observations.

##- Please type your reply above this line. Simple formatting, no
attachments. -##
Solution
‎12-29-2016 02:19 PM
Super Contributor
Posts: 251

Re: Merging datasets by date range

This will work, but it won't scale. It's cartersian joinish.

 

data inpatient(index=(dates=(first_date last_date)));
infile cards;
attrib first_date last_date length=4 informat=date9. format=ddmmyy10.;
attrib inpatient_flag length=3;
input first_date
      last_date;
retain inpatient_flag 1;
cards;
01nov14 11nov14
22nov15 24nov15
01nov16 01nov16
;
run;

data professional(index=(dates=(first_date last_date)));
infile cards;
attrib first_date last_date length=4 informat=date9. format=ddmmyy10.;
attrib professional_flag length=3;
input first_date
      last_date;
retain professional_flag 1;
cards;
01nov14 01nov14
03nov14 03nov14
04nov14 04nov14
;
run;
          
proc sql magic=103;
create table joined as 
   select inpat.first_date as first_date,
          inpat.last_date as last_date,
          sum(coalesce(prof.professional_flag, 0)) as professional_sum
     from inpatient as inpat
     left join professional as prof
       on prof.first_date between inpat.first_date and inpat.last_date 
      and prof.last_date between inpat.first_date and inpat.last_date
    group by inpat.first_date,
             inpat.last_date;
quit;

I'll have a play with generating some more random data and see how it works. If the professional dataset's last_date is always the same as the first, that would help.

Contributor
Posts: 60

Re: Merging datasets by date range

Oh wow - it worked!  Thank you so much!

Contributor
Posts: 60

Re: Merging datasets by date range

And yes, in the professional dataset, the first and last date are the same.

 

Thanks again for the help!

Super Contributor
Posts: 251

Re: Merging datasets by date range

I'm delighted.

 

This took 1'25" on my Mac - SQL really hates cartesian joins. But it's definitely doable.

data inpatient;
attrib first_date last_date length=4 informat=date9. format=ddmmyy10.;
do i = 1 to 5e4;
   first_date = '1oct2014'd + int(ranuni(225465114) * 750);
   last_date = int(ranuni(225465114) * 20 + first_date);
   output;
   end;
keep first_date last_date;
run;

proc sort data=inpatient noequals;
by first_date last_date;
run;

data professional;
attrib first_date length=4 informat=date9. format=ddmmyy10.;
retain professional_flag 1;
do i = 1 to 5e4;
   first_date = '1oct2014'd + int(ranuni(225465114) * 750);
   output;
   end;
keep first_date professional_flag;
run;

proc sort data=professional noequals;
by first_date;
quit;
          
proc sql magic=103;
create table joined as 
   select inpat.first_date as first_date,
          inpat.last_date as last_date,
          sum(coalesce(prof.professional_flag, 0)) as professional_sum
     from inpatient as inpat
     left join professional as prof
       on prof.first_date between inpat.first_date and inpat.last_date 
    group by inpat.first_date,
             inpat.last_date;
quit;

 

Super Contributor
Posts: 251

Re: Merging datasets by date range

Hmm. Although the code works and produces a result, I'm not convinced that it's the correct one. Without seeing all your data, and by definition it's confidential, it's finding all combinations where the professional's date is between the patients' dates. If you were trying to find a single match, this is - umm – better. Or betterer, at least:

 

data inpatient;
attrib first_date_pat last_date_pat length=4 informat=date9. format=ddmmyy10.;
do i = 1 to 5e4;
first_date_pat = '1oct2014'd + int(ranuni(225465114) * 750);
last_date_pat = int(ranuni(225465114) * 20 + first_date_pat);
output;
end;
keep first_date_pat last_date_pat;
run;

proc sort data=inpatient noequals; by first_date_pat last_date_pat; run; data joined; set inpatient; attrib first_date length=4 informat=date9. format=ddmmyy10.; attrib professional_flag length=3; if _n_ = 1 then do; dcl hash prof(dataset: 'professional', ordered: 'a'); dcl hiter profiter('prof'); rc = prof.definekey('first_date'); rc = prof.definedata('first_date', 'professional_flag'); rc = prof.definedone(); call missing(first_date, professional_flag); end; rc = profiter.first(); do while(not(first_date_pat le first_date le last_date_pat) and rc = 0); rc = profiter.next(); end; if rc = 0 then output; keep first_date_pat last_date_pat first_date professional_flag; run; proc sql; create table joined_summ as select first_date_pat as first_date, last_date_pat as last_date, sum(coalesce(professional_flag, 0)) as professional_sum from joined group by first_date_pat, last_date_pat; quit;

Using the iterating hash table, everything is done in memory and this flies through at under two seconds. Note that the results in professional_sum are much smaller.

 

This is me, attempting to delay taking the dog for a walk and mowing the lawns…

Contributor
Posts: 60

Re: Merging datasets by date range

Oh funny! I was just about to write back as I was seeing some funny results.

Thank you for neglecting your lawn and not walking your dog to help out a poor soul like me!

I will give this a try! Thank you again!
Super Contributor
Posts: 251

Re: Merging datasets by date range

[ Edited ]

Bug (actually several) discovered on review the code in my head, as I was walking Rosie on Silversky Track. Assuming that the question you are trying to ask is: Is there a professional available / on call / other attribute for each inpatient attendance, this will give better results:

data inpatient;
attrib first_date_pat last_date_pat length=4 informat=date9. format=ddmmyy10.;
do i = 1 to 5e4;
   first_date_pat = '1oct2014'd + int(ranuni(225465114) * 750); 
   last_date_pat = int(ranuni(225465114) * 20 + first_date_pat); 
   output; 
   end;
keep first_date_pat last_date_pat;
run;

proc sort data=inpatient noequals;
by first_date_pat last_date_pat;
run;

data professional;
attrib first_date length=4 informat=date9. format=ddmmyy10.;
do i = 1 to 5e4;
   first_date = '1oct2014'd + int(ranuni(225465114) * 750);
   output;
   end;
keep first_date;
run;

proc sort data=professional noequals nodupkey;
by first_date;
run;

data joined;
set inpatient;
attrib first_date length=4 informat=date9. format=ddmmyy10.;
attrib professional_flag length=3;
if _n_ = 1 then do;
   dcl hash prof(dataset: 'professional', ordered: 'a');
   dcl hiter profiter('prof');
   rc = prof.definekey('first_date');
   rc = prof.definedata('first_date');
   rc = prof.definedone();
   call missing(first_date);
   end;
rc = profiter.first();
do while(not(first_date_pat le first_date le last_date_pat) and rc = 0);
   rc = profiter.next();
   end;
if rc = 0
   then professional_flag = 1;   /* Found a professional record within the date range */
   else professional_flag = 0;
output;
keep first_date_pat last_date_pat first_date professional_flag;
run;

proc sql;
create table joined_summ as
   select first_date_pat as first_date,
          last_date_pat as last_date,
          sum(coalesce(professional_flag, 0)) as professional_sum
     from joined
    group by first_date_pat,
             last_date_pat;
quit;

 

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 235 views
  • 1 like
  • 2 in conversation