BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
lichee
Quartz | Level 8

Hi all,

I have two data sets, one is claim level data, and the other is (long-term or inpatient) stay level data. I wanted to construct a new field for the stay level data based on the claim level data. For example, when a diagnosis (flag_dx) showed up in a claim during the claim service dates (clm_beg_dt and clm_end_dt), and if the claim service dates are within the stay time window, the stay should take a value 1. Otherwise, the stay has the value of 0.

 The claim level data is as below:

data clmfile;
infile datalines truncover dsd;
input Person_ID clm_beg_dt :mmddyy. clm_end_dt :mmddyy. flag_dx;
format clm_beg_dt clm_end_dt mmddyy.;
datalines;
1,4/17/2017,11/30/2017,0
1,4/17/2017,12/31/2017,0
1,4/17/2017,4/30/2018,0
1,4/17/2017,6/30/2018,0
1,4/17/2017,8/31/2018,0
1,4/17/2017,10/31/2018,0
1,4/17/2017,12/31/2018,0
2,7/17/2017,7/20/2017,0
2,7/20/2017,7/31/2017,0
2,8/1/2017,8/5/2017,0
2,6/7/2018,6/10/2018,0
2,8/2/2018,8/5/2018,1
2,8/5/2018,8/14/2018,1
2,8/14/2018,8/19/2018,1
2,8/19/2018,8/31/2018,1
2,8/19/2018,9/30/2018,1
2,8/19/2018,10/31/2018,1
2,8/19/2018,11/30/2022,1
2,8/19/2018,12/31/2018,1
2,9/1/2018,9/5/2018,1
;
run;

The stay level data file is as below:

data stayfile;
infile datalines truncover dsd;
input Person_ID Stay_ID Stay_from_dt :mmddyy. Stay_Thru_dt :mmddyy.;
format Stay_from_dt Stay_Thru_dt mmddyy.;
datalines;
1,1,4/17/2017,12/31/2018
2,1,7/20/2017,8/5/2017
2,2,6/10/2018,6/29/2018
2,3,8/5/2018,12/31/2018
;
run;

 

The target data file is as below:

data want;
infile datalines truncover dsd;
input Person_ID Stay_ID Stay_from_dt :mmddyy. Stay_Thru_dt :mmddyy. flag_dx;
format Stay_from_dt Stay_Thru_dt mmddyy.;
datalines;
1,1,4/17/2017,12/31/2018,0
2,1,7/20/2017,8/5/2017,0
2,2,6/10/2018,6/29/2018,0
2,3,8/5/2018,12/31/2018,1
;
run;

Any suggestion is greatly appreciated!

 

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

@lichee I see 🙂 Try this instead. Let me know if it works for you.

 

data want(drop = clm_beg_dt clm_end_dt flag_dx);

   if _N_ = 1 then do;
      dcl hash h(dataset : 'clmfile', multidata : 'Y');
      h.definekey('Person_ID');
      h.definedata(all : 'Y');
      h.definedone();
   end;

   set stayfile;

   if 0 then set clmfile;
   call missing(clm_beg_dt, clm_end_dt, flag_dx);
   flag = 0;

   do while (h.do_over() = 0);
      if clm_beg_dt >= Stay_from_dt 
     and clm_end_dt <= Stay_Thru_dt 
     and flag_dx = 1 then do;
        flag = 1;
      end;
   end;

run;

View solution in original post

10 REPLIES 10
lichee
Quartz | Level 8
I was thinking one possible way is to join the two datasets by Person_ID and then compare the two sets of dates. However, the join would be many-to-many join, which is not efficient.
PeterClemmensen
Tourmaline | Level 20

What if one of the observations in the claim data set has flag 0 for just 1 of the 'in-between' obs?

 

Like the obs below. What if flag was 0 instead?

 

2,8/19/2018,9/30/2018,1

lichee
Quartz | Level 8
I believe you meant the case as below. Instead of having 1 all the way through for the time period of stay_ID 3, Person 2 had some 0s during its Stay 3 between 8/5/2018 and 12/31/2018. This case I only care about if flag_dx=1 ever showed. If flag_dx=1 ever showed once during a stay within the stay time period, the stay for the person should have flag_dx=1 in the target data set.
2,8/2/2018,8/5/2018,1
2,8/5/2018,8/14/2018,1
2,8/14/2018,8/19/2018,0
2,8/19/2018,8/31/2018,1
2,8/19/2018,9/30/2018,1
2,8/19/2018,10/31/2018,0
2,8/19/2018,11/30/2022,1
2,8/19/2018,12/31/2018,1
2,9/1/2018,9/5/2018,1
Thanks a lot!
PeterClemmensen
Tourmaline | Level 20

Ok. And to fully understand: The date interval in the claims data must be completely enclosed by the date interval in stayfile, right? Meaning that clm_beg_dt >= Stay_from_dt and clm_end_dt <= Stay_Thru_dt ?

lichee
Quartz | Level 8
That's right. I should have indicated that in my initial ask. Stay_from_dt<=clm_beg_dt<= clm_end_dt<=Stay_Thru_dt
PeterClemmensen
Tourmaline | Level 20

Ok. Try this

 

data want(drop = clm_beg_dt clm_end_dt flag_dx);

   if _N_ = 1 then do;
      dcl hash h(dataset : 'clmfile', multidata : 'Y');
      h.definekey('Person_ID');
      h.definedata(all : 'Y');
      h.definedone();
   end;

   set stayfile;

   if 0 then set clmfile;
   call missing(clm_beg_dt, clm_end_dt, flag_dx);
   flag = 0;

   do while (h.do_over() = 0);
      if clm_beg_dt >= Stay_from_dt 
     and clm_end_dt <= Stay_Thru_dt 
     and flag_dx = 1 then do;
        flag = 1;
        leave;
      end;
   end;

run;

 

Result:

 

Person_ID  Stay_ID  Stay_from_dt  Stay_Thru_dt  flag
1          1        04/17/17      12/31/18      0
2          1        07/20/17      08/05/17      0
2          2        06/10/18      06/29/18      0
2          3        08/05/18      12/31/18      1

 

lichee
Quartz | Level 8

Thank you, Peter!
I tried out the code with real data. When there are Person_IDs with flag_dx=0 following Person_IDs with flag_dx=1, the Person_IDs with flag_dx=0 would be replaced with Person_IDs with flag_dx=1.
For example, when there is third person, Person_ID=3, with flag_dx=0, Person_ID would become Person_ID=2. The data below would have an unexpected output using the code you provided. Any further insight?

 

data clmfile;
infile datalines truncover dsd;
input Person_ID clm_beg_dt :mmddyy. clm_end_dt :mmddyy. flag_dx;
format clm_beg_dt clm_end_dt mmddyy.;
datalines;
1,4/17/2017,11/30/2017,0
1,4/17/2017,12/31/2017,0
1,4/17/2017,4/30/2018,0
1,4/17/2017,6/30/2018,0
1,4/17/2017,8/31/2018,0
1,4/17/2017,10/31/2018,0
1,4/17/2017,12/31/2018,0
2,7/17/2017,7/20/2017,0
2,7/20/2017,7/31/2017,0
2,8/1/2017,8/5/2017,0
2,6/7/2018,6/10/2018,0
2,8/2/2018,8/5/2018,1
2,8/5/2018,8/14/2018,1
2,8/14/2018,8/19/2018,1
2,8/19/2018,8/31/2018,1
2,8/19/2018,9/30/2018,1
2,8/19/2018,10/31/2018,1
2,8/19/2018,11/30/2022,1
2,8/19/2018,12/31/2018,1
2,9/1/2018,9/5/2018,1
3,12/7/2017,12/12/2017,0
3,12/12/2017,12/31/2017,0
3,1/1/2018,1/16/2018,0
;
run;

data stayfile;
infile datalines truncover dsd;
input Person_ID Stay_ID Stay_from_dt :mmddyy. Stay_Thru_dt :mmddyy.;
format Stay_from_dt Stay_Thru_dt mmddyy.;
datalines;
1,1,4/17/2017,12/31/2018
2,1,7/20/2017,8/5/2017
2,2,6/10/2018,6/29/2018
2,3,8/5/2018,12/31/2018
3,1,12/12/2017,1/16/2018
;
run;

PeterClemmensen
Tourmaline | Level 20

@lichee I see 🙂 Try this instead. Let me know if it works for you.

 

data want(drop = clm_beg_dt clm_end_dt flag_dx);

   if _N_ = 1 then do;
      dcl hash h(dataset : 'clmfile', multidata : 'Y');
      h.definekey('Person_ID');
      h.definedata(all : 'Y');
      h.definedone();
   end;

   set stayfile;

   if 0 then set clmfile;
   call missing(clm_beg_dt, clm_end_dt, flag_dx);
   flag = 0;

   do while (h.do_over() = 0);
      if clm_beg_dt >= Stay_from_dt 
     and clm_end_dt <= Stay_Thru_dt 
     and flag_dx = 1 then do;
        flag = 1;
      end;
   end;

run;
lichee
Quartz | Level 8
This one works well! I'll need to do some reading to learn about hash object now. Thanks a lot!
PeterClemmensen
Tourmaline | Level 20

Sure thing. If it works, please mark the code above the accepted answer 🙂

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 2236 views
  • 0 likes
  • 2 in conversation