BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
lichee
Quartz | Level 8

Hi all,

I have two data sets, one is claim level data, and the other is (long-term or inpatient) stay level data. I wanted to construct a new field for the stay level data based on the claim level data. For example, when a diagnosis (flag_dx) showed up in a claim during the claim service dates (clm_beg_dt and clm_end_dt), and if the claim service dates are within the stay time window, the stay should take a value 1. Otherwise, the stay has the value of 0.

 The claim level data is as below:

data clmfile;
infile datalines truncover dsd;
input Person_ID clm_beg_dt :mmddyy. clm_end_dt :mmddyy. flag_dx;
format clm_beg_dt clm_end_dt mmddyy.;
datalines;
1,4/17/2017,11/30/2017,0
1,4/17/2017,12/31/2017,0
1,4/17/2017,4/30/2018,0
1,4/17/2017,6/30/2018,0
1,4/17/2017,8/31/2018,0
1,4/17/2017,10/31/2018,0
1,4/17/2017,12/31/2018,0
2,7/17/2017,7/20/2017,0
2,7/20/2017,7/31/2017,0
2,8/1/2017,8/5/2017,0
2,6/7/2018,6/10/2018,0
2,8/2/2018,8/5/2018,1
2,8/5/2018,8/14/2018,1
2,8/14/2018,8/19/2018,1
2,8/19/2018,8/31/2018,1
2,8/19/2018,9/30/2018,1
2,8/19/2018,10/31/2018,1
2,8/19/2018,11/30/2022,1
2,8/19/2018,12/31/2018,1
2,9/1/2018,9/5/2018,1
;
run;

The stay level data file is as below:

data stayfile;
infile datalines truncover dsd;
input Person_ID Stay_ID Stay_from_dt :mmddyy. Stay_Thru_dt :mmddyy.;
format Stay_from_dt Stay_Thru_dt mmddyy.;
datalines;
1,1,4/17/2017,12/31/2018
2,1,7/20/2017,8/5/2017
2,2,6/10/2018,6/29/2018
2,3,8/5/2018,12/31/2018
;
run;

 

The target data file is as below:

data want;
infile datalines truncover dsd;
input Person_ID Stay_ID Stay_from_dt :mmddyy. Stay_Thru_dt :mmddyy. flag_dx;
format Stay_from_dt Stay_Thru_dt mmddyy.;
datalines;
1,1,4/17/2017,12/31/2018,0
2,1,7/20/2017,8/5/2017,0
2,2,6/10/2018,6/29/2018,0
2,3,8/5/2018,12/31/2018,1
;
run;

Any suggestion is greatly appreciated!

 

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

@lichee I see 🙂 Try this instead. Let me know if it works for you.

 

data want(drop = clm_beg_dt clm_end_dt flag_dx);

   if _N_ = 1 then do;
      dcl hash h(dataset : 'clmfile', multidata : 'Y');
      h.definekey('Person_ID');
      h.definedata(all : 'Y');
      h.definedone();
   end;

   set stayfile;

   if 0 then set clmfile;
   call missing(clm_beg_dt, clm_end_dt, flag_dx);
   flag = 0;

   do while (h.do_over() = 0);
      if clm_beg_dt >= Stay_from_dt 
     and clm_end_dt <= Stay_Thru_dt 
     and flag_dx = 1 then do;
        flag = 1;
      end;
   end;

run;

View solution in original post

10 REPLIES 10
lichee
Quartz | Level 8
I was thinking one possible way is to join the two datasets by Person_ID and then compare the two sets of dates. However, the join would be many-to-many join, which is not efficient.
PeterClemmensen
Tourmaline | Level 20

What if one of the observations in the claim data set has flag 0 for just 1 of the 'in-between' obs?

 

Like the obs below. What if flag was 0 instead?

 

2,8/19/2018,9/30/2018,1

lichee
Quartz | Level 8
I believe you meant the case as below. Instead of having 1 all the way through for the time period of stay_ID 3, Person 2 had some 0s during its Stay 3 between 8/5/2018 and 12/31/2018. This case I only care about if flag_dx=1 ever showed. If flag_dx=1 ever showed once during a stay within the stay time period, the stay for the person should have flag_dx=1 in the target data set.
2,8/2/2018,8/5/2018,1
2,8/5/2018,8/14/2018,1
2,8/14/2018,8/19/2018,0
2,8/19/2018,8/31/2018,1
2,8/19/2018,9/30/2018,1
2,8/19/2018,10/31/2018,0
2,8/19/2018,11/30/2022,1
2,8/19/2018,12/31/2018,1
2,9/1/2018,9/5/2018,1
Thanks a lot!
PeterClemmensen
Tourmaline | Level 20

Ok. And to fully understand: The date interval in the claims data must be completely enclosed by the date interval in stayfile, right? Meaning that clm_beg_dt >= Stay_from_dt and clm_end_dt <= Stay_Thru_dt ?

lichee
Quartz | Level 8
That's right. I should have indicated that in my initial ask. Stay_from_dt<=clm_beg_dt<= clm_end_dt<=Stay_Thru_dt
PeterClemmensen
Tourmaline | Level 20

Ok. Try this

 

data want(drop = clm_beg_dt clm_end_dt flag_dx);

   if _N_ = 1 then do;
      dcl hash h(dataset : 'clmfile', multidata : 'Y');
      h.definekey('Person_ID');
      h.definedata(all : 'Y');
      h.definedone();
   end;

   set stayfile;

   if 0 then set clmfile;
   call missing(clm_beg_dt, clm_end_dt, flag_dx);
   flag = 0;

   do while (h.do_over() = 0);
      if clm_beg_dt >= Stay_from_dt 
     and clm_end_dt <= Stay_Thru_dt 
     and flag_dx = 1 then do;
        flag = 1;
        leave;
      end;
   end;

run;

 

Result:

 

Person_ID  Stay_ID  Stay_from_dt  Stay_Thru_dt  flag
1          1        04/17/17      12/31/18      0
2          1        07/20/17      08/05/17      0
2          2        06/10/18      06/29/18      0
2          3        08/05/18      12/31/18      1

 

lichee
Quartz | Level 8

Thank you, Peter!
I tried out the code with real data. When there are Person_IDs with flag_dx=0 following Person_IDs with flag_dx=1, the Person_IDs with flag_dx=0 would be replaced with Person_IDs with flag_dx=1.
For example, when there is third person, Person_ID=3, with flag_dx=0, Person_ID would become Person_ID=2. The data below would have an unexpected output using the code you provided. Any further insight?

 

data clmfile;
infile datalines truncover dsd;
input Person_ID clm_beg_dt :mmddyy. clm_end_dt :mmddyy. flag_dx;
format clm_beg_dt clm_end_dt mmddyy.;
datalines;
1,4/17/2017,11/30/2017,0
1,4/17/2017,12/31/2017,0
1,4/17/2017,4/30/2018,0
1,4/17/2017,6/30/2018,0
1,4/17/2017,8/31/2018,0
1,4/17/2017,10/31/2018,0
1,4/17/2017,12/31/2018,0
2,7/17/2017,7/20/2017,0
2,7/20/2017,7/31/2017,0
2,8/1/2017,8/5/2017,0
2,6/7/2018,6/10/2018,0
2,8/2/2018,8/5/2018,1
2,8/5/2018,8/14/2018,1
2,8/14/2018,8/19/2018,1
2,8/19/2018,8/31/2018,1
2,8/19/2018,9/30/2018,1
2,8/19/2018,10/31/2018,1
2,8/19/2018,11/30/2022,1
2,8/19/2018,12/31/2018,1
2,9/1/2018,9/5/2018,1
3,12/7/2017,12/12/2017,0
3,12/12/2017,12/31/2017,0
3,1/1/2018,1/16/2018,0
;
run;

data stayfile;
infile datalines truncover dsd;
input Person_ID Stay_ID Stay_from_dt :mmddyy. Stay_Thru_dt :mmddyy.;
format Stay_from_dt Stay_Thru_dt mmddyy.;
datalines;
1,1,4/17/2017,12/31/2018
2,1,7/20/2017,8/5/2017
2,2,6/10/2018,6/29/2018
2,3,8/5/2018,12/31/2018
3,1,12/12/2017,1/16/2018
;
run;

PeterClemmensen
Tourmaline | Level 20

@lichee I see 🙂 Try this instead. Let me know if it works for you.

 

data want(drop = clm_beg_dt clm_end_dt flag_dx);

   if _N_ = 1 then do;
      dcl hash h(dataset : 'clmfile', multidata : 'Y');
      h.definekey('Person_ID');
      h.definedata(all : 'Y');
      h.definedone();
   end;

   set stayfile;

   if 0 then set clmfile;
   call missing(clm_beg_dt, clm_end_dt, flag_dx);
   flag = 0;

   do while (h.do_over() = 0);
      if clm_beg_dt >= Stay_from_dt 
     and clm_end_dt <= Stay_Thru_dt 
     and flag_dx = 1 then do;
        flag = 1;
      end;
   end;

run;
lichee
Quartz | Level 8
This one works well! I'll need to do some reading to learn about hash object now. Thanks a lot!
PeterClemmensen
Tourmaline | Level 20

Sure thing. If it works, please mark the code above the accepted answer 🙂

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 2401 views
  • 0 likes
  • 2 in conversation