This topic is solved.
shounster
Obsidian | Level 7

I am coming back to the well for the continuation of an issue I posed last week.

 

I am looking to summarize/roll up data where the totals will not match the number of rows.

 

data have; 
input pat_id $ epiid $ epigrp $ drg $ clm_from_dt :mmddyy10. clm_thru_dt :mmddyy10. anch_flg $ readm_flg $; 
format clm_from_dt clm_thru_dt mmddyy10.; 
datalines; 
88575 16 16 389 11/22/2018 11/30/2018 y n
88575 16 34 387 01/05/2019 01/14/2019 n y
88575 16 34 387 01/14/2019 01/19/2019 n y
88575 19 19 330 02/27/2019 03/08/2019 y y
88575 20 00 394 05/21/2019 05/28/2019 y y
88575 20 34 386 06/16/2019 06/22/2019 n y
; 
run; 

In this sample, you can see the single patient and how many visits (rows) they had during this time period.  What I would like the results to show are this:

data want; 
input pat_id $ epiid $ epigrp $ drg $ clm_from_dt :mmddyy10. clm_thru_dt :mmddyy10. anch_flg $ readm_count; 
format clm_from_dt clm_thru_dt mmddyy10. readm_count 3.; 
datalines; 
88575 16 16 389 11/22/2018 11/28/2018 y 3 
88575 19 19 330 02/27/2019 03/08/2019 y 2
88575 20 00 394 05/21/2019 05/28/2019 y 2
; 
run; 

I have created a date variable that retains the date until there is another anch_flg='y'.  However, this variable resets on each new anch_flg before I can get the correct readmit count. The readm_flg is created whenever there is a prior record that is within 90 days.  There can be an anchor line that is also a readmit, which is causing my issues.  I can't seem to get it to 'double count'.
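
Roughly, that retained-date step looks something like this (a sketch only; the retained variable is the new_anch_thru_dt I show further down, and the output dataset name is a placeholder):

data have2;
  set have;
  by pat_id;
  retain new_anch_thru_dt;
  * carry the latest anchor thru date forward until the next anchor;
  if first.pat_id then new_anch_thru_dt = .;
  if anch_flg = 'y' then new_anch_thru_dt = clm_thru_dt;
  format new_anch_thru_dt mmddyy10.;
run;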

In looking at the two tables above, you can see that the first anchor has four lines that are summed (the first is an anchor but not a readmit; the next two are readmits within 90 days of that anchor's clm_thru_dt; and the fourth row, while a new anchor, is also a readmit, since 02/27/19 is within 90 days of 11/30/18).

 

I thought about doing two data steps to create the merge table. The first would get all of the anchor-flag records, and the next would be a PROC FREQ or PROC SUMMARY to obtain the count of readmit flags by some kind of key (pat_id ~ epiid), but I'm not getting the right counts when I summarize and then merge.
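
Sketching that two-step idea out (dataset names here are placeholders), it would look something like this:

* subset the anchor rows;
data anchors;
  set have;
  if anch_flg = 'y';
run;

* count the readmit flags per pat_id/epiid;
proc summary data=have nway;
  class pat_id epiid;
  where readm_flg = 'y';
  output out=readm_counts (drop=_type_ rename=(_freq_=readm_count));
run;

* attach the counts back to the anchor rows;
data merged;
  merge anchors (in=a) readm_counts;
  by pat_id epiid;
  if a;
run;

With the sample data this gives counts of 2, 1, and 2 instead of the 3, 2, and 2 in the want table, because a readmit that belongs in the previous anchor's 90-day window but carries a new epiid (like the 02/27/19 row) gets summarized under its own episode instead.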

 

Any ideas would be greatly appreciated.

8 Replies
PaigeMiller
Diamond | Level 26

There can be an anchor line that is also a readmit, which is causing my issues.  I can't seem to get it to 'double count'.

 

Can't you simply do something like this?

 

if anch_flg='y' then count=1;
if readm_flg='y' then count+1;

--
Paige Miller
shounster
Obsidian | Level 7
When I add the count to the records, I get 1, 2, 3, 1, 2, 3 sequentially (in the detail file, the first one above). If I created a summary, I would have a count of six. In this particular example, I can't have a count of one for the first record, since there isn't a readmit. If my layout and explanation are not clear, I apologize.
PaigeMiller
Diamond | Level 26

Yes, the code I provided gives sequential counts. Then you can write a little more code to keep the last record, which will match your WANT data set exactly.

--
Paige Miller
shounster
Obsidian | Level 7

If I do something like this:

data MY4SEL.&clnt._readx;
  set MY4SEL.&clnt._intgap_epi3;
  by pat_id;

  if first.pat_id then do;
    if anch_flg='y' then count=1;
    if readm_flg='y' then count+1;
  end;
run;

I'm getting the same 1,2,3,1,2,3 as we talked about yesterday, but I guess I'm not understanding how to get these counts to line up with the appropriate anchors as shown in the "want" file.

PaigeMiller
Diamond | Level 26

There are probably more efficient ways to do this, but this will work

 

data have; 
input pat_id $ epiid $ epigrp $ drg $ clm_from_dt :mmddyy10. clm_thru_dt :mmddyy10. anch_flg $ readm_flg $; 
format clm_from_dt clm_thru_dt mmddyy10. ; 
datalines; 
88575 16 16 389 11/22/2018 11/30/2018 y n
88575 16 34 387 01/05/2019 01/14/2019 n y
88575 16 34 387 01/14/2019 01/19/2019 n y
88575 19 19 330 02/27/2019 03/08/2019 y y
88575 20 00 394 05/21/2019 05/28/2019 y y
88575 20 34 386 06/16/2019 06/22/2019 n y
; 
run; 
data want1;
	set have;
	if anch_flg='y' then do; group+1; count=1; end;
	if readm_flg='y' then count+1;
run;
data want2;
	set want1;
	by group;
	if last.group;
run;

Side comment: if you use numeric 0 and 1 instead of character 'n' and 'y', this whole operation is a lot simpler, because then you just sum the 0s and 1s, and summing by groups is easy with PROC SUMMARY or another PROC. In general, using 0s and 1s instead of character values is good practice.
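
For example, a minimal sketch of that 0/1 idea (the names anch_num, readm_num, have_num, and flag_sums are placeholders, not from the thread):

data have_num;
  set have;
  * convert the character flags to 0/1 so they can be summed;
  anch_num = (anch_flg = 'y');
  readm_num = (readm_flg = 'y');
  if anch_num then group + 1; * same grouping idea as in the step above;
run;

proc summary data=have_num nway;
  class group;
  var anch_num readm_num;
  output out=flag_sums (drop=_type_ _freq_) sum=;
run;

PROC SUMMARY then returns the per-group totals of the two flags directly.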

 

--
Paige Miller
shounster
Obsidian | Level 7
That helps immensely. And I appreciate it.

However, the rows returned in the 'want' set need to be the rows where anch_flg='y'. I guess that's where I was getting stuck. The only thing I have to make a somewhat unique key with is pat_id and epiid. Would it make sense to do the above and then merge in another subset of "where anch_flg='y'" by said key?

I do get your point re: using '0' and '1' instead of 'n' and 'y'. Definitely agree that is a better practice.
shounster
Obsidian | Level 7
data have; 
input pat_id $ epiid $ epigrp $ drg $ clm_from_dt :mmddyy10. clm_thru_dt :mmddyy10. new_anch_thru_dt :mmddyy10. anch_flg $ readm_flg $; 
format clm_from_dt clm_thru_dt new_anch_thru_dt mmddyy10.; 
datalines; 
88575 16 16 389 11/22/2018 11/30/2018 11/30/2018 y n
88575 16 34 387 01/05/2019 01/14/2019 11/30/2018 n y
88575 16 34 387 01/14/2019 01/19/2019 11/30/2018 n y
88575 19 19 330 02/27/2019 03/08/2019 03/08/2019 y y
88575 20 00 394 05/21/2019 05/28/2019 05/28/2019 y y
88575 20 34 386 06/16/2019 06/22/2019 05/28/2019 n y
; 
run; 

Maybe adding new_anch_thru_dt (the retained date from the most recent anchor record) helps with counting the readmits within 90 days of an anchor.  I've tried SQL self-joins and INTNX to compare the dates, but nothing seems to get the right counts.  I can get 2 for the first anchor, but the second anchor (line four) continues to show a count of 1.  Thanks again, all.
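
The 90-day test itself is just date arithmetic against the retained date. A sketch of the condition only (not the full counting logic; the dataset and flag names are placeholders):

data checked;
  set have;
  * flag a readmit whose from date falls within 90 days of the most
  * recent anchor thru date carried in new_anch_thru_dt;
  within_90 = (readm_flg = 'y' and
               clm_from_dt > new_anch_thru_dt and
               clm_from_dt - new_anch_thru_dt <= 90);
  * equivalent with INTNX: clm_from_dt <= intnx('day', new_anch_thru_dt, 90);
run;

The catch is exactly what I described above: on a row that is both an anchor and a readmit, new_anch_thru_dt has already jumped to that row's own anchor date, so the comparison misses the readmit that should have counted against the previous anchor.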

FloydNevseta
Pyrite | Level 9

As you pointed out, the tricky part here is that you have rows that count not only for the current anchor but for the previous anchor too. You certainly cannot process that in one pass of the data. My thought is to read the dataset twice using 2 set statements: the usual one in the data step to detect the anchor and a second one in a do loop that reads the obs after the anchor until all of the contiguous readmissions are counted. Here's what the data step looks like.

data want (drop=readm_flg: anch_flg2);
set have;

* only take action on anchor obs;
* this will preserve values of the anchor;
if anch_flg = 'y' then do;
  next_obs = _n_ + 1;
  readm_count = (readm_flg = 'y'); * reset readmit count to 0 or 1;

  * this loop evaluates obs after anchor until next anchor or
  * no readmission;
  do until (readm_flg2 = 'n' or anch_flg2 = 'y');
    
    * start reading with obs after the anchor (point=next_obs);
    set have (rename=(anch_flg=anch_flg2 readm_flg=readm_flg2)
        keep=anch_flg readm_flg) point=next_obs nobs=nobs;
    if readm_flg2 = 'y' then readm_count + 1;
    next_obs + 1; * prepare for next obs;

    * with direct access (point=), must manually detect reading 
    * past eof or other errors. cannot use end= with point=;
    if next_obs > nobs or _error_ then leave;
  end;
  output; * all contiguous readmissions have been counted so output;
end;
run;

It produces your desired output with the anchor row data preserved.
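
As a quick check against the sample data (just a sketch):

proc print data=want noobs;
  var pat_id epiid epigrp drg clm_from_dt clm_thru_dt anch_flg readm_count;
run;

This should list the three anchor rows with readm_count values of 3, 2, and 2, matching the WANT data set at the top of the thread.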
