Solved: Re: Continuos Enrollment from claims data

Charisma · Posted 07-20-2019 11:46 PM

Hello,

Can someone please help me with this?

I am trying to determine patient IDs that are continuously enrolled defined as having at least one claim in each quarter. The dataset year runs from july 2013 to june 2014. What I did/WANT was to divide the year into 4 quarters of three months each and keep patient with at least one claim in each quarter. But manual check of the output dataset revealed that not all patients with this criteria was retained.

Sample dataset below:

data WORK.SAMPLE2013_2014;

infile datalines dsd truncover;

input DE_ID:9. SDATE:5. YYYYMMDD:8. LN:$30.;

format DE_ID 9. SDATE 5. YYYYMMDD 8.;

label DE_ID="DE_ID" SDATE="SDATE" YYYYMMDD="YYYYMMDD" LN="LN";

datalines;

10478 16755 20130715 ACETAMINOPHN

10478 16820 20130918 ACETAMINOPHN

10478 16947 20140123 ACETAMINOPHN

10478 16997 20140314 ACETAMINOPHEN-C

10478 17010 20140327 ACETAMINOPHEN-C

10478 17015 20140401 ACETAMINOPHN

10479 16741 20130701 ACETAMINOPHEN

10479 16741 20130701 HYDRO

10479 16786 20130815 HYDRO

10479 16786 20130815 ACETAMINOPHEN

10479 16811 20130909 HYDRO

10479 16812 20130910 ACETAMINOPHEN

10479 16850 20131018 ACETAMINOPHEN

10479 16850 20131018 HYDRO

10480 16796 20130825 TRA

10480 16825 20130923 TRA

10480 16853 20131021 TRA

10480 16890 20131127 TRA

10480 16907 20131214 TRA

10480 16938 20140114 TRA

10480 16958 20140203 TRA

10480 16989 20140306 TRA

10480 17021 20140407 TRA

10480 17045 20140501 TRA

10480 17073 20140529 TRA

10480 17100 20140625 TRA

10485 17030 20140416 HYDRO

10488 16742 20130702 HYDRO

10489 16742 20130702 HYDRO

10489 16777 20130806 HYDRO

10489 16808 20130906 HYDRO

10489 16839 20131007 HYDRO

10489 16868 20131105 HYDRO

10489 16896 20131203 HYDRO

10489 16926 20140102 HYDRO

10489 16949 20140125 HYDRO

10489 16988 20140305 HYDRO

10489 17002 20140319 HYDRO

10489 17028 20140414 HYDRO

10489 17059 20140515 HYDRO

10489 17078 20140603 TRA

10489 17088 20140613 TRA

10489 17088 20140613 HYDRO

The code I tried

/*Determination of continuous enrollment*/

data Cont_enr1314; set AGEREST1314;qtr= floor(sdate/91.25); run;

proc transpose data=Cont_enr1314 out=elig_wide1314 prefix=qtr; by de_id; var qtr;run;

data elig_real1314; set elig_wide1314;

array qt(*) qtr1-qtr126;

do i=1 to 125;

if qt(i+1)-qt(i)>1 then delete;

end;

do i=1 to 126;

if qt(i)^=. then qlast=qt(i);

end;

range=qlast-qt(1);

if range<4 then delete;

run;

/*What was done here was that every patient had to be eligible for at least one month in every quarter over a period of 4 quarters*/

proc sql; create table e_r1314 as select unique elig_real1314.de_id from elig_real1314; quit;

Manual check of the data should have retained IDS 10478, 10480, AND 10489. But the output I got with this code only retained ID 10489.

Can someone please tell me what is wrong or a more efficient way of doing this.

Thank you.

Patrick · Posted 07-21-2019 06:15 AM

Welcome to the SAS Communities!

Ensure to read source date strings always with a SAS Date informat so that they translate into a SAS Date value (count of days since 1/1/1960) and not just a string or a number. I've modified the input statement you've posted so that this happens.

Once you've got SAS Date values you can use SAS calendar functions (intck in below code).

Below a code version which determines the DE_ID's with 4 continues claim quarters.

data have;
  infile datalines dsd truncover dlm=' ' dsd;
  input DE_ID:$9. SDATE:best32. YYYYMMDD:yymmdd8. LN:$30.;
  format SDATE date9. YYYYMMDD yymmddn8.;
  datalines;
10478 16755 20130715 ACETAMINOPHN
10478 16820 20130918 ACETAMINOPHN
10478 16947 20140123 ACETAMINOPHN
10478 16997 20140314 ACETAMINOPHEN-C
10478 17010 20140327 ACETAMINOPHEN-C
10478 17015 20140401 ACETAMINOPHN
10479 16741 20130701 ACETAMINOPHEN
10479 16741 20130701 HYDRO
10479 16786 20130815 HYDRO
10479 16786 20130815 ACETAMINOPHEN
10479 16811 20130909 HYDRO
10479 16812 20130910 ACETAMINOPHEN
10479 16850 20131018 ACETAMINOPHEN
10479 16850 20131018 HYDRO
10480 16796 20130825 TRA
10480 16825 20130923 TRA
10480 16853 20131021 TRA
10480 16890 20131127 TRA
10480 16907 20131214 TRA
10480 16907 20131214 TRA
10480 16938 20140114 TRA
10480 16958 20140203 TRA
10480 16989 20140306 TRA
10480 17021 20140407 TRA
10480 17045 20140501 TRA
10480 17073 20140529 TRA
10480 17100 20140625 TRA
10485 17030 20140416 HYDRO
10488 16742 20130702 HYDRO
10489 16742 20130702 HYDRO
10489 16777 20130806 HYDRO
10489 16808 20130906 HYDRO
10489 16839 20131007 HYDRO
10489 16868 20131105 HYDRO
10489 16896 20131203 HYDRO
10489 16926 20140102 HYDRO
10489 16949 20140125 HYDRO
10489 16988 20140305 HYDRO
10489 17002 20140319 HYDRO
10489 17028 20140414 HYDRO
10489 17059 20140515 HYDRO
10489 17078 20140603 TRA
10489 17088 20140613 TRA
10489 17088 20140613 HYDRO
;
run;

data want(keep=de_id);
  set have;
  by de_id sdate;

  /* _prev_sdate: sdate value from previous row       */
  /* - lag function may not be within an IF condition */
  _prev_sdate=lag(sdate);

  /* initialize vars for a new de_id */
  if first.de_id then
    do;
      _prev_sdate=.;
      _n_cont_quarters=1;
    end;

  /* do not process rows if already a match found within a de_id */
  if _n_cont_quarters=4 then delete;

  /* if more than one quarter between dates then set _n_cont_quarters to 1, else add 1 */
  retain _n_cont_quarters;
  select(intck('quarter',_prev_sdate,sdate));
    when(0)     /* do nothing */;
    when(1)     _n_cont_quarters=sum(_n_cont_quarters,1);
    otherwise   _n_cont_quarters=1;
  end;

  /* output first row with 4 continues quarters */
  if _n_cont_quarters=4 then output;
run;

proc print;
run;

View solution in original post

Patrick · Posted 07-21-2019 06:15 AM

Welcome to the SAS Communities!

Ensure to read source date strings always with a SAS Date informat so that they translate into a SAS Date value (count of days since 1/1/1960) and not just a string or a number. I've modified the input statement you've posted so that this happens.

Once you've got SAS Date values you can use SAS calendar functions (intck in below code).

Below a code version which determines the DE_ID's with 4 continues claim quarters.

data have;
  infile datalines dsd truncover dlm=' ' dsd;
  input DE_ID:$9. SDATE:best32. YYYYMMDD:yymmdd8. LN:$30.;
  format SDATE date9. YYYYMMDD yymmddn8.;
  datalines;
10478 16755 20130715 ACETAMINOPHN
10478 16820 20130918 ACETAMINOPHN
10478 16947 20140123 ACETAMINOPHN
10478 16997 20140314 ACETAMINOPHEN-C
10478 17010 20140327 ACETAMINOPHEN-C
10478 17015 20140401 ACETAMINOPHN
10479 16741 20130701 ACETAMINOPHEN
10479 16741 20130701 HYDRO
10479 16786 20130815 HYDRO
10479 16786 20130815 ACETAMINOPHEN
10479 16811 20130909 HYDRO
10479 16812 20130910 ACETAMINOPHEN
10479 16850 20131018 ACETAMINOPHEN
10479 16850 20131018 HYDRO
10480 16796 20130825 TRA
10480 16825 20130923 TRA
10480 16853 20131021 TRA
10480 16890 20131127 TRA
10480 16907 20131214 TRA
10480 16907 20131214 TRA
10480 16938 20140114 TRA
10480 16958 20140203 TRA
10480 16989 20140306 TRA
10480 17021 20140407 TRA
10480 17045 20140501 TRA
10480 17073 20140529 TRA
10480 17100 20140625 TRA
10485 17030 20140416 HYDRO
10488 16742 20130702 HYDRO
10489 16742 20130702 HYDRO
10489 16777 20130806 HYDRO
10489 16808 20130906 HYDRO
10489 16839 20131007 HYDRO
10489 16868 20131105 HYDRO
10489 16896 20131203 HYDRO
10489 16926 20140102 HYDRO
10489 16949 20140125 HYDRO
10489 16988 20140305 HYDRO
10489 17002 20140319 HYDRO
10489 17028 20140414 HYDRO
10489 17059 20140515 HYDRO
10489 17078 20140603 TRA
10489 17088 20140613 TRA
10489 17088 20140613 HYDRO
;
run;

data want(keep=de_id);
  set have;
  by de_id sdate;

  /* _prev_sdate: sdate value from previous row       */
  /* - lag function may not be within an IF condition */
  _prev_sdate=lag(sdate);

  /* initialize vars for a new de_id */
  if first.de_id then
    do;
      _prev_sdate=.;
      _n_cont_quarters=1;
    end;

  /* do not process rows if already a match found within a de_id */
  if _n_cont_quarters=4 then delete;

  /* if more than one quarter between dates then set _n_cont_quarters to 1, else add 1 */
  retain _n_cont_quarters;
  select(intck('quarter',_prev_sdate,sdate));
    when(0)     /* do nothing */;
    when(1)     _n_cont_quarters=sum(_n_cont_quarters,1);
    otherwise   _n_cont_quarters=1;
  end;

  /* output first row with 4 continues quarters */
  if _n_cont_quarters=4 then output;
run;

proc print;
run;

Charisma · Posted 07-21-2019 03:07 PM

Thank you so much Patrick!!!!!!! It worked.

Quick question: when I used the format statement code below:

data want; format sdate date9. YYYYMMDD yymmdd8.; set have;run;

I got an error message:

There was a problem with the Format so BEST. was used.

Why is that?

Thank you for your time:)

Patrick · Posted 07-21-2019 06:50 PM

@Charisma wrote:

Thank you so much Patrick!!!!!!! It worked.

Quick question: when I used the format statement code below:

data want; format sdate date9. YYYYMMDD yymmdd8.; set have;run;

I got an error message:

There was a problem with the Format so BEST. was used.

Why is that?

Thank you for your time:)

I'm not sure what you're referring to. There isn't such an Error/Warning with the code I've posted and I didn't find a way to replicate the issue you're raising.

28         data have;
29           infile datalines dsd truncover dlm=' ' dsd;
30           input DE_ID:$9. SDATE:best32. YYYYMMDD:yymmdd8. LN:$30.;
31           format SDATE date9. YYYYMMDD yymmddn8.;
32           datalines;

NOTE: The data set WORK.HAVE has 44 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds

Charisma · Posted 07-21-2019 07:19 PM

Thank you so much for your prompt reply Patrick, I no longer have the error:)

hashman · Posted 07-21-2019 11:25 PM

@Charisma:

Frankly, from your description of the problem and your sample data set I could hardly understand the nature of the task. Luckily, @Patrick, obviously much more perceptive than me, apparently understood it correctly, judging from the fact that you've picked his solution as correct. Since it follows from his analysis that you don't need any variables except DE_ID and SDATE to attain the goal, the extraneous variables YYYYMMD and LN in your input do nothing but muddy the waters. If you would like to be helped with SAS programming, please do a favor to those willing to assist and don't make their efforts more strenuous by adding the job of understanding your problem to the job of helping you solve it.

As to the task itself (again, I've managed to understand its gist only thanks to @Patrick and his program), it is one of those that easily succumb to the key-indexing approach I privately call "the paint brush method". In fact, in this case it's even more so because with regard to continuous enrollment we normally have to deal with days (and so the key-indexed table have to accommodate the range of all possible dates), while in your case it's much easier because we only have to accommodate a range of quarters. There are only about 4000 quarters between the earliest valid SAS date, 01jan1582, and year 01jan2582, which is why 4000 cells in the key-indexed table (whether they are array items, bitmap bits, or character string bytes) is enough to cover all bases. Below, the simplest form of a key-indexed table is chosen as a character variable KX $4000.

If your input file is sorted (or grouped) by DE_ID, as in your data sample, the "paint brush" solution is as simple as:

data have ;                                                   
  input DE_ID :$9. sdate ;                                    
  cards ;                                                     
10478 16755                                                   
10478 16820                                                   
10478 16947                                                   
10478 16997                                                   
10478 17010                                                   
10478 17015                                                   
10479 16741                                                   
10479 16741                                                   
10479 16786                                                   
10479 16786                                                   
10479 16811                                                   
10479 16812                                                   
10479 16850                                                   
10479 16850                                                   
10480 16796                                                   
10480 16825                                                   
10480 16853                                                   
10480 16890                                                   
10480 16907                                                   
10480 16907                                                   
10480 16938                                                   
10480 16958                                                   
10480 16989                                                   
10480 17021                                                   
10480 17045                                                   
10480 17073                                                   
10480 17100                                                   
10485 17030                                                   
10488 16742                                                   
10489 16742                                                   
10489 16777                                                   
10489 16808                                                   
10489 16839                                                   
10489 16868                                                   
10489 16896                                                   
10489 16926                                                   
10489 16949                                                   
10489 16988                                                   
10489 17002                                                   
10489 17028                                                   
10489 17059                                                   
10489 17078                                                   
10489 17088                                                   
10489 17088                                                   
run ;                                                         
                                                              
data want (keep = de_id) ;                                    
  length kx $ 4000 ;                                          
  do until (last.de_id) ;                                     
    set have ;                                                
    by de_id ;                                                
    substr (kx, intck ("qtr", "01jan1582"d, sdate), 1) = "1" ;
  end ;                                                       
  if find (kx, "1111") ;                                      
run ;

Note that it doesn't require the input file to be also sorted by SDATE within DE_ID. The duplicates, i.e. multiple service dates per quarter, are auto-eliminated by key-indexing "1" into the same byte of KX. This is quite advantageous because while large claim files are often sorted by member ID, they're much more rarely sorted also by service date. However, if the power of the hash object is added to the key-indexing, there's no need for the input file to be sorted at all, even by DE_ID:

data want (keep = de_id) ;                                  
  if _n_ = 1 then do ;                                      
    dcl hash h () ;                                         
    h.definekey ("de_id") ;                                 
    h.definedata ("de_id", "kx") ;                          
    h.definedone () ;                                       
    dcl hiter hi ("h") ;                                    
  end ;                                                     
  set have end = z ;                                        
  if h.find() ne 0 then kx = put ("", $4000.) ;             
  substr (kx, intck ("qtr", "01jan1582"d, sdate), 1) = "1" ;
  h.replace() ;                                             
  if z then do while (hi.next() = 0) ;                      
    if find (kx, "1111") then output ;                      
  end ;                                                     
run ;

Note that it's also possible to eschew the hash iterator and the DO loop in favor of the OUTPUT method, but in this case there's no way to drop the longish variable KX from the output since the WHERE clause relies on it:

data _null_ ;                                                       
  if _n_ = 1 then do ;                                              
    dcl hash h () ;                                                 
    h.definekey ("de_id") ;                                         
    h.definedata ("de_id", "kx") ;                                  
    h.definedone () ;                                               
  end ;                                                             
  set have end = z ;                                                
  if h.find() ne 0 then kx = put ("", $4000.) ;                     
  substr (kx, intck ("qtr", "01jan1582"d, sdate), 1) = "1" ;        
  h.replace() ;                                                     
  if z then h.output (dataset: "want (where = (find (kx, '1111')))") ;
run ;

Kind regards

Paul D.

Charisma · Posted 07-25-2019 02:48 PM

Thank you for your help, comments and feedback Hashman. I will bear that in mind for future postings

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away