Hello,
Can someone please help me with this?
I am trying to determine patient IDs that are continuously enrolled defined as having at least one claim in each quarter. The dataset year runs from july 2013 to june 2014. What I did/WANT was to divide the year into 4 quarters of three months each and keep patient with at least one claim in each quarter. But manual check of the output dataset revealed that not all patients with this criteria was retained.
Sample dataset below:
data WORK.SAMPLE2013_2014;
infile datalines dsd truncover;
input DE_ID:9. SDATE:5. YYYYMMDD:8. LN:$30.;
format DE_ID 9. SDATE 5. YYYYMMDD 8.;
label DE_ID="DE_ID" SDATE="SDATE" YYYYMMDD="YYYYMMDD" LN="LN";
datalines;
10478 16755 20130715 ACETAMINOPHN
10478 16820 20130918 ACETAMINOPHN
10478 16947 20140123 ACETAMINOPHN
10478 16997 20140314 ACETAMINOPHEN-C
10478 17010 20140327 ACETAMINOPHEN-C
10478 17015 20140401 ACETAMINOPHN
10479 16741 20130701 ACETAMINOPHEN
10479 16741 20130701 HYDRO
10479 16786 20130815 HYDRO
10479 16786 20130815 ACETAMINOPHEN
10479 16811 20130909 HYDRO
10479 16812 20130910 ACETAMINOPHEN
10479 16850 20131018 ACETAMINOPHEN
10479 16850 20131018 HYDRO
10480 16796 20130825 TRA
10480 16825 20130923 TRA
10480 16853 20131021 TRA
10480 16890 20131127 TRA
10480 16907 20131214 TRA
10480 16907 20131214 TRA
10480 16938 20140114 TRA
10480 16958 20140203 TRA
10480 16989 20140306 TRA
10480 17021 20140407 TRA
10480 17045 20140501 TRA
10480 17073 20140529 TRA
10480 17100 20140625 TRA
10485 17030 20140416 HYDRO
10488 16742 20130702 HYDRO
10489 16742 20130702 HYDRO
10489 16777 20130806 HYDRO
10489 16808 20130906 HYDRO
10489 16839 20131007 HYDRO
10489 16868 20131105 HYDRO
10489 16896 20131203 HYDRO
10489 16926 20140102 HYDRO
10489 16949 20140125 HYDRO
10489 16988 20140305 HYDRO
10489 17002 20140319 HYDRO
10489 17028 20140414 HYDRO
10489 17059 20140515 HYDRO
10489 17078 20140603 TRA
10489 17088 20140613 TRA
10489 17088 20140613 HYDRO
The code I tried
/*Determination of continuous enrollment*/
data Cont_enr1314; set AGEREST1314;qtr= floor(sdate/91.25); run;
proc transpose data=Cont_enr1314 out=elig_wide1314 prefix=qtr; by de_id; var qtr;run;
data elig_real1314; set elig_wide1314;
array qt(*) qtr1-qtr126;
do i=1 to 125;
if qt(i+1)-qt(i)>1 then delete;
end;
do i=1 to 126;
if qt(i)^=. then qlast=qt(i);
end;
range=qlast-qt(1);
if range<4 then delete;
run;
/*What was done here was that every patient had to be eligible for at least one month in every quarter over a period of 4 quarters*/
proc sql; create table e_r1314 as select unique elig_real1314.de_id from elig_real1314; quit;
Manual check of the data should have retained IDS 10478, 10480, AND 10489. But the output I got with this code only retained ID 10489.
Can someone please tell me what is wrong or a more efficient way of doing this.
Thank you.
Welcome to the SAS Communities!
Ensure to read source date strings always with a SAS Date informat so that they translate into a SAS Date value (count of days since 1/1/1960) and not just a string or a number. I've modified the input statement you've posted so that this happens.
Once you've got SAS Date values you can use SAS calendar functions (intck in below code).
Below a code version which determines the DE_ID's with 4 continues claim quarters.
data have;
infile datalines dsd truncover dlm=' ' dsd;
input DE_ID:$9. SDATE:best32. YYYYMMDD:yymmdd8. LN:$30.;
format SDATE date9. YYYYMMDD yymmddn8.;
datalines;
10478 16755 20130715 ACETAMINOPHN
10478 16820 20130918 ACETAMINOPHN
10478 16947 20140123 ACETAMINOPHN
10478 16997 20140314 ACETAMINOPHEN-C
10478 17010 20140327 ACETAMINOPHEN-C
10478 17015 20140401 ACETAMINOPHN
10479 16741 20130701 ACETAMINOPHEN
10479 16741 20130701 HYDRO
10479 16786 20130815 HYDRO
10479 16786 20130815 ACETAMINOPHEN
10479 16811 20130909 HYDRO
10479 16812 20130910 ACETAMINOPHEN
10479 16850 20131018 ACETAMINOPHEN
10479 16850 20131018 HYDRO
10480 16796 20130825 TRA
10480 16825 20130923 TRA
10480 16853 20131021 TRA
10480 16890 20131127 TRA
10480 16907 20131214 TRA
10480 16907 20131214 TRA
10480 16938 20140114 TRA
10480 16958 20140203 TRA
10480 16989 20140306 TRA
10480 17021 20140407 TRA
10480 17045 20140501 TRA
10480 17073 20140529 TRA
10480 17100 20140625 TRA
10485 17030 20140416 HYDRO
10488 16742 20130702 HYDRO
10489 16742 20130702 HYDRO
10489 16777 20130806 HYDRO
10489 16808 20130906 HYDRO
10489 16839 20131007 HYDRO
10489 16868 20131105 HYDRO
10489 16896 20131203 HYDRO
10489 16926 20140102 HYDRO
10489 16949 20140125 HYDRO
10489 16988 20140305 HYDRO
10489 17002 20140319 HYDRO
10489 17028 20140414 HYDRO
10489 17059 20140515 HYDRO
10489 17078 20140603 TRA
10489 17088 20140613 TRA
10489 17088 20140613 HYDRO
;
run;
data want(keep=de_id);
set have;
by de_id sdate;
/* _prev_sdate: sdate value from previous row */
/* - lag function may not be within an IF condition */
_prev_sdate=lag(sdate);
/* initialize vars for a new de_id */
if first.de_id then
do;
_prev_sdate=.;
_n_cont_quarters=1;
end;
/* do not process rows if already a match found within a de_id */
if _n_cont_quarters=4 then delete;
/* if more than one quarter between dates then set _n_cont_quarters to 1, else add 1 */
retain _n_cont_quarters;
select(intck('quarter',_prev_sdate,sdate));
when(0) /* do nothing */;
when(1) _n_cont_quarters=sum(_n_cont_quarters,1);
otherwise _n_cont_quarters=1;
end;
/* output first row with 4 continues quarters */
if _n_cont_quarters=4 then output;
run;
proc print;
run;
Welcome to the SAS Communities!
Ensure to read source date strings always with a SAS Date informat so that they translate into a SAS Date value (count of days since 1/1/1960) and not just a string or a number. I've modified the input statement you've posted so that this happens.
Once you've got SAS Date values you can use SAS calendar functions (intck in below code).
Below a code version which determines the DE_ID's with 4 continues claim quarters.
data have;
infile datalines dsd truncover dlm=' ' dsd;
input DE_ID:$9. SDATE:best32. YYYYMMDD:yymmdd8. LN:$30.;
format SDATE date9. YYYYMMDD yymmddn8.;
datalines;
10478 16755 20130715 ACETAMINOPHN
10478 16820 20130918 ACETAMINOPHN
10478 16947 20140123 ACETAMINOPHN
10478 16997 20140314 ACETAMINOPHEN-C
10478 17010 20140327 ACETAMINOPHEN-C
10478 17015 20140401 ACETAMINOPHN
10479 16741 20130701 ACETAMINOPHEN
10479 16741 20130701 HYDRO
10479 16786 20130815 HYDRO
10479 16786 20130815 ACETAMINOPHEN
10479 16811 20130909 HYDRO
10479 16812 20130910 ACETAMINOPHEN
10479 16850 20131018 ACETAMINOPHEN
10479 16850 20131018 HYDRO
10480 16796 20130825 TRA
10480 16825 20130923 TRA
10480 16853 20131021 TRA
10480 16890 20131127 TRA
10480 16907 20131214 TRA
10480 16907 20131214 TRA
10480 16938 20140114 TRA
10480 16958 20140203 TRA
10480 16989 20140306 TRA
10480 17021 20140407 TRA
10480 17045 20140501 TRA
10480 17073 20140529 TRA
10480 17100 20140625 TRA
10485 17030 20140416 HYDRO
10488 16742 20130702 HYDRO
10489 16742 20130702 HYDRO
10489 16777 20130806 HYDRO
10489 16808 20130906 HYDRO
10489 16839 20131007 HYDRO
10489 16868 20131105 HYDRO
10489 16896 20131203 HYDRO
10489 16926 20140102 HYDRO
10489 16949 20140125 HYDRO
10489 16988 20140305 HYDRO
10489 17002 20140319 HYDRO
10489 17028 20140414 HYDRO
10489 17059 20140515 HYDRO
10489 17078 20140603 TRA
10489 17088 20140613 TRA
10489 17088 20140613 HYDRO
;
run;
data want(keep=de_id);
set have;
by de_id sdate;
/* _prev_sdate: sdate value from previous row */
/* - lag function may not be within an IF condition */
_prev_sdate=lag(sdate);
/* initialize vars for a new de_id */
if first.de_id then
do;
_prev_sdate=.;
_n_cont_quarters=1;
end;
/* do not process rows if already a match found within a de_id */
if _n_cont_quarters=4 then delete;
/* if more than one quarter between dates then set _n_cont_quarters to 1, else add 1 */
retain _n_cont_quarters;
select(intck('quarter',_prev_sdate,sdate));
when(0) /* do nothing */;
when(1) _n_cont_quarters=sum(_n_cont_quarters,1);
otherwise _n_cont_quarters=1;
end;
/* output first row with 4 continues quarters */
if _n_cont_quarters=4 then output;
run;
proc print;
run;
Thank you so much Patrick!!!!!!! It worked.
Quick question: when I used the format statement code below:
data want; format sdate date9. YYYYMMDD yymmdd8.; set have;run;
I got an error message:
There was a problem with the Format so BEST. was used.
Why is that?
Thank you for your time:)
@Charisma wrote:
Thank you so much Patrick!!!!!!! It worked.
Quick question: when I used the format statement code below:
data want; format sdate date9. YYYYMMDD yymmdd8.; set have;run;
I got an error message:
There was a problem with the Format so BEST. was used.
Why is that?
Thank you for your time:)
I'm not sure what you're referring to. There isn't such an Error/Warning with the code I've posted and I didn't find a way to replicate the issue you're raising.
28 data have; 29 infile datalines dsd truncover dlm=' ' dsd; 30 input DE_ID:$9. SDATE:best32. YYYYMMDD:yymmdd8. LN:$30.; 31 format SDATE date9. YYYYMMDD yymmddn8.; 32 datalines; NOTE: The data set WORK.HAVE has 44 observations and 4 variables. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.01 seconds
Frankly, from your description of the problem and your sample data set I could hardly understand the nature of the task. Luckily, @Patrick, obviously much more perceptive than me, apparently understood it correctly, judging from the fact that you've picked his solution as correct. Since it follows from his analysis that you don't need any variables except DE_ID and SDATE to attain the goal, the extraneous variables YYYYMMD and LN in your input do nothing but muddy the waters. If you would like to be helped with SAS programming, please do a favor to those willing to assist and don't make their efforts more strenuous by adding the job of understanding your problem to the job of helping you solve it.
As to the task itself (again, I've managed to understand its gist only thanks to @Patrick and his program), it is one of those that easily succumb to the key-indexing approach I privately call "the paint brush method". In fact, in this case it's even more so because with regard to continuous enrollment we normally have to deal with days (and so the key-indexed table have to accommodate the range of all possible dates), while in your case it's much easier because we only have to accommodate a range of quarters. There are only about 4000 quarters between the earliest valid SAS date, 01jan1582, and year 01jan2582, which is why 4000 cells in the key-indexed table (whether they are array items, bitmap bits, or character string bytes) is enough to cover all bases. Below, the simplest form of a key-indexed table is chosen as a character variable KX $4000.
If your input file is sorted (or grouped) by DE_ID, as in your data sample, the "paint brush" solution is as simple as:
data have ;
input DE_ID :$9. sdate ;
cards ;
10478 16755
10478 16820
10478 16947
10478 16997
10478 17010
10478 17015
10479 16741
10479 16741
10479 16786
10479 16786
10479 16811
10479 16812
10479 16850
10479 16850
10480 16796
10480 16825
10480 16853
10480 16890
10480 16907
10480 16907
10480 16938
10480 16958
10480 16989
10480 17021
10480 17045
10480 17073
10480 17100
10485 17030
10488 16742
10489 16742
10489 16777
10489 16808
10489 16839
10489 16868
10489 16896
10489 16926
10489 16949
10489 16988
10489 17002
10489 17028
10489 17059
10489 17078
10489 17088
10489 17088
run ;
data want (keep = de_id) ;
length kx $ 4000 ;
do until (last.de_id) ;
set have ;
by de_id ;
substr (kx, intck ("qtr", "01jan1582"d, sdate), 1) = "1" ;
end ;
if find (kx, "1111") ;
run ;
Note that it doesn't require the input file to be also sorted by SDATE within DE_ID. The duplicates, i.e. multiple service dates per quarter, are auto-eliminated by key-indexing "1" into the same byte of KX. This is quite advantageous because while large claim files are often sorted by member ID, they're much more rarely sorted also by service date. However, if the power of the hash object is added to the key-indexing, there's no need for the input file to be sorted at all, even by DE_ID:
data want (keep = de_id) ;
if _n_ = 1 then do ;
dcl hash h () ;
h.definekey ("de_id") ;
h.definedata ("de_id", "kx") ;
h.definedone () ;
dcl hiter hi ("h") ;
end ;
set have end = z ;
if h.find() ne 0 then kx = put ("", $4000.) ;
substr (kx, intck ("qtr", "01jan1582"d, sdate), 1) = "1" ;
h.replace() ;
if z then do while (hi.next() = 0) ;
if find (kx, "1111") then output ;
end ;
run ;
Note that it's also possible to eschew the hash iterator and the DO loop in favor of the OUTPUT method, but in this case there's no way to drop the longish variable KX from the output since the WHERE clause relies on it:
data _null_ ;
if _n_ = 1 then do ;
dcl hash h () ;
h.definekey ("de_id") ;
h.definedata ("de_id", "kx") ;
h.definedone () ;
end ;
set have end = z ;
if h.find() ne 0 then kx = put ("", $4000.) ;
substr (kx, intck ("qtr", "01jan1582"d, sdate), 1) = "1" ;
h.replace() ;
if z then h.output (dataset: "want (where = (find (kx, '1111')))") ;
run ;
Kind regards
Paul D.
Thank you for your help, comments and feedback Hashman. I will bear that in mind for future postings
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.