Hi All,
I have a data that each patient has multiple records.
how can I do: if the variable time has more than four records (include four) within any continuous 12 month for the patient, then the patient defined as "Serious"?
Thanks,
data abc; informat date mmddyy10.; informat time anydtdtm16.; input ID Date time; format date mmddyy10. time MDYAMPM.; datalines; 14022 6/22/2010 10/11/2004 9:02 14022 6/22/2010 3/25/2008 12:53 14022 6/22/2010 4/29/2008 9:04 14022 6/22/2010 6/17/2008 9:14 14022 6/22/2010 7/8/2008 10:41 14022 6/22/2010 8/12/2008 12:30 14022 6/22/2010 9/16/2008 14:24 14022 6/22/2010 10/21/2008 1:44 14022 6/22/2010 11/18/2008 2:11 14022 6/22/2010 12/30/2008 9:53 14022 6/22/2010 2/10/2009 13:07 14022 6/22/2010 4/7/2009 9:45 14022 6/22/2010 6/2/2009 7:35 14022 6/22/2010 7/28/2009 10:11 14022 6/22/2010 9/29/2009 10:46 14022 6/22/2010 12/1/2009 10:51 14022 6/22/2010 1/26/2010 8:49 14022 6/22/2010 5/23/2010 7:48 14022 6/22/2010 6/22/2010 7:46 14032 2/7/2012 7/24/2002 10:25 14032 2/7/2012 1/21/2003 12:00 14032 2/7/2012 1/28/2003 9:25 14032 2/7/2012 1/29/2008 9:00 14032 2/7/2012 3/20/2008 13:52 14032 2/7/2012 5/1/2008 8:53 14032 2/7/2012 6/12/2008 14:01 14032 2/7/2012 7/24/2008 15:00 14032 2/7/2012 9/11/2008 11:02 14032 2/7/2012 9/20/2011 9:25 14032 2/7/2012 11/1/2011 10:40 14032 2/7/2012 1/8/2012 12:58 14032 2/7/2012 2/7/2012 10:20 14032 2/7/2012 3/7/2012 9:18 14032 2/7/2012 5/15/2012 12:43 14032 2/7/2012 6/19/2012 9:19 14032 2/7/2012 7/31/2012 11:48 14032 2/7/2012 9/18/2012 12:15 14032 2/7/2012 10/16/2012 12:35 14032 2/7/2012 10/18/2012 11:12 14032 2/7/2012 10/19/2012 5:00 14032 2/7/2012 10/20/2012 4:33 14032 2/7/2012 10/21/2012 4:19 14032 2/7/2012 10/22/2012 4:30 14032 2/7/2012 10/23/2012 5:03 14032 2/7/2012 10/24/2012 3:14 14032 2/7/2012 10/30/2012 11:10 14032 2/7/2012 11/27/2012 12:30 14032 2/7/2012 1/8/2013 9:58 14032 2/7/2012 2/19/2013 10:07 14032 2/7/2012 4/2/2013 13:45 14032 2/7/2012 5/14/2013 10:18 14601 1/30/2014 4/26/2005 8:40 14601 1/30/2014 6/28/2005 9:05 14601 1/30/2014 7/26/2005 8:21 14601 1/30/2014 8/9/2005 8:24 14601 1/30/2014 8/23/2005 8:10 14601 1/30/2014 10/18/2005 9:20 14601 1/30/2014 12/29/2013 9:32 14601 1/30/2014 12/30/2013 9:04 14601 1/30/2014 2/16/2014 8:44 14601 1/30/2014 5/2/2006 10:31 14601 1/30/2014 7/12/2006 7:31 14601 1/30/2014 9/7/2006 8:55 14601 1/30/2014 10/31/2006 8:43 14601 1/30/2014 2/6/2007 9:15 14601 1/30/2014 5/3/2007 7:54 14601 1/30/2014 6/28/2007 7:52 14601 1/30/2014 8/23/2007 8:10 14601 1/30/2014 10/18/2007 8:17 14601 1/30/2014 11/19/2007 2:07 14601 1/30/2014 12/13/2007 8:16 14601 1/30/2014 1/24/2008 10:23 14601 1/30/2014 1/31/2008 10:07 14601 1/30/2014 2/12/2008 12:31 14601 1/30/2014 3/25/2008 9:52 ; run;
may be something like this.
proc sql; select a.*, case when a.id =b.id and b.cnt ge 4 then 'serious' else 'ok' end as condition from (select * from abc)a left join (select distinct id, year(date) as year, count(*) as cnt from abc group by id, year(date))b on a.id =b.id;
Seems to work if you only look at calendar year ... but as I understood the problem, the rolling 12 month period could start at any date, for example if there were 4 visit during the period 3/27/14 to 3/26/15, that would also count as SERIOUS.
I get the feeling that PROC EXPAND could provide a solution, for example if there was one observation for each patient at each date of the time period, and a value of 1 if the patient had a visit on that date, and 0 otherwise. Then PROC EXPAND could compute rolling sums over the last 365 days of the 0-1 variable for each starting date and any time this rolling sum was 4 or greater, you get a SERIOUS flag. I guess this fails for leap years, but otherwise that seems to work in my mind ... of course getting it to work with actual code seems a little harder, but doable, and I don't have the time to do it right now.
Yes, you are right!
Even leap years should be include in my result. As your example, if there were 4 visit during the period 3/27/14 to 3/26/15, that would also count as SERIOUS.
Really no idea how to do it.....
Here is a way to do it using a FIFO stack:
data want (drop=stack:); retain stack0-stack3; array stack(0:3) stack0-stack3; length serious $3; do until(last.id); set abc; by id; if first.id then do; serious='No'; n=1; call missing(of stack(*)); end; else n+1; stack(mod(n,4))=time; if n ge 4 and serious='No' then do;
if yrdif(datepart(min(of stack(*))),datepart(max(of stack(*))),'age') le 1 then
serious='Yes'; end; end; do until(last.id); set abc; by id; output; end; run;
Art, CEO, AnalystFinder.com
Although I don't understand @art297's code, and I have never heard of a FIFO stack in SAS, I do know what FIFO means and yes that should work if it is implemented properly (and its hard to imagine a situation where art297's code is wrong).
With regards to PROC EXPAND, accounting for leap years would be done by using one moving sum for 365 days and one for 366 days; when the starting date plus one year includes leap day, you would use the 366 day moving sum, otherwise you would use the 365 day moving sum.
@PaigeMiller: Actually, the code I originally posted (but have now changed) was wrong.
I stole the FIFO logic from @ArtC! If you have an array (in this case called STACK), ranging from 0 to N-1, and fill the mod(n,N) cell with the current value, you will always be replacing the FI (first in) cell.
Therefore, if the file is in date order, yrdif(datepart(min(of stack(*))),datepart(max(of stack(*))),'age') le 1 will only be le 1 if the four entries capture a period that is under one year.
Art, CEO, AnalystFinder.com
Always good to learn new techniques and new ways of thinking. Back in the days before I had SAS/ETS (and therefore I didn't have PROC EXPAND), I probably would have had to create an immensely complicated data step to do this calculation, and even with PROC EXPAND, it would require multiple steps before PROC EXPAND could be used. So I am bookmarking this and I hope I get to use it someday.
Hi Art,
Thanks so much for your new method. is it proper for "the patient has more than four records (includes four) in a period of 12 month?" I wrote the patient has four records in a period of 12 month, which is wrong (I've changed it now). Sorry for my mistake.
Thanks,
L
Hi Art,
I tried to run your solution, but the output seems not correct. for example, in my real data, id = 14238 should not be marked as Serious since there is no more 4 records for the patient in any continuous 12 month. Any idea about this?
Yes | 14238 | 5/21/2007 |
Yes | 14238 | 6/14/2007 |
Yes | 14238 | 7/12/2007 |
Yes | 14238 | 7/15/2009 |
Yes | 14238 | 7/16/2009 |
What do you mean by " if the variable time has four records within 12 month for the patient" ? data abc; informat date mmddyy10.; informat time anydtdtm16.; input ID Date time; format date mmddyy10. time MDYAMPM.; datalines; 14022 6/22/2010 10/11/2004 9:02 14022 6/22/2010 3/25/2008 12:53 14022 6/22/2010 4/29/2008 9:04 14022 6/22/2010 6/17/2008 9:14 14022 6/22/2010 7/8/2008 10:41 14022 6/22/2010 8/12/2008 12:30 14022 6/22/2010 9/16/2008 14:24 14022 6/22/2010 10/21/2008 1:44 14022 6/22/2010 11/18/2008 2:11 14022 6/22/2010 12/30/2008 9:53 14022 6/22/2010 2/10/2009 13:07 14022 6/22/2010 4/7/2009 9:45 14022 6/22/2010 6/2/2009 7:35 14022 6/22/2010 7/28/2009 10:11 14022 6/22/2010 9/29/2009 10:46 14022 6/22/2010 12/1/2009 10:51 14022 6/22/2010 1/26/2010 8:49 14022 6/22/2010 5/23/2010 7:48 14022 6/22/2010 6/22/2010 7:46 14032 2/7/2012 7/24/2002 10:25 14032 2/7/2012 1/21/2003 12:00 14032 2/7/2012 1/28/2003 9:25 14032 2/7/2012 1/29/2008 9:00 14032 2/7/2012 3/20/2008 13:52 14032 2/7/2012 5/1/2008 8:53 14032 2/7/2012 6/12/2008 14:01 14032 2/7/2012 7/24/2008 15:00 14032 2/7/2012 9/11/2008 11:02 14032 2/7/2012 9/20/2011 9:25 14032 2/7/2012 11/1/2011 10:40 14032 2/7/2012 1/8/2012 12:58 14032 2/7/2012 2/7/2012 10:20 14032 2/7/2012 3/7/2012 9:18 14032 2/7/2012 5/15/2012 12:43 14032 2/7/2012 6/19/2012 9:19 14032 2/7/2012 7/31/2012 11:48 14032 2/7/2012 9/18/2012 12:15 14032 2/7/2012 10/16/2012 12:35 14032 2/7/2012 10/18/2012 11:12 14032 2/7/2012 10/19/2012 5:00 14032 2/7/2012 10/20/2012 4:33 14032 2/7/2012 10/21/2012 4:19 14032 2/7/2012 10/22/2012 4:30 14032 2/7/2012 10/23/2012 5:03 14032 2/7/2012 10/24/2012 3:14 14032 2/7/2012 10/30/2012 11:10 14032 2/7/2012 11/27/2012 12:30 14032 2/7/2012 1/8/2013 9:58 14032 2/7/2012 2/19/2013 10:07 14032 2/7/2012 4/2/2013 13:45 14032 2/7/2012 5/14/2013 10:18 14601 1/30/2014 4/26/2005 8:40 14601 1/30/2014 6/28/2005 9:05 14601 1/30/2014 7/26/2005 8:21 14601 1/30/2014 8/9/2005 8:24 14601 1/30/2014 8/23/2005 8:10 14601 1/30/2014 10/18/2005 9:20 14601 1/30/2014 12/29/2013 9:32 14601 1/30/2014 12/30/2013 9:04 14601 1/30/2014 2/16/2014 8:44 14601 1/30/2014 5/2/2006 10:31 14601 1/30/2014 7/12/2006 7:31 14601 1/30/2014 9/7/2006 8:55 14601 1/30/2014 10/31/2006 8:43 14601 1/30/2014 2/6/2007 9:15 14601 1/30/2014 5/3/2007 7:54 14601 1/30/2014 6/28/2007 7:52 14601 1/30/2014 8/23/2007 8:10 14601 1/30/2014 10/18/2007 8:17 14601 1/30/2014 11/19/2007 2:07 14601 1/30/2014 12/13/2007 8:16 14601 1/30/2014 1/24/2008 10:23 14601 1/30/2014 1/31/2008 10:07 14601 1/30/2014 2/12/2008 12:31 14601 1/30/2014 3/25/2008 9:52 ; run; proc sql; select *,case when (select count(*) from abc where id=a.id and time between a.time and intnx('dtmonth',a.time,12,'s')) > 3 then 1 else 0 end as flag from abc as a; quit;
it means in the whole treatment for the patient (time from first record to last record for the patient), if the patient has more than four records in any continuous 12 month, the patient will be marked as serious.
Even leap years should be include in my result. for example, if there were 4 visit during the period 3/27/14 to 3/26/15, that would also count as SERIOUS.
Thanks,
As shown below, after adding the new records you mentioned, that new case results in NOT being identified as being SERIOUS, so I'm not sure what you ran.
In the code below, I made the changes that would be needed to accomodate your changed requirement (i.e., more than 4 months rather than four months or more):
data abc; informat date mmddyy10.; informat time anydtdtm16.; input ID Date time; format date mmddyy10. time MDYAMPM.; datalines; 14022 6/22/2010 10/11/2004 9:02 14022 6/22/2010 3/25/2008 12:53 14022 6/22/2010 4/29/2008 9:04 14022 6/22/2010 6/17/2008 9:14 14022 6/22/2010 7/8/2008 10:41 14022 6/22/2010 8/12/2008 12:30 14022 6/22/2010 9/16/2008 14:24 14022 6/22/2010 10/21/2008 1:44 14022 6/22/2010 11/18/2008 2:11 14022 6/22/2010 12/30/2008 9:53 14022 6/22/2010 2/10/2009 13:07 14022 6/22/2010 4/7/2009 9:45 14022 6/22/2010 6/2/2009 7:35 14022 6/22/2010 7/28/2009 10:11 14022 6/22/2010 9/29/2009 10:46 14022 6/22/2010 12/1/2009 10:51 14022 6/22/2010 1/26/2010 8:49 14022 6/22/2010 5/23/2010 7:48 14022 6/22/2010 6/22/2010 7:46 14032 2/7/2012 7/24/2002 10:25 14032 2/7/2012 1/21/2003 12:00 14032 2/7/2012 1/28/2003 9:25 14032 2/7/2012 1/29/2008 9:00 14032 2/7/2012 3/20/2008 13:52 14032 2/7/2012 5/1/2008 8:53 14032 2/7/2012 6/12/2008 14:01 14032 2/7/2012 7/24/2008 15:00 14032 2/7/2012 9/11/2008 11:02 14032 2/7/2012 9/20/2011 9:25 14032 2/7/2012 11/1/2011 10:40 14032 2/7/2012 1/8/2012 12:58 14032 2/7/2012 2/7/2012 10:20 14032 2/7/2012 3/7/2012 9:18 14032 2/7/2012 5/15/2012 12:43 14032 2/7/2012 6/19/2012 9:19 14032 2/7/2012 7/31/2012 11:48 14032 2/7/2012 9/18/2012 12:15 14032 2/7/2012 10/16/2012 12:35 14032 2/7/2012 10/18/2012 11:12 14032 2/7/2012 10/19/2012 5:00 14032 2/7/2012 10/20/2012 4:33 14032 2/7/2012 10/21/2012 4:19 14032 2/7/2012 10/22/2012 4:30 14032 2/7/2012 10/23/2012 5:03 14032 2/7/2012 10/24/2012 3:14 14032 2/7/2012 10/30/2012 11:10 14032 2/7/2012 11/27/2012 12:30 14032 2/7/2012 1/8/2013 9:58 14032 2/7/2012 2/19/2013 10:07 14032 2/7/2012 4/2/2013 13:45 14032 2/7/2012 5/14/2013 10:18 14238 2/7/2012 5/21/2007 10:18 14238 2/7/2012 6/14/2007 10:18 14238 2/7/2012 7/12/2007 10:18 14238 2/7/2012 7/15/2009 10:18 14238 2/7/2012 7/16/2009 10:18 14601 1/30/2014 4/26/2005 8:40 14601 1/30/2014 6/28/2005 9:05 14601 1/30/2014 7/26/2005 8:21 14601 1/30/2014 8/9/2005 8:24 14601 1/30/2014 8/23/2005 8:10 14601 1/30/2014 10/18/2005 9:20 14601 1/30/2014 12/29/2013 9:32 14601 1/30/2014 12/30/2013 9:04 14601 1/30/2014 2/16/2014 8:44 14601 1/30/2014 5/2/2006 10:31 14601 1/30/2014 7/12/2006 7:31 14601 1/30/2014 9/7/2006 8:55 14601 1/30/2014 10/31/2006 8:43 14601 1/30/2014 2/6/2007 9:15 14601 1/30/2014 5/3/2007 7:54 14601 1/30/2014 6/28/2007 7:52 14601 1/30/2014 8/23/2007 8:10 14601 1/30/2014 10/18/2007 8:17 14601 1/30/2014 11/19/2007 2:07 14601 1/30/2014 12/13/2007 8:16 14601 1/30/2014 1/24/2008 10:23 14601 1/30/2014 1/31/2008 10:07 14601 1/30/2014 2/12/2008 12:31 14601 1/30/2014 3/25/2008 9:52 ; run; data want (drop=stack:); retain stack0-stack4; array stack(0:4) stack0-stack4; length serious $3; do until(last.id); set abc; by id; if first.id then do; serious='No'; n=1; call missing(of stack(*)); end; else n+1; stack(mod(n,5))=time; if n ge 5 and serious='No' then do; if yrdif(datepart(min(of stack(*))),datepart(max(of stack(*))),'age') le 1 then serious='Yes'; end; end; do until(last.id); set abc; by id; output; end; run;
Art, CEO, AnalystFinder.com
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.