Hi @novinosrin and @PaigeMiller First off, I really appreciate the time you are spending on this. It's really helping me learn a lot. Secondly, @novinosrin you have no need to apologize, you are both doing me a huge service helping me out on this, and anyone who would be impatient when someone is volunteering to help them probably doesn't deserve your help. Thank you. Since things have changed a bit after a day of working on this, I thought a general update and summary of where we are at would be helpful. 1. I've managed to convert the date variables in the dataset to SAS dates, so my dataset now looks something like this: data have;
input studyID date :anydtdte10. ed_visits year ;
cards;
1 2011-04 0 2011
1 2011-05 1 2011
1 2012-01 3 2012
1 2012-04 0 2012
1 2013-05 2 2013
1 2013-06 0 2013
2 2011-03 4 2011
3 2011-04 2 2011
3 2011-06 0 2011
3 2013-08 2 2013
4 2011-04 1 2011
4 2012-05 1 2012
4 2013-06 1 2013
4 2014-07 1 2014
5 2011-03 0 2011
5 2012-05 0 2012
5 2013-08 4 2013
5 2014-02 0 2013
run; Ideally, I would like to end up with a dataset that looks like this data want;
input studyID date :anydtdte10. ed_visits year repeat_user ;
cards;
1 2011-04 0 2011 1
1 2011-05 1 2011 1
1 2012-01 2 2012 1
1 2012-04 0 2012 1
1 2013-05 2 2013 1
1 2013-06 0 2013 1
2 2011-03 4 2011 1
3 2011-04 2 2011 0
3 2011-06 0 2011 0
3 2013-08 2 2013 0
4 2011-04 1 2011 0
4 2012-05 1 2012 0
4 2013-06 1 2013 0
4 2014-07 1 2014 0
5 2011-03 0 2011 1
5 2012-05 0 2012 1
5 2013-08 4 2013 1
5 2014-02 0 2013 1
run; where 1 indicates a repeat user. Our criteria for a repeat user is 3 or more ed visits within a 12 month period. So in this case: #1 is a repeat user because they had 3 visits between 2011-05 and 2012-01 (less than 12 months apart). The 2 visits in 2013-05 don't affect anything #2 is a repeat user, because even though we only have one month of data for them, they had 4 visits in the month #3 is not a repeat user because even though they had 4 visits, 2 were in 2011 and 2 were in 2013 (more than 12 months apart) #4 is not a repeat user because they only had 1 visit in each year, with more than 12 months between them. #5 is a repeat user for the same reason as #2, we just have more data because they were in the study longer. Thus far we've tried: Proc expand - looked promising but ran into trouble because of overlapping dates. for example, both #1 and #2 and #4 have records for 2011-04. proc throws up an error because of the duplicate values in the ID field (date) @novinosrin 's hash method - This one feel like we're close, but while it does a good job of identifying cases like #1, it misses cases like #2 and #5 - I've tried the updated code and get the same results as before. @novinosrin 's sql method - This one is also close, but has the opposite problem of the Hash method, it's capturing many records that have less than three ed visits within 12 months, I haven't spotted any particular pattern yet, but I'll let you know if I find anything. and that's where we are at. I'm going to keep playing with the code I have to see if I can't figure out where things are going wrong, but I hope this clarifies things. Any further insights would be very much appreciated. Thank you both so much Mike
... View more