BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
novinosrin
Tourmaline | Level 20

Hi @righcoastmike   Thank your for your patience and feedback. Can you post a more comprehensive sample of your HAVE and the WANT. I have a feeling, I am failing to understand the requirement somewhere.

 

Objective : To find study id's have 3 or more ed_visits within 12 months ( we'll call them repeat users)

My understanding: If interval between "any 2 dates is <=12", that qualifies. So if month interval between first record of studyid and 3rd record or nth record of studyid  is <=12, that still qualifies.

 

So once we have the dates extracted

 

 

 

 

 

md=input(compress(mth_date,'-'),yymmn6.);

 

 

Here is how the HASH logic works:

1. Dataset temp with extracted date is sorted by studyid for By group processing

2. For every observation of By group, the look up happens for all obs for it's equivalent. So if studyid has 3 obs as in the sample, the look up happens 3*3=9 times i.e 1 obs of studyid1 with all 3 obs of studyid 1, then  2nd obs of studyid1 with all 3 obs of studyid 1 etc.

3. Since it is a look ahead(next set of records i.e future dates) the look up(hash table) md variable should be greater than the current md variable of  the table from which the look up is taking place.

4. Should 3 be true, we check whether number of months between the two dates is <=12. If 3 and 4 happen to be true, we sum ed_visit of current +ed_visits of the future(look up table)

4. The sum is a running total across each by group 

5. At the end of each by group should the cumulative sum is >=3 then that entire group should be flagged as 1 else 0

6. In instances, where there is only one record for a by group, should the value of ed_visits is >=3 that should also be flagged as 1.

 

If the above is correct, I think where i perhaps was an idiot is likely this part that I am noticing now, that misses it is

 

The logic requires change to-->

do rc=h.find() by 0 while(rc=0);
if md>_md and intck('month',_md,md)<=12 then do;
k=sum(ed_visits,_ed_visits);
_k+k;
end;
rc=h.find_next();
end;

from the earlier below

do rc=h.find() by 0 while(rc=0);
if md>_md and intck('month',_md,md)<=12 then k=sum(ed_visits,_ed_visits);
rc=h.find_next();
end;
_k+k;

 

So the changed full version is

 


data want;
if _n_=1 then do;
	if 0 then set have;
   dcl hash H (dataset:'have',ordered: "A",multidata:'y') ;
   h.definekey  ("studyID") ;
   h.definedata ("md","ed_visits") ;
   h.definedone () ;
end;
_k=0;
do until(last.studyid);
set have(rename=(md=_md ed_visits=_ed_visits));
by studyID;
do rc=h.find() by 0 while(rc=0);
if md>_md and intck('month',_md,md)<=12 then do;
k=sum(ed_visits,_ed_visits);
_k+k;
end;
rc=h.find_next();
end;
end;
repeat_user=first.studyid and last.studyid and _ed_visits>=3  or k>=3 ;
do until(last.studyid);
set have;
by studyID;
output;
end;
keep studyid mth_date ed_visits year repeat_user;
run;

 

Let me know. Sorry and Thank you for your patience. 

 

 

righcoastmike
Quartz | Level 8

Hi @novinosrin  and @PaigeMiller 

 

First off, I really appreciate the time you are spending on this. It's really helping me learn a lot. 

 

Secondly, @novinosrin you have no need to apologize, you are  both doing me a huge service helping me out on this, and anyone who would be impatient when someone is volunteering to help them probably doesn't deserve your help. Thank you. 

 

Since things have changed a bit after a day of working on this, I thought a general update and summary of where we are at would be helpful. 

 

1. I've managed to convert the date variables in the dataset to SAS dates, so my dataset now looks something like this: 

 

data have;
input studyID date :anydtdte10. ed_visits year ;
cards;
1 2011-04 0 2011
1 2011-05 1 2011
1 2012-01 3 2012
1 2012-04 0 2012
1 2013-05 2 2013
1 2013-06 0 2013
2 2011-03 4 2011
3 2011-04 2 2011
3 2011-06 0 2011
3 2013-08 2 2013
4 2011-04 1 2011
4 2012-05 1 2012
4 2013-06 1 2013
4 2014-07 1 2014
5 2011-03 0 2011
5 2012-05 0 2012
5 2013-08 4 2013
5 2014-02 0 2013
run;

Ideally, I would like to end up with a dataset that looks like this

 

data want;
input studyID date :anydtdte10. ed_visits year repeat_user ;
cards;
1 2011-04 0 2011 1
1 2011-05 1 2011 1
1 2012-01 2 2012 1
1 2012-04 0 2012 1 
1 2013-05 2 2013 1
1 2013-06 0 2013 1
2 2011-03 4 2011 1
3 2011-04 2 2011 0
3 2011-06 0 2011 0
3 2013-08 2 2013 0
4 2011-04 1 2011 0
4 2012-05 1 2012 0
4 2013-06 1 2013 0
4 2014-07 1 2014 0
5 2011-03 0 2011 1
5 2012-05 0 2012 1
5 2013-08 4 2013 1
5 2014-02 0 2013 1
run;

where 1 indicates a repeat user. 

 

Our criteria for a repeat user is 3 or more ed visits within a 12 month period. 

 

So in this case: 

#1 is a repeat user because they had 3 visits between 2011-05 and 2012-01 (less than 12 months apart). The 2 visits in 2013-05 don't affect anything

 

#2 is a repeat user, because even though we only have one month of data for them, they had 4 visits in the month

 

#3 is not a repeat user because even though they had 4 visits, 2 were in 2011 and 2 were in 2013 (more than 12 months apart)

 

#4 is not a repeat user because they only had 1 visit in each year, with more than 12 months between them. 

 

#5 is a repeat user for the same reason as #2, we just have more data because they were in the study longer. 

 

Thus far we've tried: 

 

Proc expand - looked promising but ran into trouble because of overlapping dates. for example, both #1 and #2 and #4 have records for 2011-04. proc throws up an error because of the duplicate values in the ID field (date) 

 

@novinosrin 's hash method - This one feel like we're close, but while it does a good job of identifying cases like  #1, it misses cases like #2 and #5 - I've tried the updated code and get the same results as before.

 

@novinosrin 's sql method - This one is also close, but has the opposite problem of the Hash method, it's capturing many records that have less than three ed visits within 12 months, I haven't spotted any particular pattern yet, but I'll let you know if I find anything. 

 

and that's where we are at. I'm going to keep playing with the code I have to see if I can't figure out where things are going wrong, but I hope this clarifies things. Any further insights would be very much appreciated. 

 

Thank you both so much 

 

Mike 

 

 

 

 

 

novinosrin
Tourmaline | Level 20

Hi and Good evening @righcoastmike   A  silly change that i missed to make

 

The k>3 below

repeat_user=first.studyid and last.studyid and _ed_visits>=3  or k>=3 ;

 should have been  _k>3 with a prefix _ underscore coz that's the accumlating var

 

So I tested with the changed below

repeat_user=first.studyid and last.studyid and _ed_visits>=3  or _k>=3 ;

 Here -->

 

/*Again I converted to SAS dates, this is no big deal*/
data have;
input studyID date $ ed_visits year ;
md=input(compress(date,'-'),yymmn6.); /*notice here the converted SAS date variable*/
format md date9.;
cards;
1 2011-04 0 2011
1 2011-05 1 2011
1 2012-01 3 2012
1 2012-04 0 2012
1 2013-05 2 2013
1 2013-06 0 2013
2 2011-03 4 2011
3 2011-04 2 2011
3 2011-06 0 2011
3 2013-08 2 2013
4 2011-04 1 2011
4 2012-05 1 2012
4 2013-06 1 2013
4 2014-07 1 2014
5 2011-03 0 2011
5 2012-05 0 2012
5 2013-08 4 2013
5 2014-02 0 2013
run;


/*the same code with _k>3 lol hahaha*/
data want;
if _n_=1 then do;
	if 0 then set have;
   dcl hash H (dataset:'have',ordered: "A",multidata:'y') ;
   h.definekey  ("studyID") ;
   h.definedata ("md","ed_visits") ;
   h.definedone () ;
end;
_k=0;
do until(last.studyid);
set have(rename=(md=_md ed_visits=_ed_visits));
by studyID;
do rc=h.find() by 0 while(rc=0);
if md>_md and intck('month',_md,md)<=12 then do;
k=sum(ed_visits,_ed_visits);
_k+k;
end;
rc=h.find_next();
end;
end;
repeat_user=first.studyid and last.studyid and _ed_visits>=3  or _k>=3 ;
do until(last.studyid);
set have;
by studyID;
output;
end;
keep studyid date md ed_visits year repeat_user;
run;

Proc print noobs;run;

 

Here are the test results;


study repeat_
ID date ed_visits year md user

1 2011-04 0 2011 01APR2011 1
1 2011-05 1 2011 01MAY2011 1
1 2012-01 3 2012 01JAN2012 1
1 2012-04 0 2012 01APR2012 1
1 2013-05 2 2013 01MAY2013 1
1 2013-06 0 2013 01JUN2013 1
2 2011-03 4 2011 01MAR2011 1
3 2011-04 2 2011 01APR2011 0
3 2011-06 0 2011 01JUN2011 0
3 2013-08 2 2013 01AUG2013 0
4 2011-04 1 2011 01APR2011 0
4 2012-05 1 2012 01MAY2012 0
4 2013-06 1 2013 01JUN2013 0
4 2014-07 1 2014 01JUL2014 0
5 2011-03 0 2011 01MAR2011 1
5 2012-05 0 2012 01MAY2012 1
5 2013-08 4 2013 01AUG2013 1
5 2014-02 0 2013 01FEB2014 1

 

Please let me know at your earliest convenience.

righcoastmike
Quartz | Level 8

Hi @novinosrin 

 

Ok we're getting somewhere now, the latest code made it so the hash and the sql are producing the same results, they aren't quite correct but at least it could help to identify the problem. Here is a complete example of a study id that has been misidentified as a 1 when it should be a zero: 

 

data have;
input studyID date $ ed_visits year ;
md=input(compress(date,'-'),yymmn6.); /*notice here the converted SAS date variable*/
format md date9.;
cards; 
1 2011-04 0 2011
1 2011-05 0 2011
1 2011-06 0 2011
1 2011-07 0 2011
1 2011-08 0 2011
1 2011-09 0 2011
1 2011-10 0 2011
1 2011-11 0 2011
1 2011-12 0 2011
1 2012-01 0 2012
1 2012-02 0 2012
1 2012-03 2 2012
1 2012-04 0 2012
1 2012-05 0 2012
1 2012-06 0 2012
1 2012-07 0 2012
1 2012-08 0 2012
1 2012-09 0 2012
1 2012-10 0 2012
1 2012-11 0 2012
1 2012-12 0 2012
1 2013-01 0 2013
1 2013-02 0 2013
1 2013-03 0 2013
1 2013-04 0 2013
1 2013-05 0 2013
1 2013-06 0 2013
run;

  I ran the code on just this studyid alone, and the misclassification persists. The only thing I can think of is that the fact that there are 27 records associated with the one studyid is throwing it off. 

 

Thoughts? 

 

Mike 

righcoastmike
Quartz | Level 8

sorry for the weird code, I'm not sure what happened there. 

here's a better version. 

 

mike

 

data have;
input studyID date $ ed_visits year ;
md=input(compress(date,'-'),yymmn6.); /*notice here the converted SAS date variable*/
format md date9.;
cards; 
1 2011-04 0 2011
1 2011-05 0 2011
1 2011-06 0 2011
1 2011-07 0 2011
1 2011-08 0 2011
1 2011-09 0 2011
1 2011-10 0 2011
1 2011-11 0 2011
1 2011-12 0 2011
1 2012-01 0 2012
1 2012-02 0 2012
1 2012-03 2 2012
1 2012-04 0 2012
1 2012-05 0 2012
1 2012-06 0 2012
1 2012-07 0 2012
1 2012-08 0 2012
1 2012-09 0 2012
1 2012-10 0 2012
1 2012-11 0 2012
1 2012-12 0 2012
1 2013-01 0 2013
1 2013-02 0 2013
1 2013-03 0 2013
1 2013-04 0 2013
1 2013-05 0 2013
1 2013-06 0 2013 
run;
novinosrin
Tourmaline | Level 20

Ok i think I am able to get the sense of the requirement after all. Let me do some due diligence and come back. Thank you again

novinosrin
Tourmaline | Level 20

Hi @righcoastmike  Just before hitting the bed, I think I made it. Well I truly and sincerely hope. Basically the problem was I didn't understand the requirement thoroughly and hence the ding dong(back and forth) but the coding part only took about 10-12 mins . 🙂

I guess I will have to improve my reading comprehension skills in the first place.

 

Now, it's time for you to test and let me at the earliest and I wanna party

 

 

data have;
input studyID date $ ed_visits year ;
md=input(compress(date,'-'),yymmn6.); /*notice here the converted SAS date variable*/
format md date9.;
cards; 
1 2011-04 0 2011
1 2011-05 0 2011
1 2011-06 0 2011
1 2011-07 0 2011
1 2011-08 0 2011
1 2011-09 0 2011
1 2011-10 0 2011
1 2011-11 0 2011
1 2011-12 0 2011
1 2012-01 0 2012
1 2012-02 0 2012
1 2012-03 2 2012
1 2012-04 0 2012
1 2012-05 0 2012
1 2012-06 0 2012
1 2012-07 0 2012
1 2012-08 0 2012
1 2012-09 0 2012
1 2012-10 0 2012
1 2012-11 0 2012
1 2012-12 0 2012
1 2013-01 0 2013
1 2013-02 0 2013
1 2013-03 0 2013
1 2013-04 0 2013
1 2013-05 0 2013
1 2013-06 0 2013 
;
run;

data have;
input studyID date $ ed_visits year ;
md=input(compress(date,'-'),yymmn6.); /*notice here the converted SAS date variable*/
format md date9.;
cards;
1 2011-04 0 2011
1 2011-05 1 2011
1 2012-01 3 2012
1 2012-04 0 2012
1 2013-05 2 2013
1 2013-06 0 2013
2 2011-03 4 2011
3 2011-04 2 2011
3 2011-06 0 2011
3 2013-08 2 2013
4 2011-04 1 2011
4 2012-05 1 2012
4 2013-06 1 2013
4 2014-07 1 2014
5 2011-03 0 2011
5 2012-05 0 2012
5 2013-08 4 2013
5 2014-02 0 2013
run;


data want;
if _n_=1 then do;
	if 0 then set have;
   dcl hash H (dataset:'have',ordered: "A",multidata:'y') ;
   h.definekey  ("studyID") ;
   h.definedata ("md","ed_visits") ;
   h.definedone () ;
end;
k=0;
do until(last.studyid);
set have(rename=(md=_md ed_visits=_ed_visits));
by studyID;
k=0;
if _md>j then do;
 	j=intnx('month',_md,12);
	put j= _md=;
	do rc=h.find() by 0 while(rc=0);
	if _md<=md<=j then do; put md= ed_visits=; k+ed_visits; end;
	rc=h.find_next();
	end;
	end;
	if k>=3 then repeat_user=1;
end;
do until(last.studyid);
set have;
by studyID; 
repeat_user=repeat_user=1;
output;
end;
keep  studyid date  ed_visits year repeat_user ;
run;

 

 

And do not worry. We're super close. Chill and will look forward to you earnest communication. Good night!

 

 

righcoastmike
Quartz | Level 8

Good morning @novinosrin 

 

I ran the new code as soon as I woke up this morning, and I've got good news and bad news. 

 

The good news is the code works. I tried it on several smaller datasets and it looks like it's doing exactly what we need it to do. Hurray! 

 

The bad news is that when I try to run it on the full dataset I keep getting a "SAS has disconnected from the server" error. I did a little bit of research and it looks like it might be because the code generates a very large log file. Apparently this can happen in SAS enterprise guide (which is what I'm using).

 

here's what I've tested so far

100 000 observations - works great

200 000 observations - works but I get a warning that the log is to large

500 000 observations - doesn't run and disconnects from the server

 

I'm in the process of coming up with a workaround, but as always, any thoughts are much appreciated. 

 

I'll keep you posted if I find anything

 

Mike

 

Mike 

 

righcoastmike
Quartz | Level 8
I think I may have it solved.....running final checks now.
novinosrin
Tourmaline | Level 20

Hi @righcoastmike  I was thinking you may have removed the "put statements" that i used for debugging. I was having my laptop leaning in my bed and soon as I messaged you, I fell asleep. So I didn't post a clean code as I was tired. Anyyays

 

Here is a cleaned code-->

 


data have;
input studyID date $ ed_visits year ;
md=input(compress(date,'-'),yymmn6.); /*notice here the converted SAS date variable*/
format md date9.;
cards;
1 2011-04 0 2011
1 2011-05 1 2011
1 2012-01 3 2012
1 2012-04 0 2012
1 2013-05 2 2013
1 2013-06 0 2013
2 2011-03 4 2011
3 2011-04 2 2011
3 2011-06 0 2011
3 2013-08 2 2013
4 2011-04 1 2011
4 2012-05 1 2012
4 2013-06 1 2013
4 2014-07 1 2014
5 2011-03 0 2011
5 2012-05 0 2012
5 2013-08 4 2013
5 2014-02 0 2013
run;
data want;
if _n_=1 then do;
	if 0 then set have;
   dcl hash H (dataset:'have',ordered: "A",multidata:'y') ;
   h.definekey  ("studyID") ;
   h.definedata ("md","ed_visits") ;
   h.definedone () ;
end;
k=0;
do until(last.studyid);
set have(rename=(md=_md ed_visits=_ed_visits));
by studyID;
k=0;
if _md>j then do;
 	j=intnx('month',_md,12);
	do rc=h.find() by 0 while(rc=0);
	if _md<=md<=j then  k+ed_visits; 
	rc=h.find_next();
	end;
	end;
	if k>=3 then repeat_user=1;
end;
do until(last.studyid);
set have;
by studyID; 
repeat_user=repeat_user=1;
output;
end;
keep  studyid date  ed_visits year repeat_user ;
run;

Let me know 

righcoastmike
Quartz | Level 8

We did it! The code works perfectly. 

 

Thanks so much @novinosrin, if we ever cross paths I owe you a drink :-).

 

I hope that one day I can get good enough at coding to help other people in the same way you helped me. 

 

Thanks again! 

Mike

 

FYI for others looking to do something similar: 

 

This code does exactly what we need. The only issue that I ran into with it is that it creates a very large log file when dealing with many records, and can cause SAS EG to disconnect from the server. By adding the following line at the top of the code @novinosrin provided I redirected the log to my desktop and the problem was solved!  : 

 

proc printto log="c:\path-on-server\EG.log"; run;

 

novinosrin
Tourmaline | Level 20

Hi @righcoastmike 

 

I am pleased. Thank you so much. I might be hung over tomorrow lol

 

And feel free to reach out anytime. We are all a SAS family 

PaigeMiller
Diamond | Level 26

@righcoastmike wrote:

sorry for the weird code, I'm not sure what happened there. 

here's a better version. 

 

mike

 

data have;
input studyID date $ ed_visits year ;
md=input(compress(date,'-'),yymmn6.); /*notice here the converted SAS date variable*/
format md date9.;
cards; 
1 2011-04 0 2011
1 2011-05 0 2011
1 2011-06 0 2011
1 2011-07 0 2011
1 2011-08 0 2011
1 2011-09 0 2011
1 2011-10 0 2011
1 2011-11 0 2011
1 2011-12 0 2011
1 2012-01 0 2012
1 2012-02 0 2012
1 2012-03 2 2012
1 2012-04 0 2012
1 2012-05 0 2012
1 2012-06 0 2012
1 2012-07 0 2012
1 2012-08 0 2012
1 2012-09 0 2012
1 2012-10 0 2012
1 2012-11 0 2012
1 2012-12 0 2012
1 2013-01 0 2013
1 2013-02 0 2013
1 2013-03 0 2013
1 2013-04 0 2013
1 2013-05 0 2013
1 2013-06 0 2013 
run;

Now I'm confused. You have all these consecutive months, which is good, but only 2 hospital visits in the entire time. So none of these records would be flagged, because at no time the 12-month running total is 3 or more. Am I understanding this properly?

 

Can we get a more realistic example? Can we get an example where some patients actually do visit 3 times in a 12 month time period, and some patients do not visit 3 times in a 12 month time period?

--
Paige Miller

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 27 replies
  • 1428 views
  • 1 like
  • 4 in conversation