BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
righcoastmike
Quartz | Level 8

Hi folks, 

 

I have a dataset that looks like this: 

 

data have;
input studyID mth_date ed_visits year ;
cards;
1 2011-04 0 2011
1 2011-05 1 2011
1 2012-01 3 2012
2 2011-03 4 2011
3 2011-04 2 2011
3 2011-06 0 2011
3 2013-08 2 2013
run;

And I'm trying to create a dataset that will tell me which of the study id's have 3 or more ed_visits within 12 months ( we'll call them repeat users). The trick is that I don't care if it's 3 hospitalizations in the same year, just that within any 12 month period they have a total of 3 or more hospitalizations. So Using the above dataset, and creating a binary variable for "repeat user" my output could look something like:

 

data want;
input studyID mth_date ed_visits year Repeat_user ;
cards;
1 2011-04 0 2011 1
1 2011-05 1 2011 1
1 2012-01 3 2012 1
2 2011-03 4 2011 1
3 2011-04 2 2011 0
3 2011-06 0 2011 0
3 2013-08 2 2013 0
run;

 

Study ID's 1 and 2 both have enough hospitalizations to qualify as repeat users (value of 1),  but #3 never had more than 2 visits within a 12 month period, so they remain 0's.

 

Alternatively,  a list of the study ID's that meet the "repeat user" criteria would also work.  

 

Also, just FYI the mnth_date variable is currently not a SAS date, just a string (not my fault that's just how the data came). 

 

In essence I need to be able to sort out the people who made 3 or more trips to the ED within 12 months over the course of my study period  from those that haven't 

 

Any thoughts would be much appreciated! 

 

Thanks so much

 

Mike

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Hi @righcoastmike  I was thinking you may have removed the "put statements" that i used for debugging. I was having my laptop leaning in my bed and soon as I messaged you, I fell asleep. So I didn't post a clean code as I was tired. Anyyays

 

Here is a cleaned code-->

 


data have;
input studyID date $ ed_visits year ;
md=input(compress(date,'-'),yymmn6.); /*notice here the converted SAS date variable*/
format md date9.;
cards;
1 2011-04 0 2011
1 2011-05 1 2011
1 2012-01 3 2012
1 2012-04 0 2012
1 2013-05 2 2013
1 2013-06 0 2013
2 2011-03 4 2011
3 2011-04 2 2011
3 2011-06 0 2011
3 2013-08 2 2013
4 2011-04 1 2011
4 2012-05 1 2012
4 2013-06 1 2013
4 2014-07 1 2014
5 2011-03 0 2011
5 2012-05 0 2012
5 2013-08 4 2013
5 2014-02 0 2013
run;
data want;
if _n_=1 then do;
	if 0 then set have;
   dcl hash H (dataset:'have',ordered: "A",multidata:'y') ;
   h.definekey  ("studyID") ;
   h.definedata ("md","ed_visits") ;
   h.definedone () ;
end;
k=0;
do until(last.studyid);
set have(rename=(md=_md ed_visits=_ed_visits));
by studyID;
k=0;
if _md>j then do;
 	j=intnx('month',_md,12);
	do rc=h.find() by 0 while(rc=0);
	if _md<=md<=j then  k+ed_visits; 
	rc=h.find_next();
	end;
	end;
	if k>=3 then repeat_user=1;
end;
do until(last.studyid);
set have;
by studyID; 
repeat_user=repeat_user=1;
output;
end;
keep  studyid date  ed_visits year repeat_user ;
run;

Let me know 

View solution in original post

27 REPLIES 27
PaigeMiller
Diamond | Level 26

PROC EXPAND will do this, it can count the sum of the ed_visits over a rolling X month time period. Once you have the rolling sum, you can create a binary variable.

 

As a hint to help this work, you are going to need to make sure your mth_date is an actual SAS date value. (As it stands, your input statement will fail to read 2011-04, for example. Perhaps there is an informat that will allow mth_date to be read as an actual SAS date value). Once you straighten that out, you can add in zeros for months that don't appear in data set HAVE, so that PROC EXPAND can work on 12 consecutive months.

 

 

--
Paige Miller
ballardw
Super User

@PaigeMiller wrote:

PROC EXPAND will do this, it can count the sum of the ed_visits over a rolling X month time period. Once you have the rolling sum, you can create a binary variable.

 

As a hint to help this work, you are going to need to make sure your mth_date is an actual SAS date value. (As it stands, your input statement will fail to read 2011-04, for example. Perhaps there is an informat that will allow mth_date to be read as an actual SAS date value). Once you straighten that out, you can add in zeros for months that don't appear in data set HAVE, so that PROC EXPAND can work on 12 consecutive months.

 

 


The anydtdte informat will read those values as the first of the month:

data have;
input studyID mth_date :anydtdte10. ed_visits year ;
format mth_date date9.;
cards;
1 2011-04 0 2011
1 2011-05 1 2011
1 2012-01 3 2012
2 2011-03 4 2011
3 2011-04 2 2011
3 2011-06 0 2011
3 2013-08 2 2013
run;
righcoastmike
Quartz | Level 8

Hi folks, 

 

I'm still working on the main problem, but fyi here is the solution that worked for fixing the date variable...... (current format is a string that looks like '2011-04')

 

data want;
set have;
format date2 date9.;
date2 = input(mthdate,anydtdte10.);
run;

where date2 is the corrected date variable, and mthdate is the original string variable. 

 

hoping it can help someone else in a similar situation down the line, will report back when I get the rest sorted out. 

 

Thanks so much! 

 

Mike

 

novinosrin
Tourmaline | Level 20

Hi @righcoastmike   See if this works. If my understanding is correct, proc sql is rather easier. I fancied a DOW Hash for FWIW. 

 

data have;
input studyID mth_date $ ed_visits year ;
md=input(compress(mth_date,'-'),yymmn6.);
format md date9.;
cards;
1 2011-04 0 2011
1 2011-05 1 2011
1 2012-01 3 2012
2 2011-03 4 2011
3 2011-04 2 2011
3 2011-06 0 2011
3 2013-08 2 2013
;
run;


data want;
if _n_=1 then do;
	if 0 then set have;
   dcl hash H (dataset:'have',ordered: "A",multidata:'y') ;
   h.definekey  ("studyID") ;
   h.definedata ("md","ed_visits") ;
   h.definedone () ;
end;
_k=0;
do until(last.studyid);
set have(rename=(md=_md ed_visits=_ed_visits));
by studyID;
do rc=h.find() by 0 while(rc=0);
if md>_md and intck('month',_md,md)<=12 then k=sum(ed_visits,_ed_visits);
rc=h.find_next();
end;
_k+k;
repeat_user=first.studyid and last.studyid and _ed_visits>=3  or k>=3 ;
end;
do until(last.studyid);
set have;
by studyID;
output;
end;
keep studyid mth_date ed_visits year repeat_user;
run;

 

righcoastmike
Quartz | Level 8

Thanks so much for this Novinosrin. 

 

I've given the code a try, and it looks like it's running, but it's been chugging away for about ten hours now. Any thoughts on why that might be? 

 

Mike 

 

 

novinosrin
Tourmaline | Level 20

How large is your dataset? I hope there isn't any memory constraints

 

I ran against your sample and I trust that's a representative one. Nonetheless, I can work on a SQL solution as a parallel alternative if needed. Keep us posted. 

 

 

righcoastmike
Quartz | Level 8
It's a biggie I'm afraid. (approx, 5.6 million records total for approx 104,000 individuals). The sample is representative of the data format, but clearly not the size. Re: memory constraints I'm checking with the administrator now. My guess is that's what the issue is. would Proc SQL help to take some of the load off?

Thanks so much for working through this with me. It's much appreciated.
novinosrin
Tourmaline | Level 20

Just 5.6 million records ?

 

Jeez, that's tiny. I was thinking a few 100 millions. Are you saying it's been running for 10 hours?Hmm that can't be true.

 

Did you paste the code correctly checking word by word before you started executing?

 

Did you test on a smaller sample to begin with and see if that works?

 

 

 

Anyways, I will test against 100 million in a bit on my machine and respond here but I don't think that will have any issues here. Nonetheless, I will be right back with SQL solution shortly after a coffee

PaigeMiller
Diamond | Level 26

Why use SQL or hash, when SAS has already programmed exactly what you need in PROC EXPAND? 

 

(One possible reason is that your SAS license does not contain PROC EXPAND, but this hasn't been stated by the OP)

--
Paige Miller
novinosrin
Tourmaline | Level 20

Sir @PaigeMiller  Do you ever doubt I would disagree to your wise advice. If yes, change that. 🙂

 

Ok, I haven't used ETS much so I don't know much about it. 2nd, I am attempting with what I know. When all we have is a hammer, everything looks like a nail.

righcoastmike
Quartz | Level 8

Hi @PaigeMiller 

 

Thanks so much for this suggestion. I've managed to get the date value to where it needs to be, and I've been playing with proc expand.

 The problem that I'm running into is that proc expand apparently doesn't like it when you have 2 identical id variables (time variables) right after each other and throws up an error. 

 

Since I have multiple records taken from different subjects on the same date, I can't figure out how to make it work. 

 

proc sort data=have;
by date;
run; 

proc expand data=have;
out=want
to=month
method=none;
id date;
run;

 This gives me an "observation with duplicate ID value found" error.

 

Any thoughts would be much appreciated. 

 

Thanks so much

 

Mike 

PaigeMiller
Diamond | Level 26

As I re-read my original suggestion, its not clear why you have 2 identical ID variables right after each other, and this was not something I envisioned. Explain further how this happens, and please show us a small example.

--
Paige Miller
novinosrin
Tourmaline | Level 20

Hi @righcoastmike   here is a SQL solution. I have added some comments so that you can follow the logic/process. Test with a smaller sample, if things go well , scale up.

 


/*Your Original sample HAVE*/
data have;
input studyID mth_date $ ed_visits year ;
cards;
1 2011-04 0 2011
1 2011-05 1 2011
1 2012-01 3 2012
2 2011-03 4 2011
3 2011-04 2 2011
3 2011-06 0 2011
3 2013-08 2 2013
;
run;

/*Creating a temp dataset where we extract date numeric values from your char dates in the form
2011-04*/
/*This md=input(compress(mth_date,'-'),yymmn6.) does that */
data temp;
set have;
md=input(compress(mth_date,'-'),yymmn6.);
run;

/*Taking the previous temp table for our look ahead process to get our final want*/

proc sql;
create table want(drop=md) as
select distinct studyID,mth_date ,md, ed_visits, year, sum(t)>=3 as repeat_user
from
(select a.*,ifn(count(a.studyid)=1, a.ed_visits, (intck('month',a.md,b.md)<=12)*(b.ed_visits+a.ed_visits)) as t
from temp a left join temp b
on a.studyid=b.studyid and b.md>a.md
group by a.studyid)
group by studyid
order by studyid,md;
quit;

 

I used some boolean expressions. In essence the 2 select passes can still be made into one with more boolean games, however my concern is I want us to make it work to achieve a working solution first and once that's done, I will have some fun later.

righcoastmike
Quartz | Level 8

Hi @novinosrin 

 

Ok here's the update, i have managed to get the hash version mostly working, but it's missing some people. From what I can tell it looks as if there are a series of 1's and 2's in "ed_visits" it adds them up just fine and makes the appropriate flag. If there is a record with more than 2 it seems to miss it. For example, if someone has 2 visits in 2011-04 and 2 visits in 2011-05 the code works great. If however, the person has 3 visits in 2011-04 (which still counts as a repeat user) it misses them.

 

I'm still working on getting the SQL up and running, I have a working code but I think I might have messed something up because it's doing the opposite of the hash, it's catching way more than it should. 

 

Any thoughts on either of those would be much appreciated. 

 

Thanks so much

 

Mike 

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 27 replies
  • 1427 views
  • 1 like
  • 4 in conversation