Hello @art297,
Thanks a lot for your code!
That is correct, client 004 met the criteria by the third re-interview; all the clients will continue to have services and re-interviews, but I am interested in the initial interview, the following 3 re-interviews and the services within the initial interview/re-interviews.
I will apply your code to my dataset, and let you know how it goes.
Thanks!!
One slight (?) discrepancy between your current description and the criteria you originally specified. You said, in this post, that "I am interested in the initial interview, the following 3 re-interviews and the services within the initial interview/re-interviews."
However, nothing in your criteria stipulated that the re-interviews had to be done either initially, or concurrent.
As such, the following ClientID would meet your original criteria AND would be selected by my code:
data have; infile datalines dlm=','; informat ClientID 8.; informat category $20.; informat date mmddyy10.; input ClientID Category Date; format date mmddyy10.; datalines; 8,initial interview,2/29/2016 8,service delivered,8/29/2016 8,re-interview,9/3/2016 8,re-interview,10/3/2016 8,re-interview,11/3/2016 8,re-interview,12/3/2016 8,service delivered,12/4/2016 8,re-interview,6/3/2017 8,service delivered,6/5/2017 8,re-interview,12/2/2017 ;
Art, CEO, AnalystFinder.com
Thanks again for the feedback!
I am sorry if I created confusion with my current description; I was trying to summarize - not a good idea 🙂
Client 008 in the example you provided would not me one of the criteria (please see below in bold) and therefore would not be selected.
This is the criteria:
I have a dataset with multiple observations per Client ID and have to select just the clients that meet the following criteria:
data have; infile datalines dlm=','; informat ClientID 8.; informat category $20.; informat date mmddyy10.; input ClientID Category Date; format date mmddyy10.; datalines; 8,initial interview,2/29/2016 8,service delivered,8/29/2016 8,re-interview,9/3/2016
/* Two consecutive re-interviews with no service in between, and the y are not 5-7 months apart */ 8,re-interview,10/3/2016
Thanks!
I think that the expanded criteria shown in the following revised code meets your latest criterion:
data have; infile datalines dlm=','; informat ClientID 8.; informat category $20.; informat date mmddyy10.; input ClientID Category Date; format date mmddyy10.; datalines; 1,initial interview,2/1/2016 1,service delivered,8/25/2016 1,re-interview,8/26/2016 1,service delivered,10/1/2016 1,re-interview,1/1/2017 1,service delivered,3/1/2017 1,re-interview,8/1/2017 2,initial interview,2/1/2016 2,service delivered,3/1/2016 2,re-interview,8/1/2016 2,service delivered,10/1/2016 2,re-interview,1/1/2017 2,service delivered,3/1/2017 2,re-interview,8/1/2017 3,initial interview,2/1/2016 3,re-interview,8/1/2016 3,service delivered,10/1/2016 3,re-interview,1/1/2017 3,service delivered,3/1/2017 3,re-interview,8/1/2017 4,initial interview,2/1/2016 4,service delivered,3/1/2016 4,re-interview,8/1/2016 4,service delivered,10/1/2016 4,re-interview,1/1/2017 4,service delivered,3/1/2017 4,re-interview,8/1/2017 4,service delivered,9/1/2017 4,re-interview,12/1/2017 5,initial interview,2/1/2016 5,service delivered,3/1/2016 5,re-interview,8/1/2016 5,service delivered,10/1/2016 5,re-interview,1/1/2017 5,service delivered,3/1/2017 5,re-interview,8/1/2017 6,initial interview,2/1/2016 6,service delivered,8/2/2016 6,re-interview,8/3/2016 6,service delivered,10/1/2016 6,re-interview,1/1/2017 6,service delivered,3/1/2017 6,re-interview,8/1/2017 7,initial interview,2/29/2016 7,service delivered,8/29/2016 7,re-interview,9/3/2016 7,service delivered,10/1/2016 7,re-interview,2/3/2017 7,service delivered,3/1/2017 7,re-interview,9/1/2017 8,initial interview,2/29/2016 8,service delivered,8/29/2016 8,re-interview,9/3/2016 8,re-interview,10/3/2016 8,re-interview,11/3/2016 8,re-interview,12/3/2016 8,service delivered,12/4/2016 8,re-interview,6/3/2017 8,service delivered,6/5/2017 8,re-interview,12/2/2017 ; /* Criterion1: initial interview (criterion_int) Criterion2: service within 6 months of initial interview Criterion3: 3 re-interviews within 5-7 months of each other, with each of the first two re-interviews followed by a service Criterion4: All re-interviews, except for the final re-interview, must be followed by a service */ data want; set have; by ClientID; array reint(10); retain criterion1 criterion2 reint:; last_category=lag(category); last_date=lag(date); if first.ClientID then do; call missing(criterion1); call missing(criterion2); call missing(last_category); call missing(last_date); counter1=0; counter2=0; end; if category eq 'initial interview' then criterion1=date; else if not missing(criterion1) then do; if missing(criterion2) and category eq 'service delivered' and date le intnx('month',criterion1,6,'sameday') then criterion2=date; else if last_category eq 're-interview' and category eq 'service delivered' then do; counter1+1; reint(counter1)=last_date; end; else if category eq 're-interview' and counter1 ge 2 then do; counter1+1; reint(counter1)=date; end; else if category eq 're-interview' and last_category eq 're-interview' then do; call missing(criterion1); call missing(criterion2); counter1=0; end; end; if last.ClientID and criterion1 and criterion2 and counter1 ge 3 then do; counter2=counter1; do i=1 to counter1-1; if not(intnx('month',reint(i),5,'sameday')<= reint(i+1)<=intnx('month',reint(i),7,'sameday')) then counter2=counter2-1; end; end; if counter2 ge 3; run;
Art, CEO, AnalystFinder.com
Thanks @art297! Too bad I have to wait until Monday (due to SAS availability) to be able to apply the code to my data. I will let you know as soon as I do on Monday.
Good evening @art297, May i seek a clarification on why client8 doesn't qualify based on the data please?
I am scratching my head here
8,initial interview,2/29/2016
8,service delivered,8/29/2016 /*Interval is 6 months seems fine here as service delivered with 6 months*/
8,re-interview,9/3/2016
8,re-interview,10/3/2016
8,re-interview,11/3/2016
8,re-interview,12/3/2016 Comparing reinteviews starting from here
8,service delivered,12/4/2016
8,re-interview,6/3/2017 /*Interval is 6 months seems fine here as reinterview range seems within 5-7 months*/
8,service delivered,6/5/2017
8,re-interview,12/2/2017 /*Interval is 6 months seems fine here as reinterview range seems within 5-7 months*/
My eyes isn't spotting something. Thank you for your time
As I understand the rules, all re-interviews other than the last ClientID record must be followed by a service. ClientID 8 doesn't meet that criterion.
I don't know if @Asileon is certain of the criteria as they have expanded since the question was first asked and, in my latest post, I offered another test ClientID that may or may not require another expansion of the criteria.
However, this is typical, as most don't realize that defining a set of rules is, quite often, far more complex than one initially envisions.
Art, CEO, AnalystFinder.com
My oh My! Ah well, Sorry for the bother.Thank you
Hello @art297
Looking at the clients in your code (thanks for adding more clients!) and the criteria that I posted, these are the ones that will qualify and the ones that will not qualify:
Qualify:
Client 002
Client 004
Client 005
Client 007
Do not qualify:
Client 001: the service delivered after the initial interview did not happen within 6 months. This is the first criteria, therefore, there is not need to look for the others.
Client 003: there is not service delivered after the initial interview.
Client 006: the service delivered after the initial interview did not happen within 6 months.
Client 008: the three re-interviews following the initial interview do not have services delivered in between them. In addition those re-interviews did not happen 5-7 months from each others.
I can't wait for tomorrow to apply the code!
Thanks
data have; infile datalines dlm=','; informat ClientID 8.; informat category $20.; informat date mmddyy10.; input ClientID Category Date; format date mmddyy10.; datalines; 1,initial interview,2/1/2016 1,service delivered,8/25/2016 1,re-interview,8/26/2016 1,service delivered,10/1/2016 1,re-interview,1/1/2017 1,service delivered,3/1/2017 1,re-interview,8/1/2017 2,initial interview,2/1/2016 2,service delivered,3/1/2016 2,re-interview,8/1/2016 2,service delivered,10/1/2016 2,re-interview,1/1/2017 2,service delivered,3/1/2017 2,re-interview,8/1/2017 3,initial interview,2/1/2016 3,re-interview,8/1/2016 3,service delivered,10/1/2016 3,re-interview,1/1/2017 3,service delivered,3/1/2017 3,re-interview,8/1/2017 4,initial interview,2/1/2016 4,service delivered,3/1/2016 4,re-interview,8/1/2016 4,service delivered,10/1/2016 4,re-interview,1/1/2017 4,service delivered,3/1/2017 4,re-interview,8/1/2017 4,service delivered,9/1/2017 4,re-interview,12/1/2017 5,initial interview,2/1/2016 5,service delivered,3/1/2016 5,re-interview,8/1/2016 5,service delivered,10/1/2016 5,re-interview,1/1/2017 5,service delivered,3/1/2017 5,re-interview,8/1/2017 6,initial interview,2/1/2016 6,service delivered,8/2/2016 6,re-interview,8/3/2016 6,service delivered,10/1/2016 6,re-interview,1/1/2017 6,service delivered,3/1/2017 6,re-interview,8/1/2017 7,initial interview,2/29/2016 7,service delivered,8/29/2016 7,re-interview,9/3/2016 7,service delivered,10/1/2016 7,re-interview,2/3/2017 7,service delivered,3/1/2017 7,re-interview,9/1/2017 8,initial interview,2/29/2016 8,service delivered,8/29/2016 8,re-interview,9/3/2016 8,re-interview,10/3/2016 8,re-interview,11/3/2016 8,re-interview,12/3/2016 8,service delivered,12/4/2016 8,re-interview,6/3/2017 8,service delivered,6/5/2017 8,re-interview,12/2/2017 ; /* Criterion1: initial interview (criterion_int) Criterion2: service within 6 months of initial interview Criterion3: 3 re-interviews within 5-7 months of each other, with each of the first two re-interviews followed by a service Criterion4: All re-interviews, except for the final re-interview, must be followed by a service */
Good evening @novinosrin,
Client 008 does not qualify because the criteria:
8,initial interview,2/29/2016
8,service delivered,8/29/2016 /*Interval is 6 months seems fine here as service delivered with 6 months*/
8,re-interview,9/3/2016
8,re-interview,10/3/2016
8,re-interview,11/3/2016
Thanks
data have;
input ClientID Category & $20. Date : mmddyy10.;
format date mmddyy10.;
datalines;
1 initial interview 2/1/2016
1 service delivered 8/25/2016
1 re-interview 8/26/2016
1 service delivered 10/1/2016
1 re-interview 1/1/2017
1 service delivered 3/1/2017
1 re-interview 8/1/2017
2 initial interview 2/1/2016
2 service delivered 3/1/2016
2 re-interview 8/1/2016
2 service delivered 10/1/2016
2 re-interview 1/1/2017
2 service delivered 3/1/2017
2 re-interview 8/1/2017
3 initial interview 2/1/2016
3 re-interview 8/1/2016
3 service delivered 10/1/2016
3 re-interview 1/1/2017
3 service delivered 3/1/2017
3 re-interview 8/1/2017
4 initial interview 2/1/2016
4 service delivered 3/1/2016
4 re-interview 8/1/2016
4 service delivered 10/1/2016
4 re-interview 1/1/2017
4 service delivered 3/1/2017
4 re-interview 8/1/2017
4 service delivered 9/1/2017
4 re-interview 12/1/2017
;
/*I have a dataset with multiple observations per Client ID and have to select just the clients that meet the following criteria:
The client must have an initial interview.
The client must have at least one service
(cleaning house, delivering groceries, etc.) delivered within 6 months of the initial interview.
The client must have at least 3 re-interviews 5-7 months apart from each other.
The client must have at least one service delivered between the re-interviews.
Get a final dataset with the qualifying Client ID and only the qualifying observations.*/
data want;
if _n_=1 then do;
if 0 then set have(rename=(date=_date Category=_Category));
if 0 then set have(rename=(date=__date Category=__Category));
dcl hash h(dataset:'have(rename=(date=_date Category=_Category))', multidata: 'y');
h.definekey('ClientID','_Category');
h.definedata(all:'y');
h.definedone();
dcl hash h2(dataset:'have(rename=(date=__date Category=__Category))', multidata: 'y');
h2.definekey('ClientID');
h2.definedata(all:'y');
h2.definedone();
end;
_qual=0;
call missing(_count,_flag);
do until(last.clientid);
set have;
by ClientID;
if first.clientid and Category='initial interview' then
do;
do while(h2.do_over(key:ClientID) eq 0);
if __Category='service delivered' and intck('month',date,__date)<6 then _qual=1;
end;
end;
else if Category='re-interview' and _qual=1 then
do;
if h.find(key:clientid,key:Category)=0 then
do;
if 5<=intck('month',date,_date)<=7 then _count+1;
rc = h.find_next();
do while(rc = 0);
if 5<=intck('month',date,_date)<=7 then _count+1;
rc = h.find_next();
end;
if _count>=3 then _qual=1;
end;
end;
else if Category='service delivered' and _qual=1 then
do;
do while(h2.do_over(key:ClientID) eq 0);
if missing(_flag)and __Category='re-interview' and date>__date then _holddate1=date;
else if missing(_flag) and __Category='re-interview' and _holddate1<date then _flag=1;
end;
if _flag=1 then _qual=1;
end;
else _qual=0;
end;
do until(last.clientid);
set have;
by clientid;
if _qual then output;
end;
drop _: rc;
run;
@novinosrin: Not a good use of the hash object. Your code takes more than four times longer than my straight forward datastep to run! Plus, given the following example, your code fails to select ClientID 7.
data have; infile datalines dlm=','; informat ClientID 8.; informat category $20.; informat date mmddyy10.; input ClientID Category Date; format date mmddyy10.; datalines; 1,initial interview,2/1/2016 1,service delivered,8/25/2016 1,re-interview,8/26/2016 1,service delivered,10/1/2016 1,re-interview,1/1/2017 1,service delivered,3/1/2017 1,re-interview,8/1/2017 2,initial interview,2/1/2016 2,service delivered,3/1/2016 2,re-interview,8/1/2016 2,service delivered,10/1/2016 2,re-interview,1/1/2017 2,service delivered,3/1/2017 2,re-interview,8/1/2017 3,initial interview,2/1/2016 3,re-interview,8/1/2016 3,service delivered,10/1/2016 3,re-interview,1/1/2017 3,service delivered,3/1/2017 3,re-interview,8/1/2017 4,initial interview,2/1/2016 4,service delivered,3/1/2016 4,re-interview,8/1/2016 4,service delivered,10/1/2016 4,re-interview,1/1/2017 4,service delivered,3/1/2017 4,re-interview,8/1/2017 4,service delivered,9/1/2017 4,re-interview,12/1/2017 5,initial interview,2/1/2016 5,service delivered,3/1/2016 5,re-interview,8/1/2016 5,service delivered,10/1/2016 5,re-interview,1/1/2017 5,service delivered,3/1/2017 5,re-interview,8/1/2017 6,initial interview,2/1/2016 6,service delivered,8/2/2016 6,re-interview,8/3/2016 6,service delivered,10/1/2016 6,re-interview,1/1/2017 6,service delivered,3/1/2017 6,re-interview,8/1/2017 7,initial interview,2/29/2016 7,service delivered,8/29/2016 7,re-interview,9/3/2016 7,service delivered,10/1/2016 7,re-interview,2/3/2017 7,service delivered,3/1/2017 7,re-interview,9/1/2017 ;
Art, CEO, AnalystFinder.com
Hello @novinosrin,
Thanks for your code! I will apply it to my dataset also, and see how it goes.
Thanks!
When hall of famers + Ksharp participate just go for their solution. However, I will work on an alternative to at least satisfy myself when i get to my college lab tomorrow. This forum is very addictive and yet because of whatever other factors my brain didn't seem to focus well enough today. Hmm, one of those days 😞 lol
I made a mistake in loop (do over associative array) too many times and that caused slow execution. I can make it to zip through a key only once and will find ways to match your requirement. Well, after all at least I try and proud to be a contributor here
@novinosrin: Not good advice! We ALL make mistakes and, sometimes, fresh eyes find the best solution. Don't assume that Hall of Famers, and/or @Ksharp-like experts, always suggest the best solutions.
Art, CEO, AnalystFinder.com
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.