Hello,
Thanks a lot in advance! Any suggestion would be greatly appreciated.
I have a dataset with multiple observations per Client ID and have to select just the clients that meet the following criteria:
Client ID | Category | Date |
001 | initial interview | 02/01/2016 |
001 | service delivered | 07/01/2016 |
001 | re-interview | 08/01/2016 |
001 | service delivered | 10/01/2016 |
001 | re-interview | 01/01/2017 |
001 | service delivered | 03/01/2017 |
001 | re-interview | 08/01/2017 |
002 | initial interview | 02/01/2016 |
002 | service delivered | 03/01/2016 |
002 | re-interview | 08/01/2016 |
002 | service delivered | 10/01/2016 |
002 | re-interview | 01/01/2017 |
002 | service delivered | 03/01/2017 |
002 | re-interview | 08/01/2017 |
003 | initial interview | 02/01/2016 |
003 | re-interview | 08/01/2016 |
003 | service delivered | 10/01/2016 |
003 | re-interview | 01/01/2017 |
003 | service delivered | 03/01/2017 |
003 | re-interview | 08/01/2017 |
004 | initial interview | 02/01/2016 |
004 | service delivered | 03/01/2016 |
004 | re-interview | 08/01/2016 |
004 | service delivered | 10/01/2016 |
004 | re-interview | 01/01/2017 |
004 | service delivered | 03/01/2017 |
004 | re-interview | 08/01/2017 |
004 | service delivered | 09/01/2017 |
004 | re-interview | 12/01/2017 |
The only client meeting the criteria is client 002 and part of client 004. Because:
The output should be:
Client ID | Category | Date |
002 | initial interview | 02/01/2016 |
002 | service delivered | 03/01/2016 |
002 | re-interview | 08/01/2016 |
002 | service delivered | 10/01/2016 |
002 | re-interview | 01/01/2017 |
002 | service delivered | 03/01/2017 |
002 | re-interview | 08/01/2017 |
004 | initial interview | 02/01/2016 |
004 | service delivered | 03/01/2016 |
004 | re-interview | 08/01/2016 |
004 | service delivered | 10/01/2016 |
004 | re-interview | 01/01/2017 |
004 | service delivered | 03/01/2017 |
004 | re-interview | 08/01/2017 |
004 | service delivered | 09/01/2017 |
In the code, below, I incorporated the same thing as @PGStats suggested for his code:
data client_engage_3 (keep=ClientID category date); set client_engage_2; by ClientID; array reint(999) _temporary_; retain criterion1 criterion2 reint:; last_category=lag(category); last_date=lag(date); if first.ClientID then do; call missing(criterion1); call missing(criterion2); call missing(last_category); call missing(last_date); counter1=0; counter2=0; end; if lowcase(first(left(category))) eq 'i' then criterion1=date; else if not missing(criterion1) then do; if missing(criterion2) and lowcase(first(left(category))) eq 's' and date le intnx('month',criterion1,6,'sameday') then criterion2=date; else if lowcase(first(left(last_category))) eq 'r' and lowcase(first(left(category))) eq 's' then do; counter1+1; reint(counter1)=last_date; end; else if lowcase(first(left(category))) eq 'r' and counter1 ge 2 then do; counter1+1; reint(counter1)=date; end; else if lowcase(first(left(category))) eq 'r' and lowcase(first(left(last_category))) eq 'r' then do; call missing(criterion1); call missing(criterion2); counter1=0; end; end; if last.ClientID and criterion1 and criterion2 and counter1 ge 3 then do; counter2=counter1; do i=1 to counter1-1; if i eq 1 and not(intnx('month',criterion1,5,'sameday')<= reint(i)<=intnx('month',criterion1,7,'sameday')) then counter2=0; if not(intnx('month',reint(i),5,'sameday')<= reint(i+1)<=intnx('month',reint(i),7,'sameday')) then counter2=counter2-1; end; end; if counter2 ge 3; run;
Let me/us know if that works correctly with your real data.
Art, CEO, AnalystFinder.com
How about putting in a little more effort and generating SAS code for data set generation. And possibly, in addition to text describing record types, you might also include code numbers or letters.
Hi @mkeintz,
Thanks for your feedback. I have been able to create a flag for the re-interviews that are 5-7 months from the previous. The main issue that I am having is using SAS to output just the client IDs that have a re-interview following by another with at least one services in between.
Thanks
your data and questions are confusing to me
you said
The only client meeting the criteria is client 002 and part of client 004. Because:
but 002 also has same issue
002 |
initial interview |
02/01/2016 |
002 |
service delivered |
03/01/2016 |
002 |
re-interview |
08/01/2016 |
002 |
service delivered |
10/01/2016 |
Hi @kiranv_ OP's question is fairly clear as far as client 002 is concerned.I had a look at this question when i was about to leave my college lab and it's pretty straight forward except the part qualification(eg client 004). However since my own non sas assignments in school that's due in for next week is in a mess, I am afraid I didn't bother though excited.
Anyway, to clarify client 002
Client 002 had the first aka initial interview on 02/01/2016
@Asileon What i wanna you to clarify is- "Part of an id that qualifies as in the case of 004"
1. If i were to pick only those id's that qualify on the basis of satisfying all the conditions, the "part of client004" concerns me, that is it means like pick the obs in all ids that qualify and leave the rest?????? or is it pick only those id's that satisfy "all" conditions
2.Are you using SAS 9.4, this is important to me for me to attempt using more short cuts
3. I am doing you a favor by creating data have as follows:
data have;
input ClientID Category & $20. Date : mmddyy10.;
format date mmddyy10.;
datalines;
1 initial interview 2/1/2016
1 service delivered 7/1/2016
1 re-interview 8/1/2016
1 service delivered 10/1/2016
1 re-interview 1/1/2017
1 service delivered 3/1/2017
1 re-interview 8/1/2017
2 initial interview 2/1/2016
2 service delivered 3/1/2016
2 re-interview 8/1/2016
2 service delivered 10/1/2016
2 re-interview 1/1/2017
2 service delivered 3/1/2017
2 re-interview 8/1/2017
3 initial interview 2/1/2016
3 re-interview 8/1/2016
3 service delivered 10/1/2016
3 re-interview 1/1/2017
3 service delivered 3/1/2017
3 re-interview 8/1/2017
4 initial interview 2/1/2016
4 service delivered 3/1/2016
4 re-interview 8/1/2016
4 service delivered 10/1/2016
4 re-interview 1/1/2017
4 service delivered 3/1/2017
4 re-interview 8/1/2017
4 service delivered 9/1/2017
4 re-interview 12/1/2017
;
PS the bold is just make it look distinct and neat and not to shout 🙂
Hi @novinosrin,
Thanks for the feedback!
Answering your questions:
@Asileon What i wanna you to clarify is- "Part of an id that qualifies as in the case of 004"
1. If i were to pick only those id's that qualify on the basis of satisfying all the conditions, the "part of client004" concerns me, that is it means like pick the obs in all ids that qualify and leave the rest?????? or is it pick only those id's that satisfy "all" conditions
Answer: I want to pick only those id's that satisfy "all" conditions, be able to create a dataset with only those id's, and have the qualifying observations in that dataset. In the example I posted, the dataset will have:
- client 002, the initial interview, the service(s) within the 6 months after initial interview, the 3 re-interview 5-7 months from the previous, and the service(s) delivered in between the re-interviews.
- Client 004, and the same as above.
2.Are you using SAS 9.4, this is important to me for me to attempt using more short cuts
Answer: Yes, I am using 9.4
3. I am doing you a favor by creating data have as follows:
Answer: Thank you. This is my first posting ever, and I was so focus in creating a good and well explained example of my issue that I forgot. I will try this does not happen in the future.
Hi @kiranv_
Client 002 qualify because this client had a service delivered on 03/01/2016 which is within 6 months of the initial interview (one of the criteria). The client will continue to have services even after 6 months of the initial interview, and that is why you see another service 6 months past the initial interview.
002 | initial interview | 02/01/2016 |
002 | service delivered | 03/01/2016 |
002 | re-interview | 08/01/2016 |
002 | service delivered | 10/01/2016 |
Thanks
for the first criteria 001 also qualifies right because 07/01/2016 also within 6 months and this where I got confused. I am sorry, if this does not make sense.
001 |
initial interview |
02/01/2016 |
001 |
service delivered |
07/01/2016 |
001 |
re-interview |
08/01/2016 |
001 |
service delivered |
10/01/2016 |
Thank you for the correction, you are right! I apologize, I was focused in including a possible qualify/ no-qualify situation for each client, and I did not do a good job with client 001. Please find below a fix for this, where client 001 DO NOT QUALIFY based on what I said before:
data have;
input ClientID Category & $20. Date : mmddyy10.;
format date mmddyy10.;
datalines;1 initial interview 2/1/2016
1 service delivered 8/25/2016
1 re-interview 8/26/2016
1 service delivered 10/1/2016
1 re-interview 1/1/2017
1 service delivered 3/1/2017
1 re-interview 8/1/2017
2 initial interview 2/1/2016
2 service delivered 3/1/2016
2 re-interview 8/1/2016
2 service delivered 10/1/2016
2 re-interview 1/1/2017
2 service delivered 3/1/2017
2 re-interview 8/1/2017
3 initial interview 2/1/2016
3 re-interview 8/1/2016
3 service delivered 10/1/2016
3 re-interview 1/1/2017
3 service delivered 3/1/2017
3 re-interview 8/1/2017
4 initial interview 2/1/2016
4 service delivered 3/1/2016
4 re-interview 8/1/2016
4 service delivered 10/1/2016
4 re-interview 1/1/2017
4 service delivered 3/1/2017
4 re-interview 8/1/2017
4 service delivered 9/1/2017
4 re-interview 12/1/2017;
Thanks again!
@kiranv_ Please see response above, Thanks
In your example, all dates are on the first day of the month. Do the dates represent months?
Hi @PGStats,
The original dataset has millions of observations, and in this example I used imaginary data. This dates represent month/day/year - I used 01 as the day for all of the dates because I tried to make it visually easy for the reader, I was attempting that the reader was able to see the time difference between dates without having to calculate too much.
Thanks
Good morning from Chicago, Welcome to the SAS forum and thank you for the steadfast communication in clarifying. I'll be right back hopefully with the code or perhaps for some more clarifications.
I question your logic regarding client 4. They had already met the criteria by the time they had their 3rd re-interview.
If that is correct, then the following code (I think) captures all of your criteria:
data have; infile datalines dlm=','; informat ClientID 8.; informat category $20.; informat date mmddyy10.; input ClientID Category Date; format date mmddyy10.; datalines; 1,initial interview,2/1/2016 1,service delivered,8/25/2016 1,re-interview,8/26/2016 1,service delivered,10/1/2016 1,re-interview,1/1/2017 1,service delivered,3/1/2017 1,re-interview,8/1/2017 2,initial interview,2/1/2016 2,service delivered,3/1/2016 2,re-interview,8/1/2016 2,service delivered,10/1/2016 2,re-interview,1/1/2017 2,service delivered,3/1/2017 2,re-interview,8/1/2017 3,initial interview,2/1/2016 3,re-interview,8/1/2016 3,service delivered,10/1/2016 3,re-interview,1/1/2017 3,service delivered,3/1/2017 3,re-interview,8/1/2017 4,initial interview,2/1/2016 4,service delivered,3/1/2016 4,re-interview,8/1/2016 4,service delivered,10/1/2016 4,re-interview,1/1/2017 4,service delivered,3/1/2017 4,re-interview,8/1/2017 4,service delivered,9/1/2017 4,re-interview,12/1/2017 ; /* Criterion1: initial interview (criterion_int) Criterion2: service within 6 months of initial interview Criterion3: 3 re-interviews within 5-7 months of each other, with each of the first two re-interviews followed by a service */ data want; set have; by ClientID; array reint(10); retain criterion1 criterion2 reint:; last_category=lag(category); last_date=lag(date); if first.ClientID then do; call missing(criterion1); call missing(criterion2); call missing(last_category); call missing(last_date); counter1=0; counter2=0; end; if category eq 'initial interview' then criterion1=date; else if not missing(criterion1) then do; if missing(criterion2) and category eq 'service delivered' and date le intnx('month',criterion1,6,'sameday') then criterion2=date; else if last_category eq 're-interview' and category eq 'service delivered' then do; counter1+1; reint(counter1)=last_date; end; else if category eq 're-interview' and counter1 ge 2 then do; counter1+1; reint(counter1)=date; end; end; if last.ClientID and criterion1 and criterion2 and counter1 ge 3 then do; counter2=counter1; do i=1 to counter1-1; if not(intnx('month',reint(i),5,'sameday')<= reint(i+1)<=intnx('month',reint(i),7,'sameday')) then counter2=counter2-1; end; end; if counter2 ge 3; run;
Art, CEO, AnalystFinder.com
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Lock in the best rate now before the price increases on April 1.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.