BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Asileon
Obsidian | Level 7

Hello,

 

Thanks a lot in advance! Any suggestion would be greatly appreciated.

 

I have a dataset with multiple observations per Client ID and have to select just the clients that meet the following criteria:

  • The client must have an initial interview.
  • The client must have at least one service (cleaning house, delivering groceries, etc.) delivered within 6 months of the initial interview.
  • The client must have at least 3 re-interviews 5-7 months apart from each other.
  • The client must have at least one service delivered between the re-interviews.
  • Get a final dataset with the qualifying Client ID and only the qualifying observations.

 

Client ID

Category

Date

001

initial interview

02/01/2016

001

service delivered

07/01/2016

001

re-interview

08/01/2016

001

service delivered

10/01/2016

001

re-interview

01/01/2017

001

service delivered

03/01/2017

001

re-interview

08/01/2017

002

initial interview

02/01/2016

002

service delivered

03/01/2016

002

re-interview

08/01/2016

002

service delivered

10/01/2016

002

re-interview

01/01/2017

002

service delivered

03/01/2017

002

re-interview

08/01/2017

003

initial interview

02/01/2016

003

re-interview

08/01/2016

003

service delivered

10/01/2016

003

re-interview

01/01/2017

003

service delivered

03/01/2017

003

re-interview

08/01/2017

004

initial interview

02/01/2016

004

service delivered

03/01/2016

004

re-interview

08/01/2016

004

service delivered

10/01/2016

004

re-interview

01/01/2017

004

service delivered

03/01/2017

004

re-interview

08/01/2017

004

service delivered

09/01/2017

004

re-interview

12/01/2017

The only client meeting the criteria is client 002 and part of client 004. Because:

  • Client 001 had a service after the initial interview that was delivered after 6 months of the initial interview.
  • Client 003 had an initial interview and then a re-interview with no service(s) in between
  • Client 004 had a re-interview (12/01/2017) that was only 4 months from the previous re-interview.

 

The output should be:

Client ID

Category

Date

002

initial interview

02/01/2016

002

service delivered

03/01/2016

002

re-interview

08/01/2016

002

service delivered

10/01/2016

002

re-interview

01/01/2017

002

service delivered

03/01/2017

002

re-interview

08/01/2017

004

initial interview

02/01/2016

004

service delivered

03/01/2016

004

re-interview

08/01/2016

004

service delivered

10/01/2016

004

re-interview

01/01/2017

004

service delivered

03/01/2017

004

re-interview

08/01/2017

004

service delivered

09/01/2017

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

In the code, below, I incorporated the same thing as @PGStats suggested for his code:

data client_engage_3 (keep=ClientID category date);
  set client_engage_2;
  by ClientID;
  array reint(999) _temporary_;
  retain criterion1 criterion2 reint:;
  last_category=lag(category);
  last_date=lag(date);
  if first.ClientID then do;
    call missing(criterion1);
    call missing(criterion2);
    call missing(last_category);
    call missing(last_date);
    counter1=0;
    counter2=0;
  end;
  if lowcase(first(left(category))) eq 'i' then criterion1=date;
  else if not missing(criterion1) then do;
    if missing(criterion2) and lowcase(first(left(category))) eq 's'
     and date le intnx('month',criterion1,6,'sameday')
     then criterion2=date;
    else if lowcase(first(left(last_category))) eq 'r' and 
        lowcase(first(left(category))) eq 's' then do;
         counter1+1;
         reint(counter1)=last_date;
    end;
    else if lowcase(first(left(category))) eq 'r' and counter1 ge 2 then do;
       counter1+1;
       reint(counter1)=date;
    end;
    else if lowcase(first(left(category))) eq 'r' and lowcase(first(left(last_category))) eq 'r' then do;
      call missing(criterion1);
      call missing(criterion2);
      counter1=0;
    end;
  end;
  if last.ClientID and criterion1 and criterion2 and counter1 ge 3 then do;
    counter2=counter1;
    do i=1 to counter1-1;
      if i eq 1 and not(intnx('month',criterion1,5,'sameday')<=
       reint(i)<=intnx('month',criterion1,7,'sameday')) then counter2=0;
      if not(intnx('month',reint(i),5,'sameday')<=
       reint(i+1)<=intnx('month',reint(i),7,'sameday')) then counter2=counter2-1;
    end;
  end;
  if counter2 ge 3;
run;

Let me/us know if that works correctly with your real data.

 

Art, CEO, AnalystFinder.com

 

View solution in original post

71 REPLIES 71
mkeintz
PROC Star

How about putting in a little more effort and generating SAS code for data set generation.  And possibly, in addition to text describing record types, you might also include code numbers or letters.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Asileon
Obsidian | Level 7

Hi @mkeintz

 

Thanks for your feedback. I have been able to create a flag for the re-interviews that are 5-7 months from the previous. The main issue that I am having is using SAS to output just the client IDs that have a re-interview following by another with at least one services in between.

 

Thanks

kiranv_
Rhodochrosite | Level 12

 

your data and questions are confusing to me

 

you said 

The only client meeting the criteria is client 002 and part of client 004. Because:

  • Client 001 had a service after the initial interview that was delivered after 6 months of the initial interview.

but 002 also has same issue

 

002

initial interview

02/01/2016

002

service delivered

03/01/2016

002

re-interview

08/01/2016

002

service delivered

10/01/2016

novinosrin
Tourmaline | Level 20

Hi @kiranv_    OP's question is fairly clear as far as client 002 is concerned.I had a look at this question when i was about to leave my college lab and it's pretty straight forward except the part qualification(eg client 004). However since my own non sas assignments in school that's due in for next week is in a mess, I am afraid I didn't bother though excited.

Anyway, to clarify client 002 

Client 002 had the first aka initial interview on 02/01/2016 

  • The client must have at least one service (cleaning house, delivering groceries, etc.) delivered within 6 months of the initial interview-- Client 002 had the first serivice post the initial interview on 03/01/2016 which is the subsequent month and that falls within th condition stated above
  • The client must have at least 3 re-interviews 5-7 months apart from each other---intck aka intevals between the 3 re interviews for client002 do fall in the range >=5 and <=7
  • The client must have at least one service delivered between the re-interviews.- self explanatory just by looking at data

@Asileon What i wanna you to clarify is-  "Part of an id that qualifies as in the case of 004"

1. If i were to pick only those id's that qualify on the basis of satisfying all the conditions, the "part of client004" concerns me, that is it means like pick the obs in all ids that qualify and leave the rest?????? or is it pick only those id's that satisfy "all" conditions

2.Are you using SAS 9.4, this is important to me for me to attempt using more short cuts 

3. I am doing you a favor by creating data have as follows:

 

data have;
input ClientID Category & $20. Date : mmddyy10.;
format date mmddyy10.;
datalines;
1	initial interview	2/1/2016
1	service delivered	7/1/2016
1	re-interview	8/1/2016
1	service delivered	10/1/2016
1	re-interview	1/1/2017
1	service delivered	3/1/2017
1	re-interview	8/1/2017
2	initial interview	2/1/2016
2	service delivered	3/1/2016
2	re-interview	8/1/2016
2	service delivered	10/1/2016
2	re-interview	1/1/2017
2	service delivered	3/1/2017
2	re-interview	8/1/2017
3	initial interview	2/1/2016
3	re-interview	8/1/2016
3	service delivered	10/1/2016
3	re-interview	1/1/2017
3	service delivered	3/1/2017
3	re-interview	8/1/2017
4	initial interview	2/1/2016
4	service delivered	3/1/2016
4	re-interview	8/1/2016
4	service delivered	10/1/2016
4	re-interview	1/1/2017
4	service delivered	3/1/2017
4	re-interview	8/1/2017
4	service delivered	9/1/2017
4	re-interview	12/1/2017
;

PS the bold is just make it look distinct and neat and not to shout 🙂

 

Asileon
Obsidian | Level 7

Hi @novinosrin,

 

Thanks for the feedback!

 

Answering your questions:

@Asileon What i wanna you to clarify is-  "Part of an id that qualifies as in the case of 004"

 

1. If i were to pick only those id's that qualify on the basis of satisfying all the conditions, the "part of client004" concerns me, that is it means like pick the obs in all ids that qualify and leave the rest?????? or is it pick only those id's that satisfy "all" conditions

Answer: I want to pick only those id's that satisfy "all" conditions, be able to create a dataset with only those id's, and have the qualifying observations in that dataset. In the example I posted, the dataset will have:

- client 002, the initial interview, the service(s) within the 6 months after initial interview, the 3 re-interview 5-7 months from the previous, and the service(s) delivered in between the re-interviews.

- Client 004, and the same as above.

 

2.Are you using SAS 9.4, this is important to me for me to attempt using more short cuts 

Answer: Yes, I am using 9.4

 

3. I am doing you a favor by creating data have as follows:

Answer: Thank you. This is my first posting ever, and I was so focus in creating a good and well explained example of my issue that I forgot. I will try this does not happen in the future.

Asileon
Obsidian | Level 7

Hi @kiranv_

 

Client 002 qualify because this client had a service delivered on 03/01/2016 which is within 6 months of the initial interview (one of the criteria). The client will continue to have services even after 6 months of the initial interview, and that is why you see another service 6 months past the initial interview.

 

002

initial interview

02/01/2016

002

service delivered

03/01/2016

002

re-interview

08/01/2016

002

service delivered

10/01/2016

 

 

Thanks

kiranv_
Rhodochrosite | Level 12

for the first criteria 001 also qualifies right because 07/01/2016 also within 6 months and this where I got confused. I am sorry, if this does not make sense.

 

001

initial interview

02/01/2016

001

service delivered

07/01/2016

001

re-interview

08/01/2016

001

service delivered

10/01/2016

Asileon
Obsidian | Level 7

Thank you for the correction, you are right! I apologize, I was focused in including a possible qualify/ no-qualify situation for each client, and I did not do a good job with client 001. Please find below a fix for this, where client 001 DO NOT QUALIFY based on what I said before:

 

data have;
input ClientID Category & $20. Date : mmddyy10.;
format date mmddyy10.;
datalines;1	initial interview	2/1/2016
1	service delivered	8/25/2016
1	re-interview	8/26/2016
1	service delivered	10/1/2016
1	re-interview	1/1/2017
1	service delivered	3/1/2017
1	re-interview	8/1/2017
2	initial interview	2/1/2016
2	service delivered	3/1/2016
2	re-interview	8/1/2016
2	service delivered	10/1/2016
2	re-interview	1/1/2017
2	service delivered	3/1/2017
2	re-interview	8/1/2017
3	initial interview	2/1/2016
3	re-interview	8/1/2016
3	service delivered	10/1/2016
3	re-interview	1/1/2017
3	service delivered	3/1/2017
3	re-interview	8/1/2017
4	initial interview	2/1/2016
4	service delivered	3/1/2016
4	re-interview	8/1/2016
4	service delivered	10/1/2016
4	re-interview	1/1/2017
4	service delivered	3/1/2017
4	re-interview	8/1/2017
4	service delivered	9/1/2017
4	re-interview	12/1/2017;

 

Thanks again! 

PGStats
Opal | Level 21

In your example, all dates are on the first day of the month. Do the dates represent months?

PG
Asileon
Obsidian | Level 7

Hi @PGStats,

 

The original dataset has millions of observations, and in this example I used imaginary data. This dates represent month/day/year -  I used 01 as the day for all of the dates because I tried to make it visually easy for the reader, I was attempting that the reader was able to see the time difference between dates without having to calculate too much.

 

Thanks

novinosrin
Tourmaline | Level 20

Good morning from Chicago, Welcome to the SAS forum and thank you for the steadfast communication in clarifying. I'll be right back hopefully with the code or perhaps for some more clarifications. 

Asileon
Obsidian | Level 7
Good morning!

Thanks a lot @novinosrin, I really appreciate it!

art297
Opal | Level 21

I question your logic regarding client 4. They had already met the criteria by the time they had their 3rd re-interview.

 

If that is correct, then the following code (I think) captures all of your criteria:

data have;
  infile datalines dlm=',';
  informat ClientID 8.;
  informat category $20.;
  informat date mmddyy10.;
  input ClientID Category Date;
  format date mmddyy10.;
  datalines;
1,initial interview,2/1/2016
1,service delivered,8/25/2016
1,re-interview,8/26/2016
1,service delivered,10/1/2016
1,re-interview,1/1/2017
1,service delivered,3/1/2017
1,re-interview,8/1/2017
2,initial interview,2/1/2016
2,service delivered,3/1/2016
2,re-interview,8/1/2016
2,service delivered,10/1/2016
2,re-interview,1/1/2017
2,service delivered,3/1/2017
2,re-interview,8/1/2017
3,initial interview,2/1/2016
3,re-interview,8/1/2016
3,service delivered,10/1/2016
3,re-interview,1/1/2017
3,service delivered,3/1/2017
3,re-interview,8/1/2017
4,initial interview,2/1/2016
4,service delivered,3/1/2016
4,re-interview,8/1/2016
4,service delivered,10/1/2016
4,re-interview,1/1/2017
4,service delivered,3/1/2017
4,re-interview,8/1/2017
4,service delivered,9/1/2017
4,re-interview,12/1/2017
;
/*
 Criterion1: initial interview (criterion_int)
 Criterion2: service within 6 months of initial interview
 Criterion3: 3 re-interviews within 5-7 months of each other,
              with each of the first two re-interviews
              followed by a service
*/
data want;
  set have;
  by ClientID;
  array reint(10);
  retain criterion1 criterion2 reint:;
  last_category=lag(category);
  last_date=lag(date);
  if first.ClientID then do;
    call missing(criterion1);
    call missing(criterion2);
    call missing(last_category);
    call missing(last_date);
    counter1=0;
    counter2=0;
  end;
  if category eq 'initial interview' then criterion1=date;
  else if not missing(criterion1) then do;
    if missing(criterion2) and category eq 'service delivered'
     and date le intnx('month',criterion1,6,'sameday')
     then criterion2=date;
    else if last_category eq 're-interview' and 
        category eq 'service delivered' then do;
         counter1+1;
         reint(counter1)=last_date;
    end;
    else if category eq 're-interview' and counter1 ge 2 then do;
       counter1+1;
       reint(counter1)=date;
    end;
  end;
  if last.ClientID and criterion1 and criterion2 and counter1 ge 3 then do;
    counter2=counter1;
    do i=1 to counter1-1;
      if not(intnx('month',reint(i),5,'sameday')<=
       reint(i+1)<=intnx('month',reint(i),7,'sameday')) then counter2=counter2-1;
    end;
  end;
  if counter2 ge 3;
run;

Art, CEO, AnalystFinder.com

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 71 replies
  • 1743 views
  • 12 likes
  • 8 in conversation