SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Counting the number of observations between rows (same ID) and output the ID that meet the criteria?

Accepted Solution Solved
Reply
Contributor
Posts: 32
Accepted Solution

Counting the number of observations between rows (same ID) and output the ID that meet the criteria?

Hello,

 

Thanks a lot in advance! Any suggestion would be greatly appreciated.

 

I have a dataset with multiple observations per Client ID and have to select just the clients that meet the following criteria:

  • The client must have an initial interview.
  • The client must have at least one service (cleaning house, delivering groceries, etc.) delivered within 6 months of the initial interview.
  • The client must have at least 3 re-interviews 5-7 months apart from each other.
  • The client must have at least one service delivered between the re-interviews.
  • Get a final dataset with the qualifying Client ID and only the qualifying observations.

 

Client ID

Category

Date

001

initial interview

02/01/2016

001

service delivered

07/01/2016

001

re-interview

08/01/2016

001

service delivered

10/01/2016

001

re-interview

01/01/2017

001

service delivered

03/01/2017

001

re-interview

08/01/2017

002

initial interview

02/01/2016

002

service delivered

03/01/2016

002

re-interview

08/01/2016

002

service delivered

10/01/2016

002

re-interview

01/01/2017

002

service delivered

03/01/2017

002

re-interview

08/01/2017

003

initial interview

02/01/2016

003

re-interview

08/01/2016

003

service delivered

10/01/2016

003

re-interview

01/01/2017

003

service delivered

03/01/2017

003

re-interview

08/01/2017

004

initial interview

02/01/2016

004

service delivered

03/01/2016

004

re-interview

08/01/2016

004

service delivered

10/01/2016

004

re-interview

01/01/2017

004

service delivered

03/01/2017

004

re-interview

08/01/2017

004

service delivered

09/01/2017

004

re-interview

12/01/2017

The only client meeting the criteria is client 002 and part of client 004. Because:

  • Client 001 had a service after the initial interview that was delivered after 6 months of the initial interview.
  • Client 003 had an initial interview and then a re-interview with no service(s) in between
  • Client 004 had a re-interview (12/01/2017) that was only 4 months from the previous re-interview.

 

The output should be:

Client ID

Category

Date

002

initial interview

02/01/2016

002

service delivered

03/01/2016

002

re-interview

08/01/2016

002

service delivered

10/01/2016

002

re-interview

01/01/2017

002

service delivered

03/01/2017

002

re-interview

08/01/2017

004

initial interview

02/01/2016

004

service delivered

03/01/2016

004

re-interview

08/01/2016

004

service delivered

10/01/2016

004

re-interview

01/01/2017

004

service delivered

03/01/2017

004

re-interview

08/01/2017

004

service delivered

09/01/2017

 

 


Accepted Solutions
Solution
‎01-24-2018 06:11 PM
PROC Star
Posts: 8,150

Re: Counting the number of observations between rows (same ID) and output the ID that meet the crite

In the code, below, I incorporated the same thing as @PGStats suggested for his code:

data client_engage_3 (keep=ClientID category date);
  set client_engage_2;
  by ClientID;
  array reint(999) _temporary_;
  retain criterion1 criterion2 reint:;
  last_category=lag(category);
  last_date=lag(date);
  if first.ClientID then do;
    call missing(criterion1);
    call missing(criterion2);
    call missing(last_category);
    call missing(last_date);
    counter1=0;
    counter2=0;
  end;
  if lowcase(first(left(category))) eq 'i' then criterion1=date;
  else if not missing(criterion1) then do;
    if missing(criterion2) and lowcase(first(left(category))) eq 's'
     and date le intnx('month',criterion1,6,'sameday')
     then criterion2=date;
    else if lowcase(first(left(last_category))) eq 'r' and 
        lowcase(first(left(category))) eq 's' then do;
         counter1+1;
         reint(counter1)=last_date;
    end;
    else if lowcase(first(left(category))) eq 'r' and counter1 ge 2 then do;
       counter1+1;
       reint(counter1)=date;
    end;
    else if lowcase(first(left(category))) eq 'r' and lowcase(first(left(last_category))) eq 'r' then do;
      call missing(criterion1);
      call missing(criterion2);
      counter1=0;
    end;
  end;
  if last.ClientID and criterion1 and criterion2 and counter1 ge 3 then do;
    counter2=counter1;
    do i=1 to counter1-1;
      if i eq 1 and not(intnx('month',criterion1,5,'sameday')<=
       reint(i)<=intnx('month',criterion1,7,'sameday')) then counter2=0;
      if not(intnx('month',reint(i),5,'sameday')<=
       reint(i+1)<=intnx('month',reint(i),7,'sameday')) then counter2=counter2-1;
    end;
  end;
  if counter2 ge 3;
run;

Let me/us know if that works correctly with your real data.

 

Art, CEO, AnalystFinder.com

 

View solution in original post


All Replies
Trusted Advisor
Posts: 1,312

Re: Counting the number of observations between rows (same ID) and output the ID that meet the crite

How about putting in a little more effort and generating SAS code for data set generation.  And possibly, in addition to text describing record types, you might also include code numbers or letters.

Contributor
Posts: 32

Re: Counting the number of observations between rows (same ID) and output the ID that meet the crite

Hi @mkeintz

 

Thanks for your feedback. I have been able to create a flag for the re-interviews that are 5-7 months from the previous. The main issue that I am having is using SAS to output just the client IDs that have a re-interview following by another with at least one services in between.

 

Thanks

PROC Star
Posts: 504

Re: Counting the number of observations between rows (same ID) and output the ID that meet the crite

[ Edited ]

 

your data and questions are confusing to me

 

you said 

The only client meeting the criteria is client 002 and part of client 004. Because:

  • Client 001 had a service after the initial interview that was delivered after 6 months of the initial interview.

but 002 also has same issue

 

002

initial interview

02/01/2016

002

service delivered

03/01/2016

002

re-interview

08/01/2016

002

service delivered

10/01/2016

PROC Star
Posts: 1,604

Re: Counting the number of observations between rows (same ID) and output the ID that meet the crite

[ Edited ]

Hi @kiranv_    OP's question is fairly clear as far as client 002 is concerned.I had a look at this question when i was about to leave my college lab and it's pretty straight forward except the part qualification(eg client 004). However since my own non sas assignments in school that's due in for next week is in a mess, I am afraid I didn't bother though excited.

Anyway, to clarify client 002 

Client 002 had the first aka initial interview on 02/01/2016 

  • The client must have at least one service (cleaning house, delivering groceries, etc.) delivered within 6 months of the initial interview-- Client 002 had the first serivice post the initial interview on 03/01/2016 which is the subsequent month and that falls within th condition stated above
  • The client must have at least 3 re-interviews 5-7 months apart from each other---intck aka intevals between the 3 re interviews for client002 do fall in the range >=5 and <=7
  • The client must have at least one service delivered between the re-interviews.- self explanatory just by looking at data

@Asileon What i wanna you to clarify is-  "Part of an id that qualifies as in the case of 004"

1. If i were to pick only those id's that qualify on the basis of satisfying all the conditions, the "part of client004" concerns me, that is it means like pick the obs in all ids that qualify and leave the rest?????? or is it pick only those id's that satisfy "all" conditions

2.Are you using SAS 9.4, this is important to me for me to attempt using more short cuts 

3. I am doing you a favor by creating data have as follows:

 

data have;
input ClientID Category & $20. Date : mmddyy10.;
format date mmddyy10.;
datalines;
1	initial interview	2/1/2016
1	service delivered	7/1/2016
1	re-interview	8/1/2016
1	service delivered	10/1/2016
1	re-interview	1/1/2017
1	service delivered	3/1/2017
1	re-interview	8/1/2017
2	initial interview	2/1/2016
2	service delivered	3/1/2016
2	re-interview	8/1/2016
2	service delivered	10/1/2016
2	re-interview	1/1/2017
2	service delivered	3/1/2017
2	re-interview	8/1/2017
3	initial interview	2/1/2016
3	re-interview	8/1/2016
3	service delivered	10/1/2016
3	re-interview	1/1/2017
3	service delivered	3/1/2017
3	re-interview	8/1/2017
4	initial interview	2/1/2016
4	service delivered	3/1/2016
4	re-interview	8/1/2016
4	service delivered	10/1/2016
4	re-interview	1/1/2017
4	service delivered	3/1/2017
4	re-interview	8/1/2017
4	service delivered	9/1/2017
4	re-interview	12/1/2017
;

PS the bold is just make it look distinct and neat and not to shout Smiley Happy

 

Contributor
Posts: 32

Re: Counting the number of observations between rows (same ID) and output the ID that meet the crite

Posted in reply to novinosrin

Hi @novinosrin,

 

Thanks for the feedback!

 

Answering your questions:

@Asileon What i wanna you to clarify is-  "Part of an id that qualifies as in the case of 004"

 

1. If i were to pick only those id's that qualify on the basis of satisfying all the conditions, the "part of client004" concerns me, that is it means like pick the obs in all ids that qualify and leave the rest?????? or is it pick only those id's that satisfy "all" conditions

Answer: I want to pick only those id's that satisfy "all" conditions, be able to create a dataset with only those id's, and have the qualifying observations in that dataset. In the example I posted, the dataset will have:

- client 002, the initial interview, the service(s) within the 6 months after initial interview, the 3 re-interview 5-7 months from the previous, and the service(s) delivered in between the re-interviews.

- Client 004, and the same as above.

 

2.Are you using SAS 9.4, this is important to me for me to attempt using more short cuts 

Answer: Yes, I am using 9.4

 

3. I am doing you a favor by creating data have as follows:

Answer: Thank you. This is my first posting ever, and I was so focus in creating a good and well explained example of my issue that I forgot. I will try this does not happen in the future.

Contributor
Posts: 32

Re: Counting the number of observations between rows (same ID) and output the ID that meet the crite

Hi @kiranv_

 

Client 002 qualify because this client had a service delivered on 03/01/2016 which is within 6 months of the initial interview (one of the criteria). The client will continue to have services even after 6 months of the initial interview, and that is why you see another service 6 months past the initial interview.

 

002

initial interview

02/01/2016

002

service delivered

03/01/2016

002

re-interview

08/01/2016

002

service delivered

10/01/2016

 

 

Thanks

PROC Star
Posts: 504

Re: Counting the number of observations between rows (same ID) and output the ID that meet the crite

for the first criteria 001 also qualifies right because 07/01/2016 also within 6 months and this where I got confused. I am sorry, if this does not make sense.

 

001

initial interview

02/01/2016

001

service delivered

07/01/2016

001

re-interview

08/01/2016

001

service delivered

10/01/2016

Contributor
Posts: 32

Re: Counting the number of observations between rows (same ID) and output the ID that meet the crite

Thank you for the correction, you are right! I apologize, I was focused in including a possible qualify/ no-qualify situation for each client, and I did not do a good job with client 001. Please find below a fix for this, where client 001 DO NOT QUALIFY based on what I said before:

 

data have;
input ClientID Category & $20. Date : mmddyy10.;
format date mmddyy10.;
datalines;1	initial interview	2/1/2016
1	service delivered	8/25/2016
1	re-interview	8/26/2016
1	service delivered	10/1/2016
1	re-interview	1/1/2017
1	service delivered	3/1/2017
1	re-interview	8/1/2017
2	initial interview	2/1/2016
2	service delivered	3/1/2016
2	re-interview	8/1/2016
2	service delivered	10/1/2016
2	re-interview	1/1/2017
2	service delivered	3/1/2017
2	re-interview	8/1/2017
3	initial interview	2/1/2016
3	re-interview	8/1/2016
3	service delivered	10/1/2016
3	re-interview	1/1/2017
3	service delivered	3/1/2017
3	re-interview	8/1/2017
4	initial interview	2/1/2016
4	service delivered	3/1/2016
4	re-interview	8/1/2016
4	service delivered	10/1/2016
4	re-interview	1/1/2017
4	service delivered	3/1/2017
4	re-interview	8/1/2017
4	service delivered	9/1/2017
4	re-interview	12/1/2017;

 

Thanks again! 

Contributor
Posts: 32

Re: Counting the number of observations between rows (same ID) and output the ID that meet the crite

@kiranv_ Please see response above, Thanks

Esteemed Advisor
Posts: 5,487

Re: Counting the number of observations between rows (same ID) and output the ID that meet the crite

In your example, all dates are on the first day of the month. Do the dates represent months?

PG
Contributor
Posts: 32

Re: Counting the number of observations between rows (same ID) and output the ID that meet the crite

Hi @PGStats,

 

The original dataset has millions of observations, and in this example I used imaginary data. This dates represent month/day/year -  I used 01 as the day for all of the dates because I tried to make it visually easy for the reader, I was attempting that the reader was able to see the time difference between dates without having to calculate too much.

 

Thanks

PROC Star
Posts: 1,604

Re: Counting the number of observations between rows (same ID) and output the ID that meet the crite

Good morning from Chicago, Welcome to the SAS forum and thank you for the steadfast communication in clarifying. I'll be right back hopefully with the code or perhaps for some more clarifications. 

Contributor
Posts: 32

Re: Counting the number of observations between rows (same ID) and output the ID that meet the crite

Good morning!

Thanks a lot @novinosrin, I really appreciate it!

PROC Star
Posts: 8,150

Re: Counting the number of observations between rows (same ID) and output the ID that meet the crite

I question your logic regarding client 4. They had already met the criteria by the time they had their 3rd re-interview.

 

If that is correct, then the following code (I think) captures all of your criteria:

data have;
  infile datalines dlm=',';
  informat ClientID 8.;
  informat category $20.;
  informat date mmddyy10.;
  input ClientID Category Date;
  format date mmddyy10.;
  datalines;
1,initial interview,2/1/2016
1,service delivered,8/25/2016
1,re-interview,8/26/2016
1,service delivered,10/1/2016
1,re-interview,1/1/2017
1,service delivered,3/1/2017
1,re-interview,8/1/2017
2,initial interview,2/1/2016
2,service delivered,3/1/2016
2,re-interview,8/1/2016
2,service delivered,10/1/2016
2,re-interview,1/1/2017
2,service delivered,3/1/2017
2,re-interview,8/1/2017
3,initial interview,2/1/2016
3,re-interview,8/1/2016
3,service delivered,10/1/2016
3,re-interview,1/1/2017
3,service delivered,3/1/2017
3,re-interview,8/1/2017
4,initial interview,2/1/2016
4,service delivered,3/1/2016
4,re-interview,8/1/2016
4,service delivered,10/1/2016
4,re-interview,1/1/2017
4,service delivered,3/1/2017
4,re-interview,8/1/2017
4,service delivered,9/1/2017
4,re-interview,12/1/2017
;
/*
 Criterion1: initial interview (criterion_int)
 Criterion2: service within 6 months of initial interview
 Criterion3: 3 re-interviews within 5-7 months of each other,
              with each of the first two re-interviews
              followed by a service
*/
data want;
  set have;
  by ClientID;
  array reint(10);
  retain criterion1 criterion2 reint:;
  last_category=lag(category);
  last_date=lag(date);
  if first.ClientID then do;
    call missing(criterion1);
    call missing(criterion2);
    call missing(last_category);
    call missing(last_date);
    counter1=0;
    counter2=0;
  end;
  if category eq 'initial interview' then criterion1=date;
  else if not missing(criterion1) then do;
    if missing(criterion2) and category eq 'service delivered'
     and date le intnx('month',criterion1,6,'sameday')
     then criterion2=date;
    else if last_category eq 're-interview' and 
        category eq 'service delivered' then do;
         counter1+1;
         reint(counter1)=last_date;
    end;
    else if category eq 're-interview' and counter1 ge 2 then do;
       counter1+1;
       reint(counter1)=date;
    end;
  end;
  if last.ClientID and criterion1 and criterion2 and counter1 ge 3 then do;
    counter2=counter1;
    do i=1 to counter1-1;
      if not(intnx('month',reint(i),5,'sameday')<=
       reint(i+1)<=intnx('month',reint(i),7,'sameday')) then counter2=counter2-1;
    end;
  end;
  if counter2 ge 3;
run;

Art, CEO, AnalystFinder.com

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 71 replies
  • 587 views
  • 12 likes
  • 8 in conversation