Solved: Re: Count distinct ID while the scope of data changes based on certain...

Judy_w · Posted 07-05-2022 05:31 PM

Hello everyone,

This problem bothered me a lot, could you have a look and advice on how I can solve this problem? thanks!

I have a dataset which has the structure as follows:

ID is the customer ID that has default;

open_month means = -60 means the bank account is opened before 60 months, the value of this column is from -60 to 0;

MOB is the month on book, the value is from 0-60;

I'm trying to do a vintage analysis, so I need to calculate the accumulated number of default customers by open month and MOB, the structure of the output dataset should be open_month, MOB, count of accumulated distinct customer ID.

For example, for the combination of open_month = -60 and MOB = 10 should count all distinct ID where open_month = -60 and MOB <=10, hope I describe it clearly.

ID	open_month	MOB
5069216	-60	5
5069216	-60	6
5023640	-60	7
5023644	-60	7
5069216	-60	7
5069216	-60	9
5116761	-60	9
5069216	-60	10
5059854	-60	11
5069216	-60	11
5116761	-60	11
5002283	-60	12
5059854	-60	12
5069216	-60	12
5116761	-60	12
5059854	-60	13
5085886	-60	13
5116761	-60	13
5085886	-60	14
5116761	-60	14
5085886	-60	15
5085886	-60	16
5085886	-60	17
5085886	-60	18
5085886	-60	19
5085886	-60	20
5065314	-60	21
5085886	-60	21
5065314	-60	22
5085886	-60	22
5065314	-60	23
5085886	-60	23
5065314	-60	24
5085886	-60	24
5060452	-60	25
5065314	-60	25
5085886	-60	25

Your kindly help is highly appreciated!

Best Regards

Judy

qatman28 · Posted 07-05-2022 06:03 PM

A quick and dirty version would be something like:

data expand;
  set have;
  do mobx = 0 to 60;
    if mob <= mobx then output;
  end;
run;

proc sort data=expand out=unique_combos (keep=id open_month mobx) nodupkey;
  by open_month mobx id;

data want;
  set unique_combos;
  by open_month mobx;
  retain count;
  if first.mobx then count = 0;
  count + 1;
  if last.mobx then output;
run;

Basically this outputs all qualifying records by id/open_month/mobx, then restricts to at most one per ID, then counts the number of qualifying IDs. There are of course many ways to do it but this is a fairly simple and understandable method.

View solution in original post

qatman28 · Posted 07-05-2022 06:03 PM

A quick and dirty version would be something like:

data expand;
  set have;
  do mobx = 0 to 60;
    if mob <= mobx then output;
  end;
run;

proc sort data=expand out=unique_combos (keep=id open_month mobx) nodupkey;
  by open_month mobx id;

data want;
  set unique_combos;
  by open_month mobx;
  retain count;
  if first.mobx then count = 0;
  count + 1;
  if last.mobx then output;
run;

Basically this outputs all qualifying records by id/open_month/mobx, then restricts to at most one per ID, then counts the number of qualifying IDs. There are of course many ways to do it but this is a fairly simple and understandable method.

Judy_w · Posted 07-05-2022 06:26 PM

Thank you so much! You are brilliant!

I just add one step to remove records which mobx > -(open_month).

Thank you again! You made my day!

Count distinct ID while the scope of data changes based on certain variables

Re: Count distinct ID while the scope of data changes based on certain variables

Re: Count distinct ID while the scope of data changes based on certain variables

Re: Count distinct ID while the scope of data changes based on certain variables

Count distinct ID while the scope of data changes based on certain variables

Re: Count distinct ID while the scope of data changes based on certain variables

Re: Count distinct ID while the scope of data changes based on certain variables

Re: Count distinct ID while the scope of data changes based on certain variables

Registration is open

SAS Training: Just a Click Away