BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Riri_V
Fluorite | Level 6

Hello everyone,

 

I've done something similar in the past, but now can't really remember how I did that.

Hope someone can help me with this.

 

Here's a sample of historical data and final result i'm aiming for

 

Sample 

customer_idreporting_date rating_18_20 (flag)
111111Apr-191
111111May-191
111111Jun-190
111111Jul-190
111111Aug-190
111111Sep-190
111111Oct-191
111111Nov-191
111111Dec-191
111111Jan-200
222222Apr-190
222222May-190
222222Jun-190
222222Jul-190
222222Aug-191
222222Sep-191
222222Oct-191
222222Nov-191
222222Dec-190
222222Jan-200
333333Apr-190
333333May-190
333333Jun-190
333333Jul-190
333333Aug-190
333333Sep-191
333333Oct-191
333333Nov-191
333333Dec-191
333333Jan-201
444444Apr-190
444444May-190
444444Jun-190
444444Jul-190
444444Aug-190
444444Sep-190
444444Oct-190
444444Nov-190
444444Dec-190
444444Jan-200
555555Apr-191
555555May-191
555555Jun-191
555555Jul-191
555555Aug-191
555555Sep-191
555555Oct-191
555555Nov-191
555555Dec-191
555555Jan-201

 

Needed table:

Data aggregated on customer level where reporting_date_rating is a date when customer got rating_18_20 =1 for the last time in history and period_rating_18_20 is amount of months the customer has been possessing the flag for the last period in history.

 

customer_idreporting_date_ratingperiod_rating_18_20
111111Oct-193
222222Aug-194
333333Sep-195
444444 0
555555Apr-1910

 

Here's a code that doesn't work for customers who didn't have rating_18_20 =1 (for example customer number 444444) and returns period_rating_18_20 = 10 instead of 0.

Could anyone please look into it and tell me where the logic is incorrect. 

Thanks.

 

/*18-20*/
proc sort data=WORK.SAMPLE (keep=customer_id reporting_date rating_18_20)
out=rating_flags_sort nodup;
by customer_id rating_18_20 ;
run;
data rating;
set rating_flags_sort;
format for_lag_dt lag_dt date9.;
by customer_id rating_18_20;
retain num_of_period for_lag_dt cnt_months;
if first.rating_18_20 then do;
num_of_period = 1;
for_lag_dt = reporting_date;
cnt_months= 1;
end;
else do;
lag_dt = for_lag_dt;
for_lag_dt = reporting_date;
a=intck('month', lag_dt, reporting_date);
if a>1 then do; num_of_period+1;
cnt_months =1;end;
else cnt_months + 1;
end;
/*drop lag_dt for_lag_dt;*/
run;

/*doesn't work for customers with 0 flags - returns max amount of months in history*/
data rating_2;
set rating;
by customer_id;
if last.customer_id;
keep customer_id rating_18_20 cnt_months;
run;

/*date of rating*/
proc sort data=rating out=rating_sort_2;
by customer_id descending rating_18_20 descending num_of_period reporting_date;
run;

data rating_3;
set rating_sort_2;
by customer_id descending rating_18_20 descending num_of_period reporting_date;
if first.customer_id and first.num_of_period;
keep customer_id reporting_date;
run;
data rating_total;
merge rating_2 rating_3;
by customer_id;
run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
yabwon
Amethyst | Level 16

Hi,

 

How about group by processing?

 

data have;
input customer_id : $	reporting_date monyy6.	rating_18_20;
format reporting_date yymmdd10.;
cards4;
111111 Apr-19 1
111111 May-19 1
111111 Jun-19 0
111111 Jul-19 0
111111 Aug-19 0
111111 Sep-19 0
111111 Oct-19 1
111111 Nov-19 1
111111 Dec-19 1
111111 Jan-20 0
222222 Apr-19 0
222222 May-19 0
222222 Jun-19 0
222222 Jul-19 0
222222 Aug-19 1
222222 Sep-19 1
222222 Oct-19 1
222222 Nov-19 1
222222 Dec-19 0
222222 Jan-20 0
333333 Apr-19 0
333333 May-19 0
333333 Jun-19 0
333333 Jul-19 0
333333 Aug-19 0
333333 Sep-19 1
333333 Oct-19 1
333333 Nov-19 1
333333 Dec-19 1
333333 Jan-20 1
444444 Apr-19 0
444444 May-19 0
444444 Jun-19 0
444444 Jul-19 0
444444 Aug-19 0
444444 Sep-19 0
444444 Oct-19 0
444444 Nov-19 0
444444 Dec-19 0
444444 Jan-20 0
555555 Apr-19 1
555555 May-19 1
555555 Jun-19 1
555555 Jul-19 1
555555 Aug-19 1
555555 Sep-19 1
555555 Oct-19 1
555555 Nov-19 1
555555 Dec-19 1
555555 Jan-20 1
;;;;
run;

proc sort data = have;
  by customer_id reporting_date;
run;

data want(rename=(rd=reporting_date));
  set have;
  by customer_id rating_18_20 notsorted;
  if first.customer_id then 
    do;
      rd = .; 
      cnt = 0;
    end;

  if first.rating_18_20 = 1 = rating_18_20 then 
    do;      
      rd=reporting_date;
      cnt = 0;
    end;
  cnt+rating_18_20;

  if last.customer_id then output;
  retain rd;
  drop reporting_date rating_18_20; 
  format rd monyy6.;
run;
proc print;
run;

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



View solution in original post

7 REPLIES 7
novinosrin
Tourmaline | Level 20

Hi @Riri_V  Assuming I understand your requirement, the following should work

 




data have;
input customer_id	reporting_date $	rating_18_20;
cards;
111111	Apr-19	1
111111	May-19	1
111111	Jun-19	0
111111	Jul-19	0
111111	Aug-19	0
111111	Sep-19	0
111111	Oct-19	1
111111	Nov-19	1
111111	Dec-19	1
111111	Jan-20	0
222222	Apr-19	0
222222	May-19	0
222222	Jun-19	0
222222	Jul-19	0
222222	Aug-19	1
222222	Sep-19	1
222222	Oct-19	1
222222	Nov-19	1
222222	Dec-19	0
222222	Jan-20	0
333333	Apr-19	0
333333	May-19	0
333333	Jun-19	0
333333	Jul-19	0
333333	Aug-19	0
333333	Sep-19	1
333333	Oct-19	1
333333	Nov-19	1
333333	Dec-19	1
333333	Jan-20	1
444444	Apr-19	0
444444	May-19	0
444444	Jun-19	0
444444	Jul-19	0
444444	Aug-19	0
444444	Sep-19	0
444444	Oct-19	0
444444	Nov-19	0
444444	Dec-19	0
444444	Jan-20	0
555555	Apr-19	1
555555	May-19	1
555555	Jun-19	1
555555	Jul-19	1
555555	Aug-19	1
555555	Sep-19	1
555555	Oct-19	1
555555	Nov-19	1
555555	Dec-19	1
555555	Jan-20	1
;

data want;
 do until(last.customer_id);
  do _n_=1 by 1 until(last.rating_18_20);
   set have ;
   by customer_id rating_18_20 notsorted;
   if rating_18_20 then do;
    if _n_=1 then reporting_date_rating=reporting_date;
    period_rating_18_20=_n_;
   end;
  end;
 end;
 period_rating_18_20=max(period_rating_18_20,0);
 keep customer_id	reporting_date_rating	period_rating_18_20; 
run;

customer_id reporting_date_rating period_rating_18_20
111111 Oct-19 3
222222 Aug-19 4
333333 Sep-19 5
444444   0
555555 Apr-19 10

 

 

Riri_V
Fluorite | Level 6

Thanks, works perfectly on the sample! But on the whole table discounted flags from past years. 

ed_sas_member
Meteorite | Level 14

Hi @Riri_V 

 

here is an attempt to achieve this:

data have;
	infile datalines dlm="09"x;
	input customer_id	reporting_date :monyy6.	rating_18_20;
	format reporting_date date9.;
	datalines;
111111	Apr-19	1
111111	May-19	1
111111	Jun-19	0
111111	Jul-19	0
111111	Aug-19	0
111111	Sep-19	0
111111	Oct-19	1
111111	Nov-19	1
111111	Dec-19	1
111111	Jan-20	0
222222	Apr-19	0
222222	May-19	0
222222	Jun-19	0
222222	Jul-19	0
222222	Aug-19	1
222222	Sep-19	1
222222	Oct-19	1
222222	Nov-19	1
222222	Dec-19	0
222222	Jan-20	0
333333	Apr-19	0
333333	May-19	0
333333	Jun-19	0
333333	Jul-19	0
333333	Aug-19	0
333333	Sep-19	1
333333	Oct-19	1
333333	Nov-19	1
333333	Dec-19	1
333333	Jan-20	1
444444	Apr-19	0
444444	May-19	0
444444	Jun-19	0
444444	Jul-19	0
444444	Aug-19	0
444444	Sep-19	0
444444	Oct-19	0
444444	Nov-19	0
444444	Dec-19	0
444444	Jan-20	0
555555	Apr-19	1
555555	May-19	1
555555	Jun-19	1
555555	Jul-19	1
555555	Aug-19	1
555555	Sep-19	1
555555	Oct-19	1
555555	Nov-19	1
555555	Dec-19	1
555555	Jan-20	1
;
run;

/* Customers with at least one flag = 1*/

proc sort data=have out=have_sorted_d;
	by customer_id descending reporting_date;
run;

data have2;
	set have_sorted_d;
	by customer_id rating_18_20 notsorted;
	if first.rating_18_20 then count + 1;
run;

proc sql;
	create table have3 as
	select *, count(count) as period_rating_18_20
	from have2
	where rating_18_20 = 1
	group by customer_id, count
	order by customer_id, count, reporting_date asc;
quit;

data have4;
	set have3;
	by customer_id;
	if first.customer_id then output;
	drop count rating_18_20;
run;

/* Customers with no flag = 1*/

proc sql;
	create table have_noflag as
	select distinct customer_id
	from have
	group by customer_id
	having sum(rating_18_20) = 0;
quit;

/* Final table */

data want;
	merge have4 have_noflag;
	by customer_id;
	if period_rating_18_20=. then period_rating_18_20=0;
run;
yabwon
Amethyst | Level 16

Hi,

 

How about group by processing?

 

data have;
input customer_id : $	reporting_date monyy6.	rating_18_20;
format reporting_date yymmdd10.;
cards4;
111111 Apr-19 1
111111 May-19 1
111111 Jun-19 0
111111 Jul-19 0
111111 Aug-19 0
111111 Sep-19 0
111111 Oct-19 1
111111 Nov-19 1
111111 Dec-19 1
111111 Jan-20 0
222222 Apr-19 0
222222 May-19 0
222222 Jun-19 0
222222 Jul-19 0
222222 Aug-19 1
222222 Sep-19 1
222222 Oct-19 1
222222 Nov-19 1
222222 Dec-19 0
222222 Jan-20 0
333333 Apr-19 0
333333 May-19 0
333333 Jun-19 0
333333 Jul-19 0
333333 Aug-19 0
333333 Sep-19 1
333333 Oct-19 1
333333 Nov-19 1
333333 Dec-19 1
333333 Jan-20 1
444444 Apr-19 0
444444 May-19 0
444444 Jun-19 0
444444 Jul-19 0
444444 Aug-19 0
444444 Sep-19 0
444444 Oct-19 0
444444 Nov-19 0
444444 Dec-19 0
444444 Jan-20 0
555555 Apr-19 1
555555 May-19 1
555555 Jun-19 1
555555 Jul-19 1
555555 Aug-19 1
555555 Sep-19 1
555555 Oct-19 1
555555 Nov-19 1
555555 Dec-19 1
555555 Jan-20 1
;;;;
run;

proc sort data = have;
  by customer_id reporting_date;
run;

data want(rename=(rd=reporting_date));
  set have;
  by customer_id rating_18_20 notsorted;
  if first.customer_id then 
    do;
      rd = .; 
      cnt = 0;
    end;

  if first.rating_18_20 = 1 = rating_18_20 then 
    do;      
      rd=reporting_date;
      cnt = 0;
    end;
  cnt+rating_18_20;

  if last.customer_id then output;
  retain rd;
  drop reporting_date rating_18_20; 
  format rd monyy6.;
run;
proc print;
run;

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Riri_V
Fluorite | Level 6

@yabwon  Thaaank you so much! exactly what was needed! 

yabwon
Amethyst | Level 16

Hi @Riri_V ,

 

glad I could help 🙂

 

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



PeterClemmensen
Tourmaline | Level 20

An alternative..

 

data have;
input customer_id $ reporting_date $ rating_18_20;
datalines;
111111 Apr-19 1
111111 May-19 1
111111 Jun-19 0
111111 Jul-19 0
111111 Aug-19 0
111111 Sep-19 0
111111 Oct-19 1
111111 Nov-19 1
111111 Dec-19 1
111111 Jan-20 0
222222 Apr-19 0
222222 May-19 0
222222 Jun-19 0
222222 Jul-19 0
222222 Aug-19 1
222222 Sep-19 1
222222 Oct-19 1
222222 Nov-19 1
222222 Dec-19 0
222222 Jan-20 0
333333 Apr-19 0
333333 May-19 0
333333 Jun-19 0
333333 Jul-19 0
333333 Aug-19 0
333333 Sep-19 1
333333 Oct-19 1
333333 Nov-19 1
333333 Dec-19 1
333333 Jan-20 1
444444 Apr-19 0
444444 May-19 0
444444 Jun-19 0
444444 Jul-19 0
444444 Aug-19 0
444444 Sep-19 0
444444 Oct-19 0
444444 Nov-19 0
444444 Dec-19 0
444444 Jan-20 0
555555 Apr-19 1
555555 May-19 1
555555 Jun-19 1
555555 Jul-19 1
555555 Aug-19 1
555555 Sep-19 1
555555 Oct-19 1
555555 Nov-19 1
555555 Dec-19 1
555555 Jan-20 1
;

data want (keep=customer_id r period_rating_18_20);
    if _N_=1 then do;                            
        declare hash h(ordered : 'D', multidata : 'Y');
        h.definekey('customer_id');
        h.definedata('customer_id', 'period_rating_18_20', 'r');
        h.definedone();
        declare hiter hi ('h');
    end;
 
    do period_rating_18_20 = 1 by 1 until (last.customer_id | last.rating_18_20);             
        set have end=lr;
        by customer_id rating_18_20 notsorted;
        if first.customer_id then h.add(key: customer_id, data: customer_id, data: 0, data: " ");
        if first.rating_18_20 then r = reporting_date;
    end;

    if rating_18_20=1 then h.add();

    if lr;
    do while (hi.last()=0);
        output;
        rc=hi.next();
        h.remove();
    end;
run;