Hello everyone,
I've done something similar in the past, but now can't really remember how I did that.
Hope someone can help me with this.
Here's a sample of historical data and final result i'm aiming for
Sample
customer_id | reporting_date | rating_18_20 (flag) |
111111 | Apr-19 | 1 |
111111 | May-19 | 1 |
111111 | Jun-19 | 0 |
111111 | Jul-19 | 0 |
111111 | Aug-19 | 0 |
111111 | Sep-19 | 0 |
111111 | Oct-19 | 1 |
111111 | Nov-19 | 1 |
111111 | Dec-19 | 1 |
111111 | Jan-20 | 0 |
222222 | Apr-19 | 0 |
222222 | May-19 | 0 |
222222 | Jun-19 | 0 |
222222 | Jul-19 | 0 |
222222 | Aug-19 | 1 |
222222 | Sep-19 | 1 |
222222 | Oct-19 | 1 |
222222 | Nov-19 | 1 |
222222 | Dec-19 | 0 |
222222 | Jan-20 | 0 |
333333 | Apr-19 | 0 |
333333 | May-19 | 0 |
333333 | Jun-19 | 0 |
333333 | Jul-19 | 0 |
333333 | Aug-19 | 0 |
333333 | Sep-19 | 1 |
333333 | Oct-19 | 1 |
333333 | Nov-19 | 1 |
333333 | Dec-19 | 1 |
333333 | Jan-20 | 1 |
444444 | Apr-19 | 0 |
444444 | May-19 | 0 |
444444 | Jun-19 | 0 |
444444 | Jul-19 | 0 |
444444 | Aug-19 | 0 |
444444 | Sep-19 | 0 |
444444 | Oct-19 | 0 |
444444 | Nov-19 | 0 |
444444 | Dec-19 | 0 |
444444 | Jan-20 | 0 |
555555 | Apr-19 | 1 |
555555 | May-19 | 1 |
555555 | Jun-19 | 1 |
555555 | Jul-19 | 1 |
555555 | Aug-19 | 1 |
555555 | Sep-19 | 1 |
555555 | Oct-19 | 1 |
555555 | Nov-19 | 1 |
555555 | Dec-19 | 1 |
555555 | Jan-20 | 1 |
Needed table:
Data aggregated on customer level where reporting_date_rating is a date when customer got rating_18_20 =1 for the last time in history and period_rating_18_20 is amount of months the customer has been possessing the flag for the last period in history.
customer_id | reporting_date_rating | period_rating_18_20 |
111111 | Oct-19 | 3 |
222222 | Aug-19 | 4 |
333333 | Sep-19 | 5 |
444444 | 0 | |
555555 | Apr-19 | 10 |
Here's a code that doesn't work for customers who didn't have rating_18_20 =1 (for example customer number 444444) and returns period_rating_18_20 = 10 instead of 0.
Could anyone please look into it and tell me where the logic is incorrect.
Thanks.
/*18-20*/
proc sort data=WORK.SAMPLE (keep=customer_id reporting_date rating_18_20)
out=rating_flags_sort nodup;
by customer_id rating_18_20 ;
run;
data rating;
set rating_flags_sort;
format for_lag_dt lag_dt date9.;
by customer_id rating_18_20;
retain num_of_period for_lag_dt cnt_months;
if first.rating_18_20 then do;
num_of_period = 1;
for_lag_dt = reporting_date;
cnt_months= 1;
end;
else do;
lag_dt = for_lag_dt;
for_lag_dt = reporting_date;
a=intck('month', lag_dt, reporting_date);
if a>1 then do; num_of_period+1;
cnt_months =1;end;
else cnt_months + 1;
end;
/*drop lag_dt for_lag_dt;*/
run;
/*doesn't work for customers with 0 flags - returns max amount of months in history*/
data rating_2;
set rating;
by customer_id;
if last.customer_id;
keep customer_id rating_18_20 cnt_months;
run;
/*date of rating*/
proc sort data=rating out=rating_sort_2;
by customer_id descending rating_18_20 descending num_of_period reporting_date;
run;
data rating_3;
set rating_sort_2;
by customer_id descending rating_18_20 descending num_of_period reporting_date;
if first.customer_id and first.num_of_period;
keep customer_id reporting_date;
run;
data rating_total;
merge rating_2 rating_3;
by customer_id;
run;
Hi,
How about group by processing?
data have;
input customer_id : $ reporting_date monyy6. rating_18_20;
format reporting_date yymmdd10.;
cards4;
111111 Apr-19 1
111111 May-19 1
111111 Jun-19 0
111111 Jul-19 0
111111 Aug-19 0
111111 Sep-19 0
111111 Oct-19 1
111111 Nov-19 1
111111 Dec-19 1
111111 Jan-20 0
222222 Apr-19 0
222222 May-19 0
222222 Jun-19 0
222222 Jul-19 0
222222 Aug-19 1
222222 Sep-19 1
222222 Oct-19 1
222222 Nov-19 1
222222 Dec-19 0
222222 Jan-20 0
333333 Apr-19 0
333333 May-19 0
333333 Jun-19 0
333333 Jul-19 0
333333 Aug-19 0
333333 Sep-19 1
333333 Oct-19 1
333333 Nov-19 1
333333 Dec-19 1
333333 Jan-20 1
444444 Apr-19 0
444444 May-19 0
444444 Jun-19 0
444444 Jul-19 0
444444 Aug-19 0
444444 Sep-19 0
444444 Oct-19 0
444444 Nov-19 0
444444 Dec-19 0
444444 Jan-20 0
555555 Apr-19 1
555555 May-19 1
555555 Jun-19 1
555555 Jul-19 1
555555 Aug-19 1
555555 Sep-19 1
555555 Oct-19 1
555555 Nov-19 1
555555 Dec-19 1
555555 Jan-20 1
;;;;
run;
proc sort data = have;
by customer_id reporting_date;
run;
data want(rename=(rd=reporting_date));
set have;
by customer_id rating_18_20 notsorted;
if first.customer_id then
do;
rd = .;
cnt = 0;
end;
if first.rating_18_20 = 1 = rating_18_20 then
do;
rd=reporting_date;
cnt = 0;
end;
cnt+rating_18_20;
if last.customer_id then output;
retain rd;
drop reporting_date rating_18_20;
format rd monyy6.;
run;
proc print;
run;
All the best
Bart
Hi @Riri_V Assuming I understand your requirement, the following should work
data have;
input customer_id reporting_date $ rating_18_20;
cards;
111111 Apr-19 1
111111 May-19 1
111111 Jun-19 0
111111 Jul-19 0
111111 Aug-19 0
111111 Sep-19 0
111111 Oct-19 1
111111 Nov-19 1
111111 Dec-19 1
111111 Jan-20 0
222222 Apr-19 0
222222 May-19 0
222222 Jun-19 0
222222 Jul-19 0
222222 Aug-19 1
222222 Sep-19 1
222222 Oct-19 1
222222 Nov-19 1
222222 Dec-19 0
222222 Jan-20 0
333333 Apr-19 0
333333 May-19 0
333333 Jun-19 0
333333 Jul-19 0
333333 Aug-19 0
333333 Sep-19 1
333333 Oct-19 1
333333 Nov-19 1
333333 Dec-19 1
333333 Jan-20 1
444444 Apr-19 0
444444 May-19 0
444444 Jun-19 0
444444 Jul-19 0
444444 Aug-19 0
444444 Sep-19 0
444444 Oct-19 0
444444 Nov-19 0
444444 Dec-19 0
444444 Jan-20 0
555555 Apr-19 1
555555 May-19 1
555555 Jun-19 1
555555 Jul-19 1
555555 Aug-19 1
555555 Sep-19 1
555555 Oct-19 1
555555 Nov-19 1
555555 Dec-19 1
555555 Jan-20 1
;
data want;
do until(last.customer_id);
do _n_=1 by 1 until(last.rating_18_20);
set have ;
by customer_id rating_18_20 notsorted;
if rating_18_20 then do;
if _n_=1 then reporting_date_rating=reporting_date;
period_rating_18_20=_n_;
end;
end;
end;
period_rating_18_20=max(period_rating_18_20,0);
keep customer_id reporting_date_rating period_rating_18_20;
run;
customer_id | reporting_date_rating | period_rating_18_20 |
---|---|---|
111111 | Oct-19 | 3 |
222222 | Aug-19 | 4 |
333333 | Sep-19 | 5 |
444444 | 0 | |
555555 | Apr-19 | 10 |
Thanks, works perfectly on the sample! But on the whole table discounted flags from past years.
Hi @Riri_V
here is an attempt to achieve this:
data have;
infile datalines dlm="09"x;
input customer_id reporting_date :monyy6. rating_18_20;
format reporting_date date9.;
datalines;
111111 Apr-19 1
111111 May-19 1
111111 Jun-19 0
111111 Jul-19 0
111111 Aug-19 0
111111 Sep-19 0
111111 Oct-19 1
111111 Nov-19 1
111111 Dec-19 1
111111 Jan-20 0
222222 Apr-19 0
222222 May-19 0
222222 Jun-19 0
222222 Jul-19 0
222222 Aug-19 1
222222 Sep-19 1
222222 Oct-19 1
222222 Nov-19 1
222222 Dec-19 0
222222 Jan-20 0
333333 Apr-19 0
333333 May-19 0
333333 Jun-19 0
333333 Jul-19 0
333333 Aug-19 0
333333 Sep-19 1
333333 Oct-19 1
333333 Nov-19 1
333333 Dec-19 1
333333 Jan-20 1
444444 Apr-19 0
444444 May-19 0
444444 Jun-19 0
444444 Jul-19 0
444444 Aug-19 0
444444 Sep-19 0
444444 Oct-19 0
444444 Nov-19 0
444444 Dec-19 0
444444 Jan-20 0
555555 Apr-19 1
555555 May-19 1
555555 Jun-19 1
555555 Jul-19 1
555555 Aug-19 1
555555 Sep-19 1
555555 Oct-19 1
555555 Nov-19 1
555555 Dec-19 1
555555 Jan-20 1
;
run;
/* Customers with at least one flag = 1*/
proc sort data=have out=have_sorted_d;
by customer_id descending reporting_date;
run;
data have2;
set have_sorted_d;
by customer_id rating_18_20 notsorted;
if first.rating_18_20 then count + 1;
run;
proc sql;
create table have3 as
select *, count(count) as period_rating_18_20
from have2
where rating_18_20 = 1
group by customer_id, count
order by customer_id, count, reporting_date asc;
quit;
data have4;
set have3;
by customer_id;
if first.customer_id then output;
drop count rating_18_20;
run;
/* Customers with no flag = 1*/
proc sql;
create table have_noflag as
select distinct customer_id
from have
group by customer_id
having sum(rating_18_20) = 0;
quit;
/* Final table */
data want;
merge have4 have_noflag;
by customer_id;
if period_rating_18_20=. then period_rating_18_20=0;
run;
Hi,
How about group by processing?
data have;
input customer_id : $ reporting_date monyy6. rating_18_20;
format reporting_date yymmdd10.;
cards4;
111111 Apr-19 1
111111 May-19 1
111111 Jun-19 0
111111 Jul-19 0
111111 Aug-19 0
111111 Sep-19 0
111111 Oct-19 1
111111 Nov-19 1
111111 Dec-19 1
111111 Jan-20 0
222222 Apr-19 0
222222 May-19 0
222222 Jun-19 0
222222 Jul-19 0
222222 Aug-19 1
222222 Sep-19 1
222222 Oct-19 1
222222 Nov-19 1
222222 Dec-19 0
222222 Jan-20 0
333333 Apr-19 0
333333 May-19 0
333333 Jun-19 0
333333 Jul-19 0
333333 Aug-19 0
333333 Sep-19 1
333333 Oct-19 1
333333 Nov-19 1
333333 Dec-19 1
333333 Jan-20 1
444444 Apr-19 0
444444 May-19 0
444444 Jun-19 0
444444 Jul-19 0
444444 Aug-19 0
444444 Sep-19 0
444444 Oct-19 0
444444 Nov-19 0
444444 Dec-19 0
444444 Jan-20 0
555555 Apr-19 1
555555 May-19 1
555555 Jun-19 1
555555 Jul-19 1
555555 Aug-19 1
555555 Sep-19 1
555555 Oct-19 1
555555 Nov-19 1
555555 Dec-19 1
555555 Jan-20 1
;;;;
run;
proc sort data = have;
by customer_id reporting_date;
run;
data want(rename=(rd=reporting_date));
set have;
by customer_id rating_18_20 notsorted;
if first.customer_id then
do;
rd = .;
cnt = 0;
end;
if first.rating_18_20 = 1 = rating_18_20 then
do;
rd=reporting_date;
cnt = 0;
end;
cnt+rating_18_20;
if last.customer_id then output;
retain rd;
drop reporting_date rating_18_20;
format rd monyy6.;
run;
proc print;
run;
All the best
Bart
@yabwon Thaaank you so much! exactly what was needed!
Hi @Riri_V ,
glad I could help 🙂
All the best
Bart
An alternative..
data have;
input customer_id $ reporting_date $ rating_18_20;
datalines;
111111 Apr-19 1
111111 May-19 1
111111 Jun-19 0
111111 Jul-19 0
111111 Aug-19 0
111111 Sep-19 0
111111 Oct-19 1
111111 Nov-19 1
111111 Dec-19 1
111111 Jan-20 0
222222 Apr-19 0
222222 May-19 0
222222 Jun-19 0
222222 Jul-19 0
222222 Aug-19 1
222222 Sep-19 1
222222 Oct-19 1
222222 Nov-19 1
222222 Dec-19 0
222222 Jan-20 0
333333 Apr-19 0
333333 May-19 0
333333 Jun-19 0
333333 Jul-19 0
333333 Aug-19 0
333333 Sep-19 1
333333 Oct-19 1
333333 Nov-19 1
333333 Dec-19 1
333333 Jan-20 1
444444 Apr-19 0
444444 May-19 0
444444 Jun-19 0
444444 Jul-19 0
444444 Aug-19 0
444444 Sep-19 0
444444 Oct-19 0
444444 Nov-19 0
444444 Dec-19 0
444444 Jan-20 0
555555 Apr-19 1
555555 May-19 1
555555 Jun-19 1
555555 Jul-19 1
555555 Aug-19 1
555555 Sep-19 1
555555 Oct-19 1
555555 Nov-19 1
555555 Dec-19 1
555555 Jan-20 1
;
data want (keep=customer_id r period_rating_18_20);
if _N_=1 then do;
declare hash h(ordered : 'D', multidata : 'Y');
h.definekey('customer_id');
h.definedata('customer_id', 'period_rating_18_20', 'r');
h.definedone();
declare hiter hi ('h');
end;
do period_rating_18_20 = 1 by 1 until (last.customer_id | last.rating_18_20);
set have end=lr;
by customer_id rating_18_20 notsorted;
if first.customer_id then h.add(key: customer_id, data: customer_id, data: 0, data: " ");
if first.rating_18_20 then r = reporting_date;
end;
if rating_18_20=1 then h.add();
if lr;
do while (hi.last()=0);
output;
rc=hi.next();
h.remove();
end;
run;
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.