Hello everyone,
I've done something similar in the past, but now can't really remember how I did that.
Hope someone can help me with this.
Here's a sample of historical data and final result i'm aiming for
Sample
customer_id | reporting_date | rating_18_20 (flag) |
111111 | Apr-19 | 1 |
111111 | May-19 | 1 |
111111 | Jun-19 | 0 |
111111 | Jul-19 | 0 |
111111 | Aug-19 | 0 |
111111 | Sep-19 | 0 |
111111 | Oct-19 | 1 |
111111 | Nov-19 | 1 |
111111 | Dec-19 | 1 |
111111 | Jan-20 | 0 |
222222 | Apr-19 | 0 |
222222 | May-19 | 0 |
222222 | Jun-19 | 0 |
222222 | Jul-19 | 0 |
222222 | Aug-19 | 1 |
222222 | Sep-19 | 1 |
222222 | Oct-19 | 1 |
222222 | Nov-19 | 1 |
222222 | Dec-19 | 0 |
222222 | Jan-20 | 0 |
333333 | Apr-19 | 0 |
333333 | May-19 | 0 |
333333 | Jun-19 | 0 |
333333 | Jul-19 | 0 |
333333 | Aug-19 | 0 |
333333 | Sep-19 | 1 |
333333 | Oct-19 | 1 |
333333 | Nov-19 | 1 |
333333 | Dec-19 | 1 |
333333 | Jan-20 | 1 |
444444 | Apr-19 | 0 |
444444 | May-19 | 0 |
444444 | Jun-19 | 0 |
444444 | Jul-19 | 0 |
444444 | Aug-19 | 0 |
444444 | Sep-19 | 0 |
444444 | Oct-19 | 0 |
444444 | Nov-19 | 0 |
444444 | Dec-19 | 0 |
444444 | Jan-20 | 0 |
555555 | Apr-19 | 1 |
555555 | May-19 | 1 |
555555 | Jun-19 | 1 |
555555 | Jul-19 | 1 |
555555 | Aug-19 | 1 |
555555 | Sep-19 | 1 |
555555 | Oct-19 | 1 |
555555 | Nov-19 | 1 |
555555 | Dec-19 | 1 |
555555 | Jan-20 | 1 |
Needed table:
Data aggregated on customer level where reporting_date_rating is a date when customer got rating_18_20 =1 for the last time in history and period_rating_18_20 is amount of months the customer has been possessing the flag for the last period in history.
customer_id | reporting_date_rating | period_rating_18_20 |
111111 | Oct-19 | 3 |
222222 | Aug-19 | 4 |
333333 | Sep-19 | 5 |
444444 | 0 | |
555555 | Apr-19 | 10 |
Here's a code that doesn't work for customers who didn't have rating_18_20 =1 (for example customer number 444444) and returns period_rating_18_20 = 10 instead of 0.
Could anyone please look into it and tell me where the logic is incorrect.
Thanks.
/*18-20*/
proc sort data=WORK.SAMPLE (keep=customer_id reporting_date rating_18_20)
out=rating_flags_sort nodup;
by customer_id rating_18_20 ;
run;
data rating;
set rating_flags_sort;
format for_lag_dt lag_dt date9.;
by customer_id rating_18_20;
retain num_of_period for_lag_dt cnt_months;
if first.rating_18_20 then do;
num_of_period = 1;
for_lag_dt = reporting_date;
cnt_months= 1;
end;
else do;
lag_dt = for_lag_dt;
for_lag_dt = reporting_date;
a=intck('month', lag_dt, reporting_date);
if a>1 then do; num_of_period+1;
cnt_months =1;end;
else cnt_months + 1;
end;
/*drop lag_dt for_lag_dt;*/
run;
/*doesn't work for customers with 0 flags - returns max amount of months in history*/
data rating_2;
set rating;
by customer_id;
if last.customer_id;
keep customer_id rating_18_20 cnt_months;
run;
/*date of rating*/
proc sort data=rating out=rating_sort_2;
by customer_id descending rating_18_20 descending num_of_period reporting_date;
run;
data rating_3;
set rating_sort_2;
by customer_id descending rating_18_20 descending num_of_period reporting_date;
if first.customer_id and first.num_of_period;
keep customer_id reporting_date;
run;
data rating_total;
merge rating_2 rating_3;
by customer_id;
run;
Hi,
How about group by processing?
data have;
input customer_id : $ reporting_date monyy6. rating_18_20;
format reporting_date yymmdd10.;
cards4;
111111 Apr-19 1
111111 May-19 1
111111 Jun-19 0
111111 Jul-19 0
111111 Aug-19 0
111111 Sep-19 0
111111 Oct-19 1
111111 Nov-19 1
111111 Dec-19 1
111111 Jan-20 0
222222 Apr-19 0
222222 May-19 0
222222 Jun-19 0
222222 Jul-19 0
222222 Aug-19 1
222222 Sep-19 1
222222 Oct-19 1
222222 Nov-19 1
222222 Dec-19 0
222222 Jan-20 0
333333 Apr-19 0
333333 May-19 0
333333 Jun-19 0
333333 Jul-19 0
333333 Aug-19 0
333333 Sep-19 1
333333 Oct-19 1
333333 Nov-19 1
333333 Dec-19 1
333333 Jan-20 1
444444 Apr-19 0
444444 May-19 0
444444 Jun-19 0
444444 Jul-19 0
444444 Aug-19 0
444444 Sep-19 0
444444 Oct-19 0
444444 Nov-19 0
444444 Dec-19 0
444444 Jan-20 0
555555 Apr-19 1
555555 May-19 1
555555 Jun-19 1
555555 Jul-19 1
555555 Aug-19 1
555555 Sep-19 1
555555 Oct-19 1
555555 Nov-19 1
555555 Dec-19 1
555555 Jan-20 1
;;;;
run;
proc sort data = have;
by customer_id reporting_date;
run;
data want(rename=(rd=reporting_date));
set have;
by customer_id rating_18_20 notsorted;
if first.customer_id then
do;
rd = .;
cnt = 0;
end;
if first.rating_18_20 = 1 = rating_18_20 then
do;
rd=reporting_date;
cnt = 0;
end;
cnt+rating_18_20;
if last.customer_id then output;
retain rd;
drop reporting_date rating_18_20;
format rd monyy6.;
run;
proc print;
run;
All the best
Bart
Hi @Riri_V Assuming I understand your requirement, the following should work
data have;
input customer_id reporting_date $ rating_18_20;
cards;
111111 Apr-19 1
111111 May-19 1
111111 Jun-19 0
111111 Jul-19 0
111111 Aug-19 0
111111 Sep-19 0
111111 Oct-19 1
111111 Nov-19 1
111111 Dec-19 1
111111 Jan-20 0
222222 Apr-19 0
222222 May-19 0
222222 Jun-19 0
222222 Jul-19 0
222222 Aug-19 1
222222 Sep-19 1
222222 Oct-19 1
222222 Nov-19 1
222222 Dec-19 0
222222 Jan-20 0
333333 Apr-19 0
333333 May-19 0
333333 Jun-19 0
333333 Jul-19 0
333333 Aug-19 0
333333 Sep-19 1
333333 Oct-19 1
333333 Nov-19 1
333333 Dec-19 1
333333 Jan-20 1
444444 Apr-19 0
444444 May-19 0
444444 Jun-19 0
444444 Jul-19 0
444444 Aug-19 0
444444 Sep-19 0
444444 Oct-19 0
444444 Nov-19 0
444444 Dec-19 0
444444 Jan-20 0
555555 Apr-19 1
555555 May-19 1
555555 Jun-19 1
555555 Jul-19 1
555555 Aug-19 1
555555 Sep-19 1
555555 Oct-19 1
555555 Nov-19 1
555555 Dec-19 1
555555 Jan-20 1
;
data want;
do until(last.customer_id);
do _n_=1 by 1 until(last.rating_18_20);
set have ;
by customer_id rating_18_20 notsorted;
if rating_18_20 then do;
if _n_=1 then reporting_date_rating=reporting_date;
period_rating_18_20=_n_;
end;
end;
end;
period_rating_18_20=max(period_rating_18_20,0);
keep customer_id reporting_date_rating period_rating_18_20;
run;
customer_id | reporting_date_rating | period_rating_18_20 |
---|---|---|
111111 | Oct-19 | 3 |
222222 | Aug-19 | 4 |
333333 | Sep-19 | 5 |
444444 | 0 | |
555555 | Apr-19 | 10 |
Thanks, works perfectly on the sample! But on the whole table discounted flags from past years.
Hi @Riri_V
here is an attempt to achieve this:
data have;
infile datalines dlm="09"x;
input customer_id reporting_date :monyy6. rating_18_20;
format reporting_date date9.;
datalines;
111111 Apr-19 1
111111 May-19 1
111111 Jun-19 0
111111 Jul-19 0
111111 Aug-19 0
111111 Sep-19 0
111111 Oct-19 1
111111 Nov-19 1
111111 Dec-19 1
111111 Jan-20 0
222222 Apr-19 0
222222 May-19 0
222222 Jun-19 0
222222 Jul-19 0
222222 Aug-19 1
222222 Sep-19 1
222222 Oct-19 1
222222 Nov-19 1
222222 Dec-19 0
222222 Jan-20 0
333333 Apr-19 0
333333 May-19 0
333333 Jun-19 0
333333 Jul-19 0
333333 Aug-19 0
333333 Sep-19 1
333333 Oct-19 1
333333 Nov-19 1
333333 Dec-19 1
333333 Jan-20 1
444444 Apr-19 0
444444 May-19 0
444444 Jun-19 0
444444 Jul-19 0
444444 Aug-19 0
444444 Sep-19 0
444444 Oct-19 0
444444 Nov-19 0
444444 Dec-19 0
444444 Jan-20 0
555555 Apr-19 1
555555 May-19 1
555555 Jun-19 1
555555 Jul-19 1
555555 Aug-19 1
555555 Sep-19 1
555555 Oct-19 1
555555 Nov-19 1
555555 Dec-19 1
555555 Jan-20 1
;
run;
/* Customers with at least one flag = 1*/
proc sort data=have out=have_sorted_d;
by customer_id descending reporting_date;
run;
data have2;
set have_sorted_d;
by customer_id rating_18_20 notsorted;
if first.rating_18_20 then count + 1;
run;
proc sql;
create table have3 as
select *, count(count) as period_rating_18_20
from have2
where rating_18_20 = 1
group by customer_id, count
order by customer_id, count, reporting_date asc;
quit;
data have4;
set have3;
by customer_id;
if first.customer_id then output;
drop count rating_18_20;
run;
/* Customers with no flag = 1*/
proc sql;
create table have_noflag as
select distinct customer_id
from have
group by customer_id
having sum(rating_18_20) = 0;
quit;
/* Final table */
data want;
merge have4 have_noflag;
by customer_id;
if period_rating_18_20=. then period_rating_18_20=0;
run;
Hi,
How about group by processing?
data have;
input customer_id : $ reporting_date monyy6. rating_18_20;
format reporting_date yymmdd10.;
cards4;
111111 Apr-19 1
111111 May-19 1
111111 Jun-19 0
111111 Jul-19 0
111111 Aug-19 0
111111 Sep-19 0
111111 Oct-19 1
111111 Nov-19 1
111111 Dec-19 1
111111 Jan-20 0
222222 Apr-19 0
222222 May-19 0
222222 Jun-19 0
222222 Jul-19 0
222222 Aug-19 1
222222 Sep-19 1
222222 Oct-19 1
222222 Nov-19 1
222222 Dec-19 0
222222 Jan-20 0
333333 Apr-19 0
333333 May-19 0
333333 Jun-19 0
333333 Jul-19 0
333333 Aug-19 0
333333 Sep-19 1
333333 Oct-19 1
333333 Nov-19 1
333333 Dec-19 1
333333 Jan-20 1
444444 Apr-19 0
444444 May-19 0
444444 Jun-19 0
444444 Jul-19 0
444444 Aug-19 0
444444 Sep-19 0
444444 Oct-19 0
444444 Nov-19 0
444444 Dec-19 0
444444 Jan-20 0
555555 Apr-19 1
555555 May-19 1
555555 Jun-19 1
555555 Jul-19 1
555555 Aug-19 1
555555 Sep-19 1
555555 Oct-19 1
555555 Nov-19 1
555555 Dec-19 1
555555 Jan-20 1
;;;;
run;
proc sort data = have;
by customer_id reporting_date;
run;
data want(rename=(rd=reporting_date));
set have;
by customer_id rating_18_20 notsorted;
if first.customer_id then
do;
rd = .;
cnt = 0;
end;
if first.rating_18_20 = 1 = rating_18_20 then
do;
rd=reporting_date;
cnt = 0;
end;
cnt+rating_18_20;
if last.customer_id then output;
retain rd;
drop reporting_date rating_18_20;
format rd monyy6.;
run;
proc print;
run;
All the best
Bart
@yabwon Thaaank you so much! exactly what was needed!
Hi @Riri_V ,
glad I could help 🙂
All the best
Bart
An alternative..
data have;
input customer_id $ reporting_date $ rating_18_20;
datalines;
111111 Apr-19 1
111111 May-19 1
111111 Jun-19 0
111111 Jul-19 0
111111 Aug-19 0
111111 Sep-19 0
111111 Oct-19 1
111111 Nov-19 1
111111 Dec-19 1
111111 Jan-20 0
222222 Apr-19 0
222222 May-19 0
222222 Jun-19 0
222222 Jul-19 0
222222 Aug-19 1
222222 Sep-19 1
222222 Oct-19 1
222222 Nov-19 1
222222 Dec-19 0
222222 Jan-20 0
333333 Apr-19 0
333333 May-19 0
333333 Jun-19 0
333333 Jul-19 0
333333 Aug-19 0
333333 Sep-19 1
333333 Oct-19 1
333333 Nov-19 1
333333 Dec-19 1
333333 Jan-20 1
444444 Apr-19 0
444444 May-19 0
444444 Jun-19 0
444444 Jul-19 0
444444 Aug-19 0
444444 Sep-19 0
444444 Oct-19 0
444444 Nov-19 0
444444 Dec-19 0
444444 Jan-20 0
555555 Apr-19 1
555555 May-19 1
555555 Jun-19 1
555555 Jul-19 1
555555 Aug-19 1
555555 Sep-19 1
555555 Oct-19 1
555555 Nov-19 1
555555 Dec-19 1
555555 Jan-20 1
;
data want (keep=customer_id r period_rating_18_20);
if _N_=1 then do;
declare hash h(ordered : 'D', multidata : 'Y');
h.definekey('customer_id');
h.definedata('customer_id', 'period_rating_18_20', 'r');
h.definedone();
declare hiter hi ('h');
end;
do period_rating_18_20 = 1 by 1 until (last.customer_id | last.rating_18_20);
set have end=lr;
by customer_id rating_18_20 notsorted;
if first.customer_id then h.add(key: customer_id, data: customer_id, data: 0, data: " ");
if first.rating_18_20 then r = reporting_date;
end;
if rating_18_20=1 then h.add();
if lr;
do while (hi.last()=0);
output;
rc=hi.next();
h.remove();
end;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.