A customer can be a member during certain intervals. I'd like to cut down on unnecesary rows and select the correct valid-from and valid-to dates.
This is what I've got (including 3 example customers):
CUSTOMER_NR | VALID_FROM_DT | VALID_TO_DT | MEMBER_FLG |
111 | 09AUG2013 | 20MAY2014 | 1 |
111 | 21MAY2014 | 22DEC2015 | 1 |
111 | 23DEC2015 | 21MAR2016 | 1 |
111 | 22MAR2016 | 27AUG2017 | 0 |
111 | 28AUG2017 | 03JAN2018 | 1 |
111 | 04JAN2018 | 08MAR2018 | 0 |
111 | 09MAR2019 | 05JUL2019 | 1 |
111 | 06JUL2019 | 31DEC9999 | 1 |
222 | 08APR2018 | 28NOV2018 | 0 |
222 | 29NOV2018 | 04MAY2019 | 1 |
222 | 05MAY2019 | 31DEC9999 | 1 |
333 | 03JAN2019 | 23OCT2019 | 1 |
333 | 24OCT2019 | 31DEC9999 | 0 |
The first "membership period" for customer 111 is from 09AUG2013-21MAR2016. So I'd like to compress that to 1 row, selecting those dates. But the customer also has 2 other "membership periods", so he should have 3 output rows in total.
This is what I want:
CUSTOMER_NR | VALID_FROM_DT | VALID_TO_DT | MEMBER_FLG |
111 | 09AUG2013 | 21MAR2016 | 1 |
111 | 28AUG2017 | 03JAN2018 | 1 |
111 | 09MAR2019 | 31DEC9999 | 1 |
222 | 29NOV2018 | 31DEC9999 | 1 |
333 | 03JAN2019 | 23OCT2019 | 1 |
I'd appreciate any code suggestions. I'm having some trouble with the "membership pauses" for customer 111.
Hi @EinarRoed
You can try the code below
data have;
infile datalines dlm="09"x;
input CUSTOMER_NR VALID_FROM_DT VALID_TO_DT MEMBER_FLG;
informat VALID_FROM_DT VALID_TO_DT date9.;
format VALID_FROM_DT VALID_TO_DT date9.;
cards;
111 09AUG2013 20MAY2014 1
111 21MAY2014 22DEC2015 1
111 23DEC2015 21MAR2016 1
111 22MAR2016 27AUG2017 0
111 28AUG2017 03JAN2018 1
111 04JAN2018 08MAR2018 0
111 09MAR2019 05JUL2019 1
111 06JUL2019 31DEC9999 1
222 08APR2018 28NOV2018 0
222 29NOV2018 04MAY2019 1
222 05MAY2019 31DEC9999 1
333 03JAN2019 23OCT2019 1
333 24OCT2019 31DEC9999 0
;
run;
data have2;
set have;
where MEMBER_FLG=1;
by CUSTOMER_NR;
if first.CUSTOMER_NR then gap=0;
format _lag date9.;
_lag = lag(VALID_TO_DT);
if VALID_FROM_DT - _lag ne 1 and first.CUSTOMER_NR=0 then gap + 1;
run;
proc sql;
create table want as
select CUSTOMER_NR,
min(VALID_FROM_DT) as VALID_FROM_DT format=date9.,
max(VALID_TO_DT) as VALID_TO_DT format=date9.,
1 as MEMBER_FLG
from have2
group by CUSTOMER_NR, gap;
quit;
Result:
Hi @EinarRoed
You can try the code below
data have;
infile datalines dlm="09"x;
input CUSTOMER_NR VALID_FROM_DT VALID_TO_DT MEMBER_FLG;
informat VALID_FROM_DT VALID_TO_DT date9.;
format VALID_FROM_DT VALID_TO_DT date9.;
cards;
111 09AUG2013 20MAY2014 1
111 21MAY2014 22DEC2015 1
111 23DEC2015 21MAR2016 1
111 22MAR2016 27AUG2017 0
111 28AUG2017 03JAN2018 1
111 04JAN2018 08MAR2018 0
111 09MAR2019 05JUL2019 1
111 06JUL2019 31DEC9999 1
222 08APR2018 28NOV2018 0
222 29NOV2018 04MAY2019 1
222 05MAY2019 31DEC9999 1
333 03JAN2019 23OCT2019 1
333 24OCT2019 31DEC9999 0
;
run;
data have2;
set have;
where MEMBER_FLG=1;
by CUSTOMER_NR;
if first.CUSTOMER_NR then gap=0;
format _lag date9.;
_lag = lag(VALID_TO_DT);
if VALID_FROM_DT - _lag ne 1 and first.CUSTOMER_NR=0 then gap + 1;
run;
proc sql;
create table want as
select CUSTOMER_NR,
min(VALID_FROM_DT) as VALID_FROM_DT format=date9.,
max(VALID_TO_DT) as VALID_TO_DT format=date9.,
1 as MEMBER_FLG
from have2
group by CUSTOMER_NR, gap;
quit;
Result:
Worked like a charm! Thanks Ed.
Sounds great!
Thank you @EinarRoed
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.