Hello
I have a data set with multiple rows per customer ID.
Each row represent existence of customer in specific month (YYYYMM).
My target- For each customer I want to identify if there is lack of continuity in data.
For example:
CustomerID=1 have full data from JAN2023 till JUN2023 (so calculated indicator field will get value=1)
CustomerID=2 have full data from APR2023 till JUN2023(so calculated indicator field will get value=1)
CustomerID=3 have full data from JAN2023 till APR2023(so calculated indicator field will get value=1)
CustomerID=4 have data hole in Apr2023 (so calculated indicator field will get value=0)
CustomerID=5 have full data from MAR2023 till JUN2023(so calculated indicator field will get value=1)
CustomerID=6 have data hole in MAR2023,APR2023,MAY2023 (so calculated indicator field will get value=0)
What is the way to calculate it?
Remember that desired wanted data set will have one row per customer
Data have;
input custID YYYYMM;
cards;
1 202301
1 202302
1 202303
1 202304
1 202305
1 202306
2 202304
2 202305
2 202306
3 202301
3 202302
3 202303
3 202304
4 202301
4 202302
4 202303
4 202305
4 202306
5 202303
5 202304
5 202305
5 202306
6 202301
6 202302
6 202306
;
Run;
BEFORE you do anything, convert those useless numbers to SAS date values. After that, it's a simple application of INTCK, e.g.
data want;
indicator = 1;
do until (last.custid);
set have;
by custid;
if intck('month',lag(yyyymm),yyyymm) > 1 and not first.custid then indicator = 0;
end;
keep custid indicator;
run;
Untested, posted from my tablet.
None of those observations have any date. Start by creating an actual date value.
Use INTCK on the LAGGED date variables with a month to see if the value is one month to set a retained flag for good/bad (1/0).
Us BY group processing and apparently First and LAST to output.
Here is one solution
Data have;
input custID YYYYMM;
cards;
1 202301
1 202302
1 202303
1 202304
1 202305
1 202306
2 202304
2 202305
2 202306
3 202301
3 202302
3 202303
3 202304
4 202301
4 202302
4 202303
4 202305
4 202306
5 202303
5 202304
5 202305
5 202306
6 202301
6 202302
6 202306
;
Run;
proc sql;
create table t_min_max as
select custID,
min(YYYYMM) as min_month,
max(YYYYMM) as max_month,
count(*) as nr_months_With_Data
from have
group by custID
;
quit;
proc sql;
create table ttt as
select a.*,b.min_month,b.max_month,b.nr_months_With_Data,intck("month",input(cat(b.min_month),yymmn6.),input(cat(b.max_month),yymmn6.))+1 as nr_Months_between_min_max,
case when b.nr_months_With_Data=calculated nr_Months_between_min_max then 1 else 0 end as Ind
from have as a
left join t_min_max as b
on a.custID=b.custID
;
quit;
proc sql;
create table want as
select custID,
max(case when nr_months_With_Data= nr_Months_between_min_max then 1 else 0 end) as Ind
from ttt
group by custID
;
quit;
BEFORE you do anything, convert those useless numbers to SAS date values. After that, it's a simple application of INTCK, e.g.
data want;
indicator = 1;
do until (last.custid);
set have;
by custid;
if intck('month',lag(yyyymm),yyyymm) > 1 and not first.custid then indicator = 0;
end;
keep custid indicator;
run;
Untested, posted from my tablet.
Here is summary of the solutions:
Data have;
input custID YYYYMM;
cards;
1 202301
1 202302
1 202303
1 202304
1 202305
1 202306
2 202304
2 202305
2 202306
3 202301
3 202302
3 202303
3 202304
4 202301
4 202302
4 202303
4 202305
4 202306
5 202303
5 202304
5 202305
5 202306
6 202301
6 202302
6 202306
;
Run;
/****Way1*******************/
/****Way1*******************/
/****Way1*******************/
/****Way1*******************/
Data have_b(Drop=YYYYMM rename=(_yyyymm_ =YYYYMM));
set have;
_yyyymm_=input(cat(YYYYMM),yymmn6.);
format _yyyymm_ date9.;
Run;
data want;
indicator = 1;
do until (last.custid);
set have_b;
by custid;
if intck('month',lag(yyyymm),yyyymm) > 1 and not first.custid then indicator = 0;
end;
keep custid indicator;
run;
/****Way2*******************/
/****Way2*******************/
/****Way2*******************/
proc sql;
create table t_min_max as
select custID,
min(YYYYMM) as min_month,
max(YYYYMM) as max_month,
count(*) as nr_months_With_Data
from have
group by custID
;
quit;
proc sql;
create table ttt as
select a.*,b.min_month,b.max_month,b.nr_months_With_Data,intck("month",input(cat(b.min_month),yymmn6.),input(cat(b.max_month),yymmn6.))+1 as nr_Months_between_min_max,
case when b.nr_months_With_Data=calculated nr_Months_between_min_max then 1 else 0 end as Ind
from have as a
left join t_min_max as b
on a.custID=b.custID
;
quit;
proc sql;
create table want as
select custID,
max(case when nr_months_With_Data= nr_Months_between_min_max then 1 else 0 end) as Ind
from ttt
group by custID
;
quit;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.