Hello
I have a data set with multiple rows per customer ID.
Each row represent existence of customer in specific month (YYYYMM).
My target- For each customer I want to identify if there is lack of continuity in data.
For example:
CustomerID=1 have full data from JAN2023 till JUN2023 (so calculated indicator field will get value=1)
CustomerID=2 have full data from APR2023 till JUN2023(so calculated indicator field will get value=1)
CustomerID=3 have full data from JAN2023 till APR2023(so calculated indicator field will get value=1)
CustomerID=4 have data hole in Apr2023 (so calculated indicator field will get value=0)
CustomerID=5 have full data from MAR2023 till JUN2023(so calculated indicator field will get value=1)
CustomerID=6 have data hole in MAR2023,APR2023,MAY2023 (so calculated indicator field will get value=0)
What is the way to calculate it?
Remember that desired wanted data set will have one row per customer
Data have;
input custID YYYYMM;
cards;
1 202301
1 202302
1 202303
1 202304
1 202305
1 202306
2 202304
2 202305
2 202306
3 202301
3 202302
3 202303
3 202304
4 202301
4 202302
4 202303
4 202305
4 202306
5 202303
5 202304
5 202305
5 202306
6 202301
6 202302
6 202306
;
Run;
BEFORE you do anything, convert those useless numbers to SAS date values. After that, it's a simple application of INTCK, e.g.
data want;
indicator = 1;
do until (last.custid);
set have;
by custid;
if intck('month',lag(yyyymm),yyyymm) > 1 and not first.custid then indicator = 0;
end;
keep custid indicator;
run;
Untested, posted from my tablet.
None of those observations have any date. Start by creating an actual date value.
Use INTCK on the LAGGED date variables with a month to see if the value is one month to set a retained flag for good/bad (1/0).
Us BY group processing and apparently First and LAST to output.
Here is one solution
Data have;
input custID YYYYMM;
cards;
1 202301
1 202302
1 202303
1 202304
1 202305
1 202306
2 202304
2 202305
2 202306
3 202301
3 202302
3 202303
3 202304
4 202301
4 202302
4 202303
4 202305
4 202306
5 202303
5 202304
5 202305
5 202306
6 202301
6 202302
6 202306
;
Run;
proc sql;
create table t_min_max as
select custID,
min(YYYYMM) as min_month,
max(YYYYMM) as max_month,
count(*) as nr_months_With_Data
from have
group by custID
;
quit;
proc sql;
create table ttt as
select a.*,b.min_month,b.max_month,b.nr_months_With_Data,intck("month",input(cat(b.min_month),yymmn6.),input(cat(b.max_month),yymmn6.))+1 as nr_Months_between_min_max,
case when b.nr_months_With_Data=calculated nr_Months_between_min_max then 1 else 0 end as Ind
from have as a
left join t_min_max as b
on a.custID=b.custID
;
quit;
proc sql;
create table want as
select custID,
max(case when nr_months_With_Data= nr_Months_between_min_max then 1 else 0 end) as Ind
from ttt
group by custID
;
quit;
BEFORE you do anything, convert those useless numbers to SAS date values. After that, it's a simple application of INTCK, e.g.
data want;
indicator = 1;
do until (last.custid);
set have;
by custid;
if intck('month',lag(yyyymm),yyyymm) > 1 and not first.custid then indicator = 0;
end;
keep custid indicator;
run;
Untested, posted from my tablet.
Here is summary of the solutions:
Data have;
input custID YYYYMM;
cards;
1 202301
1 202302
1 202303
1 202304
1 202305
1 202306
2 202304
2 202305
2 202306
3 202301
3 202302
3 202303
3 202304
4 202301
4 202302
4 202303
4 202305
4 202306
5 202303
5 202304
5 202305
5 202306
6 202301
6 202302
6 202306
;
Run;
/****Way1*******************/
/****Way1*******************/
/****Way1*******************/
/****Way1*******************/
Data have_b(Drop=YYYYMM rename=(_yyyymm_ =YYYYMM));
set have;
_yyyymm_=input(cat(YYYYMM),yymmn6.);
format _yyyymm_ date9.;
Run;
data want;
indicator = 1;
do until (last.custid);
set have_b;
by custid;
if intck('month',lag(yyyymm),yyyymm) > 1 and not first.custid then indicator = 0;
end;
keep custid indicator;
run;
/****Way2*******************/
/****Way2*******************/
/****Way2*******************/
proc sql;
create table t_min_max as
select custID,
min(YYYYMM) as min_month,
max(YYYYMM) as max_month,
count(*) as nr_months_With_Data
from have
group by custID
;
quit;
proc sql;
create table ttt as
select a.*,b.min_month,b.max_month,b.nr_months_With_Data,intck("month",input(cat(b.min_month),yymmn6.),input(cat(b.max_month),yymmn6.))+1 as nr_Months_between_min_max,
case when b.nr_months_With_Data=calculated nr_Months_between_min_max then 1 else 0 end as Ind
from have as a
left join t_min_max as b
on a.custID=b.custID
;
quit;
proc sql;
create table want as
select custID,
max(case when nr_months_With_Data= nr_Months_between_min_max then 1 else 0 end) as Ind
from ttt
group by custID
;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.