BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Ronein
Meteorite | Level 14

Hello

I have a data set with multiple rows per customer ID.

Each row represent existence of customer  in specific month (YYYYMM).

My target- For each customer I want to identify if there is lack of continuity in data.

For example:

CustomerID=1 have full data from JAN2023 till JUN2023  (so calculated indicator field will get value=1)

CustomerID=2 have full data from APR2023 till JUN2023(so calculated indicator field will get value=1)

CustomerID=3 have full data from JAN2023 till APR2023(so calculated indicator field will get value=1)

CustomerID=4 have data hole in Apr2023 (so calculated indicator field will get value=0)

CustomerID=5 have full data from MAR2023 till JUN2023(so calculated indicator field will get value=1)

CustomerID=6 have data hole in MAR2023,APR2023,MAY2023 (so calculated indicator field will get value=0)

What is the way to calculate it?

Remember that desired wanted data set will have one row per customer

 

Data have;
input custID YYYYMM;
cards;
1 202301
1 202302
1 202303
1 202304
1 202305
1 202306
2 202304
2 202305
2 202306
3 202301
3 202302
3 202303
3 202304
4 202301
4 202302
4 202303
4 202305
4 202306 
5 202303
5 202304
5 202305
5 202306
6 202301
6 202302
6 202306
;
Run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

BEFORE you do anything, convert those useless numbers to SAS date values. After that, it's a simple application of INTCK, e.g.

data want;
indicator = 1;
do until (last.custid);
  set have;
  by custid;
  if intck('month',lag(yyyymm),yyyymm) > 1 and not first.custid then indicator = 0;
end;
keep custid indicator;
run;

Untested, posted from my tablet.

View solution in original post

4 REPLIES 4
ballardw
Super User

None of those observations have any date. Start by creating an actual date value.

Use INTCK on the LAGGED date variables with a month to see if the value is one month to set a retained flag for good/bad (1/0).

Us BY group processing and apparently First and LAST to output.

Ronein
Meteorite | Level 14

Here is one solution

Data have;
input custID YYYYMM;
cards;
1 202301
1 202302
1 202303
1 202304
1 202305
1 202306
2 202304
2 202305
2 202306
3 202301
3 202302
3 202303
3 202304
4 202301
4 202302
4 202303
4 202305
4 202306 
5 202303
5 202304
5 202305
5 202306
6 202301
6 202302
6 202306
;
Run;


proc sql;
create table t_min_max as
select custID,
        min(YYYYMM) as min_month,
		max(YYYYMM) as max_month,
		count(*) as nr_months_With_Data
from have
group by custID
;
quit;

proc sql;
create table ttt as
select a.*,b.min_month,b.max_month,b.nr_months_With_Data,intck("month",input(cat(b.min_month),yymmn6.),input(cat(b.max_month),yymmn6.))+1 as nr_Months_between_min_max,
case when b.nr_months_With_Data=calculated nr_Months_between_min_max  then 1 else 0 end as Ind
from have as a
left join t_min_max  as b
on a.custID=b.custID
;
quit;
 
proc sql;
create  table want as
select custID,  
			max(case when nr_months_With_Data= nr_Months_between_min_max  then 1 else 0 end) as Ind
from ttt
group by custID
;
quit;
Kurt_Bremser
Super User

BEFORE you do anything, convert those useless numbers to SAS date values. After that, it's a simple application of INTCK, e.g.

data want;
indicator = 1;
do until (last.custid);
  set have;
  by custid;
  if intck('month',lag(yyyymm),yyyymm) > 1 and not first.custid then indicator = 0;
end;
keep custid indicator;
run;

Untested, posted from my tablet.

Ronein
Meteorite | Level 14

Here is summary of the solutions:

Data have;
input custID YYYYMM;
cards;
1 202301
1 202302
1 202303
1 202304
1 202305
1 202306
2 202304
2 202305
2 202306
3 202301
3 202302
3 202303
3 202304
4 202301
4 202302
4 202303
4 202305
4 202306 
5 202303
5 202304
5 202305
5 202306
6 202301
6 202302
6 202306
;
Run;


/****Way1*******************/
/****Way1*******************/
/****Way1*******************/
/****Way1*******************/
Data have_b(Drop=YYYYMM rename=(_yyyymm_ =YYYYMM));
set have;
_yyyymm_=input(cat(YYYYMM),yymmn6.);
format _yyyymm_ date9.;
Run;

data want;
indicator = 1;
do until (last.custid);
set have_b;
by custid;
if intck('month',lag(yyyymm),yyyymm) > 1 and not first.custid then indicator = 0;
end;
keep custid indicator;
run;

/****Way2*******************/
/****Way2*******************/
/****Way2*******************/
proc sql;
create table t_min_max as
select custID,
        min(YYYYMM) as min_month,
		max(YYYYMM) as max_month,
		count(*) as nr_months_With_Data
from have
group by custID
;
quit;

proc sql;
create table ttt as
select a.*,b.min_month,b.max_month,b.nr_months_With_Data,intck("month",input(cat(b.min_month),yymmn6.),input(cat(b.max_month),yymmn6.))+1 as nr_Months_between_min_max,
case when b.nr_months_With_Data=calculated nr_Months_between_min_max  then 1 else 0 end as Ind
from have as a
left join t_min_max  as b
on a.custID=b.custID
;
quit;
 
proc sql;
create  table want as
select custID,  
			max(case when nr_months_With_Data= nr_Months_between_min_max  then 1 else 0 end) as Ind
from ttt
group by custID
;
quit;

 

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 693 views
  • 2 likes
  • 3 in conversation