BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Ronein
Onyx | Level 15

Hello

I have a data set with multiple rows per customer ID.

Each row represent existence of customer  in specific month (YYYYMM).

My target- For each customer I want to identify if there is lack of continuity in data.

For example:

CustomerID=1 have full data from JAN2023 till JUN2023  (so calculated indicator field will get value=1)

CustomerID=2 have full data from APR2023 till JUN2023(so calculated indicator field will get value=1)

CustomerID=3 have full data from JAN2023 till APR2023(so calculated indicator field will get value=1)

CustomerID=4 have data hole in Apr2023 (so calculated indicator field will get value=0)

CustomerID=5 have full data from MAR2023 till JUN2023(so calculated indicator field will get value=1)

CustomerID=6 have data hole in MAR2023,APR2023,MAY2023 (so calculated indicator field will get value=0)

What is the way to calculate it?

Remember that desired wanted data set will have one row per customer

 

Data have;
input custID YYYYMM;
cards;
1 202301
1 202302
1 202303
1 202304
1 202305
1 202306
2 202304
2 202305
2 202306
3 202301
3 202302
3 202303
3 202304
4 202301
4 202302
4 202303
4 202305
4 202306 
5 202303
5 202304
5 202305
5 202306
6 202301
6 202302
6 202306
;
Run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

BEFORE you do anything, convert those useless numbers to SAS date values. After that, it's a simple application of INTCK, e.g.

data want;
indicator = 1;
do until (last.custid);
  set have;
  by custid;
  if intck('month',lag(yyyymm),yyyymm) > 1 and not first.custid then indicator = 0;
end;
keep custid indicator;
run;

Untested, posted from my tablet.

View solution in original post

4 REPLIES 4
ballardw
Super User

None of those observations have any date. Start by creating an actual date value.

Use INTCK on the LAGGED date variables with a month to see if the value is one month to set a retained flag for good/bad (1/0).

Us BY group processing and apparently First and LAST to output.

Ronein
Onyx | Level 15

Here is one solution

Data have;
input custID YYYYMM;
cards;
1 202301
1 202302
1 202303
1 202304
1 202305
1 202306
2 202304
2 202305
2 202306
3 202301
3 202302
3 202303
3 202304
4 202301
4 202302
4 202303
4 202305
4 202306 
5 202303
5 202304
5 202305
5 202306
6 202301
6 202302
6 202306
;
Run;


proc sql;
create table t_min_max as
select custID,
        min(YYYYMM) as min_month,
		max(YYYYMM) as max_month,
		count(*) as nr_months_With_Data
from have
group by custID
;
quit;

proc sql;
create table ttt as
select a.*,b.min_month,b.max_month,b.nr_months_With_Data,intck("month",input(cat(b.min_month),yymmn6.),input(cat(b.max_month),yymmn6.))+1 as nr_Months_between_min_max,
case when b.nr_months_With_Data=calculated nr_Months_between_min_max  then 1 else 0 end as Ind
from have as a
left join t_min_max  as b
on a.custID=b.custID
;
quit;
 
proc sql;
create  table want as
select custID,  
			max(case when nr_months_With_Data= nr_Months_between_min_max  then 1 else 0 end) as Ind
from ttt
group by custID
;
quit;
Kurt_Bremser
Super User

BEFORE you do anything, convert those useless numbers to SAS date values. After that, it's a simple application of INTCK, e.g.

data want;
indicator = 1;
do until (last.custid);
  set have;
  by custid;
  if intck('month',lag(yyyymm),yyyymm) > 1 and not first.custid then indicator = 0;
end;
keep custid indicator;
run;

Untested, posted from my tablet.

Ronein
Onyx | Level 15

Here is summary of the solutions:

Data have;
input custID YYYYMM;
cards;
1 202301
1 202302
1 202303
1 202304
1 202305
1 202306
2 202304
2 202305
2 202306
3 202301
3 202302
3 202303
3 202304
4 202301
4 202302
4 202303
4 202305
4 202306 
5 202303
5 202304
5 202305
5 202306
6 202301
6 202302
6 202306
;
Run;


/****Way1*******************/
/****Way1*******************/
/****Way1*******************/
/****Way1*******************/
Data have_b(Drop=YYYYMM rename=(_yyyymm_ =YYYYMM));
set have;
_yyyymm_=input(cat(YYYYMM),yymmn6.);
format _yyyymm_ date9.;
Run;

data want;
indicator = 1;
do until (last.custid);
set have_b;
by custid;
if intck('month',lag(yyyymm),yyyymm) > 1 and not first.custid then indicator = 0;
end;
keep custid indicator;
run;

/****Way2*******************/
/****Way2*******************/
/****Way2*******************/
proc sql;
create table t_min_max as
select custID,
        min(YYYYMM) as min_month,
		max(YYYYMM) as max_month,
		count(*) as nr_months_With_Data
from have
group by custID
;
quit;

proc sql;
create table ttt as
select a.*,b.min_month,b.max_month,b.nr_months_With_Data,intck("month",input(cat(b.min_month),yymmn6.),input(cat(b.max_month),yymmn6.))+1 as nr_Months_between_min_max,
case when b.nr_months_With_Data=calculated nr_Months_between_min_max  then 1 else 0 end as Ind
from have as a
left join t_min_max  as b
on a.custID=b.custID
;
quit;
 
proc sql;
create  table want as
select custID,  
			max(case when nr_months_With_Data= nr_Months_between_min_max  then 1 else 0 end) as Ind
from ttt
group by custID
;
quit;

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1078 views
  • 2 likes
  • 3 in conversation