BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Ronein
Meteorite | Level 14

Hello

I have a data set with multiple rows per customer ID.

Each row represent existence of customer  in specific month (YYYYMM).

My target- For each customer I want to identify if there is lack of continuity in data.

For example:

CustomerID=1 have full data from JAN2023 till JUN2023  (so calculated indicator field will get value=1)

CustomerID=2 have full data from APR2023 till JUN2023(so calculated indicator field will get value=1)

CustomerID=3 have full data from JAN2023 till APR2023(so calculated indicator field will get value=1)

CustomerID=4 have data hole in Apr2023 (so calculated indicator field will get value=0)

CustomerID=5 have full data from MAR2023 till JUN2023(so calculated indicator field will get value=1)

CustomerID=6 have data hole in MAR2023,APR2023,MAY2023 (so calculated indicator field will get value=0)

What is the way to calculate it?

Remember that desired wanted data set will have one row per customer

 

Data have;
input custID YYYYMM;
cards;
1 202301
1 202302
1 202303
1 202304
1 202305
1 202306
2 202304
2 202305
2 202306
3 202301
3 202302
3 202303
3 202304
4 202301
4 202302
4 202303
4 202305
4 202306 
5 202303
5 202304
5 202305
5 202306
6 202301
6 202302
6 202306
;
Run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

BEFORE you do anything, convert those useless numbers to SAS date values. After that, it's a simple application of INTCK, e.g.

data want;
indicator = 1;
do until (last.custid);
  set have;
  by custid;
  if intck('month',lag(yyyymm),yyyymm) > 1 and not first.custid then indicator = 0;
end;
keep custid indicator;
run;

Untested, posted from my tablet.

View solution in original post

4 REPLIES 4
ballardw
Super User

None of those observations have any date. Start by creating an actual date value.

Use INTCK on the LAGGED date variables with a month to see if the value is one month to set a retained flag for good/bad (1/0).

Us BY group processing and apparently First and LAST to output.

Ronein
Meteorite | Level 14

Here is one solution

Data have;
input custID YYYYMM;
cards;
1 202301
1 202302
1 202303
1 202304
1 202305
1 202306
2 202304
2 202305
2 202306
3 202301
3 202302
3 202303
3 202304
4 202301
4 202302
4 202303
4 202305
4 202306 
5 202303
5 202304
5 202305
5 202306
6 202301
6 202302
6 202306
;
Run;


proc sql;
create table t_min_max as
select custID,
        min(YYYYMM) as min_month,
		max(YYYYMM) as max_month,
		count(*) as nr_months_With_Data
from have
group by custID
;
quit;

proc sql;
create table ttt as
select a.*,b.min_month,b.max_month,b.nr_months_With_Data,intck("month",input(cat(b.min_month),yymmn6.),input(cat(b.max_month),yymmn6.))+1 as nr_Months_between_min_max,
case when b.nr_months_With_Data=calculated nr_Months_between_min_max  then 1 else 0 end as Ind
from have as a
left join t_min_max  as b
on a.custID=b.custID
;
quit;
 
proc sql;
create  table want as
select custID,  
			max(case when nr_months_With_Data= nr_Months_between_min_max  then 1 else 0 end) as Ind
from ttt
group by custID
;
quit;
Kurt_Bremser
Super User

BEFORE you do anything, convert those useless numbers to SAS date values. After that, it's a simple application of INTCK, e.g.

data want;
indicator = 1;
do until (last.custid);
  set have;
  by custid;
  if intck('month',lag(yyyymm),yyyymm) > 1 and not first.custid then indicator = 0;
end;
keep custid indicator;
run;

Untested, posted from my tablet.

Ronein
Meteorite | Level 14

Here is summary of the solutions:

Data have;
input custID YYYYMM;
cards;
1 202301
1 202302
1 202303
1 202304
1 202305
1 202306
2 202304
2 202305
2 202306
3 202301
3 202302
3 202303
3 202304
4 202301
4 202302
4 202303
4 202305
4 202306 
5 202303
5 202304
5 202305
5 202306
6 202301
6 202302
6 202306
;
Run;


/****Way1*******************/
/****Way1*******************/
/****Way1*******************/
/****Way1*******************/
Data have_b(Drop=YYYYMM rename=(_yyyymm_ =YYYYMM));
set have;
_yyyymm_=input(cat(YYYYMM),yymmn6.);
format _yyyymm_ date9.;
Run;

data want;
indicator = 1;
do until (last.custid);
set have_b;
by custid;
if intck('month',lag(yyyymm),yyyymm) > 1 and not first.custid then indicator = 0;
end;
keep custid indicator;
run;

/****Way2*******************/
/****Way2*******************/
/****Way2*******************/
proc sql;
create table t_min_max as
select custID,
        min(YYYYMM) as min_month,
		max(YYYYMM) as max_month,
		count(*) as nr_months_With_Data
from have
group by custID
;
quit;

proc sql;
create table ttt as
select a.*,b.min_month,b.max_month,b.nr_months_With_Data,intck("month",input(cat(b.min_month),yymmn6.),input(cat(b.max_month),yymmn6.))+1 as nr_Months_between_min_max,
case when b.nr_months_With_Data=calculated nr_Months_between_min_max  then 1 else 0 end as Ind
from have as a
left join t_min_max  as b
on a.custID=b.custID
;
quit;
 
proc sql;
create  table want as
select custID,  
			max(case when nr_months_With_Data= nr_Months_between_min_max  then 1 else 0 end) as Ind
from ttt
group by custID
;
quit;

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 487 views
  • 2 likes
  • 3 in conversation