BookmarkSubscribeRSS Feed
Meteorite | Level 14

## Find for each customer missing months

Hello

For each customer ID there are multiple rows (Each row represent information in different time).

YYMM  column is year+month in structure YYMM (numeric value).

I want to find for each customer which months (YYMM) are missing between the first month and last month.

For example:

For customer 111

first month is 2201

Last month is 2204

And all months between 2201 and 2204 appear so the new column will get value "No"

Customer 222

First month 2203

last month 2210

Month 2205 is missing so new column should get value 2205

Customer 444
First month 2205
last month 2212
Months 2206 and 2211  are  missing so new column should get value 2206-2211

What is the way to create the wanted data set?

In the wanted data set each customer will have one row

``````data have;
input CustID YYMM;
cards;
111 2201
111 2202
111 2203
111 2204
222 2203
222 2204
222 2206
222 2207
222 2208
222 2209
222 2210
333 2206
333 2207
444 2205
444 2207
444 2208
444 2209
444 2210
444 2212
;
Run;

data want;
infile datalines dlm="," dsd;
input CustID missing_Months_Vector \$;
cards;
111,No
222,2205
333,No
444,2206-2211
;
Run;``````

7 REPLIES 7
Super User

## Re: Find for each customer missing months

Don't create any form of "wide" dataset; Long Beats Wide (Maxim 19).

And NEVER (NEVER!!!) store date values in stupid numbers like these; use SAS date values!

With SAS date values, this should do it:

``````data want;
merge
have
have (
firstobs=2
rename=(custid=_cust yymm=_yymm)
)
;
if _cust = custid
then do;
yymm = intnx('month`,yymm,1,'b');
do while (yymm lt _yymm);
output;
yymm = intnx('month',yymm,1,'b');
end;
end;
keep custid yymm;
run;
``````
Meteorite | Level 14

## Re: Find for each customer missing months

I  added another field with sas date instead of numeric YYMM.

I run your code but I cannot see any result (get empty data set).

The reason that wanted data set is wide is that I want to print the wanted data set and see for each customer one row only

``````data have;
input CustID YYMM   yymm2 : date9.;
format yymm2    ddmmyy10.;
cards;
111 2201 '01JAN2022'd
111 2202 '01FEB2022'd
111 2203 '01MAR2022'd
111 2204 '01APR2022'd
222 2203 '01MAR2022'd
222 2204 '01APR2022'd
222 2206 '01JUN2022'd
222 2207 '01JUL2022'd
222 2208 '01AUG2022'd
222 2209 '01SEP2022'd
222 2210 '01OCT2022'd
;
Run;

data want;
merge have have (firstobs=2rename=(custid=_cust yymm=_yymm));
if _cust = custid then do;
yymm = intnx('month',yymm,1,'b');
do while (yymm<=_yymm);
output;
yymm = intnx('month',yymm,1,'b');
end;
end;
keep custid yymm;
run;

``````
Super User

## Re: Find for each customer missing months

When you create an extra date variable, you should then also use it in the code in place of the original yymm.

But you do not need two separate variables. There are SAS formats to display dates as year/month only.

And, as you said, you want to later print the data in wide format, do this with a reporting procedure.

Meteorite | Level 14

## Re: Find for each customer missing months

Sorry for that but may you please show the code from start till end because I run the code and it is not working for me

Super User

## Re: Find for each customer missing months

See this:

``````data have;
input CustID YYMM :yymmn6.;
format yymm yymmn6.;
cards;
111 2201
111 2202
111 2203
111 2204
222 2203
222 2204
222 2206
222 2207
222 2208
222 2209
222 2210
333 2206
333 2207
444 2205
444 2207
444 2208
444 2209
444 2210
444 2212
;

data want;
merge
have
have (
firstobs=2
rename=(custid=_cust yymm=_yymm)
)
;
retain n 1;
if _cust = custid
then do;
yymm = intnx('month',yymm,1,'b');
do while (yymm lt _yymm);
output;
yymm = intnx('month',yymm,1,'b');
n + 1;
end;
end;
else n = 1;
keep custid yymm n;
run;

proc report data=want;
column custid yymm,n;
define custid / group;
define yymm / "" analysis;
define n / "" across ;
run;

data _null_;
set want;
by custid;
if first.custid
then put custid yymm @@;
else put +(-1)"," yymm @@;
if last.custid then put;
run;``````

One step creates a report, the other similar output which you can write to a file or an ODS destination.

Diamond | Level 26

## Re: Find for each customer missing months

@Ronein wrote:

Hello

For each customer ID there are multiple rows (Each row represent information in different time).

YYMM  column is year+month in structure YYMM (numeric value).

I want to find for each customer which months (YYMM) are missing between the first month and last month.

For example:

For customer 111

first month is 2201

Last month is 2204

And all months between 2201 and 2204 appear so the new column will get value "No"

Customer 222

First month 2203

last month 2210

Month 2205 is missing so new column should get value 2205

Customer 444
First month 2205
last month 2212
Months 2206 and 2211  are  missing so new column should get value 2206-2211

What is the way to create the wanted data set?

In the wanted data set each customer will have one row

It seems like you want a new column to say "No" when there are no missing months; it should say "2205" (a character string) for 222 and for 444 you want the character string "2206-2211". What is the benefit of creating missing months as character strings? What would you do with these character strings once you have them? Please explain how these character strings will be used in future analysis or reporting.

I say all of this because it seems to me that there are better ways of getting from here to there (better in the sense of easier programming) that we can help you with, if only we knew where you are trying to go. This seems like a textbook example of the XY Problem.

--
Paige Miller
Super User

## Re: Find for each customer missing months

``````data have;
input CustID YYMM;
cards;
111 2201
111 2202
111 2203
111 2204
222 2203
222 2204
222 2206
222 2207
222 2208
222 2209
222 2210
333 2206
333 2207
444 2205
444 2207
444 2208
444 2209
444 2210
444 2212
;
Run;
data temp;
set have;
date=mdy(mod(YYMM,100),1,int(YYMM/100));
format date yymmn4.;
drop YYMM;
run;
proc summary data=temp ;
by CustID;
var date;
output out=temp2(drop=_:) min=min max=max;
run;
data temp3;
set temp2;
do date=min to max;
if month ne month(date) then output;
month=month(date);
end;
format date yymmn4.;
drop min max month;
run;
proc sql;
create table want as
select distinct CustID,put(date,yymmn4. -l) as yymm  from (select * from temp3 except select * from temp)
union
select distinct CustID,'No' from temp where CustID not in (select CustID from (select * from temp3 except select * from temp))
;
quit;
data final_want;
do until(last.CustID);
set want;
by CustID;
length missing_Months_Vector \$ 200;
missing_Months_Vector=catx(',',missing_Months_Vector,yymm);
end;
drop yymm;
run;``````
Discussion stats
• 7 replies
• 246 views
• 0 likes
• 4 in conversation