BookmarkSubscribeRSS Feed
Ronein
Meteorite | Level 14

Hello

For each customer ID there are multiple rows (Each row represent information in different time).

YYMM  column is year+month in structure YYMM (numeric value).

I want to find for each customer which months (YYMM) are missing between the first month and last month.

For example:

For customer 111

first month is 2201

Last month is 2204

And all months between 2201 and 2204 appear so the new column will get value "No"

 

Customer 222

First month 2203

last month 2210

Month 2205 is missing so new column should get value 2205

 

Customer 444
First month 2205
last month 2212
Months 2206 and 2211  are  missing so new column should get value 2206-2211

What is the way to create the wanted data set?

In the wanted data set each customer will have one row

 

 

data have;
input CustID YYMM;
cards;
111 2201
111 2202
111 2203
111 2204
222 2203
222 2204
222 2206
222 2207
222 2208
222 2209
222 2210
333 2206
333 2207
444 2205
444 2207
444 2208
444 2209
444 2210
444 2212
;
Run;


data want;
infile datalines dlm="," dsd;
input CustID missing_Months_Vector $;
cards;
111,No 
222,2205
333,No
444,2206-2211
;
Run;

 

 

7 REPLIES 7
Kurt_Bremser
Super User

Don't create any form of "wide" dataset; Long Beats Wide (Maxim 19).

And NEVER (NEVER!!!) store date values in stupid numbers like these; use SAS date values!

With SAS date values, this should do it:

data want;
merge
  have
  have (
    firstobs=2
    rename=(custid=_cust yymm=_yymm)
  )
;
if _cust = custid
then do;
  yymm = intnx('month`,yymm,1,'b');
  do while (yymm lt _yymm);
    output;
    yymm = intnx('month',yymm,1,'b');
  end;
end;
keep custid yymm;
run;
  
Ronein
Meteorite | Level 14

I  added another field with sas date instead of numeric YYMM.

I run your code but I cannot see any result (get empty data set).

The reason that wanted data set is wide is that I want to print the wanted data set and see for each customer one row only

 

data have;
input CustID YYMM   yymm2 : date9.;
format yymm2    ddmmyy10.;
cards;
111 2201 '01JAN2022'd
111 2202 '01FEB2022'd
111 2203 '01MAR2022'd
111 2204 '01APR2022'd
222 2203 '01MAR2022'd
222 2204 '01APR2022'd
222 2206 '01JUN2022'd
222 2207 '01JUL2022'd
222 2208 '01AUG2022'd
222 2209 '01SEP2022'd
222 2210 '01OCT2022'd
;
Run;

data want;
merge have have (firstobs=2rename=(custid=_cust yymm=_yymm));
if _cust = custid then do;
yymm = intnx('month',yymm,1,'b');
do while (yymm<=_yymm);
    output;
    yymm = intnx('month',yymm,1,'b');
  end;
end;
keep custid yymm;
run;
  
Kurt_Bremser
Super User

When you create an extra date variable, you should then also use it in the code in place of the original yymm.

But you do not need two separate variables. There are SAS formats to display dates as year/month only.

And, as you said, you want to later print the data in wide format, do this with a reporting procedure.

Ronein
Meteorite | Level 14

Sorry for that but may you please show the code from start till end because I run the code and it is not working for me

Kurt_Bremser
Super User

See this:

data have;
input CustID YYMM :yymmn6.;
format yymm yymmn6.;
cards;
111 2201
111 2202
111 2203
111 2204
222 2203
222 2204
222 2206
222 2207
222 2208
222 2209
222 2210
333 2206
333 2207
444 2205
444 2207
444 2208
444 2209
444 2210
444 2212
;

data want;
merge
  have
  have (
    firstobs=2
    rename=(custid=_cust yymm=_yymm)
  )
;
retain n 1;
if _cust = custid
then do;
  yymm = intnx('month',yymm,1,'b');
  do while (yymm lt _yymm);
    output;
    yymm = intnx('month',yymm,1,'b');
    n + 1;
  end;
end;
else n = 1;
keep custid yymm n;
run;

proc report data=want;
column custid yymm,n;
define custid / group;
define yymm / "" analysis;
define n / "" across ;
run;

data _null_;
set want;
by custid;
if first.custid
then put custid yymm @@;
else put +(-1)"," yymm @@;
if last.custid then put;
run;

One step creates a report, the other similar output which you can write to a file or an ODS destination.

PaigeMiller
Diamond | Level 26

@Ronein wrote:

Hello

For each customer ID there are multiple rows (Each row represent information in different time).

YYMM  column is year+month in structure YYMM (numeric value).

I want to find for each customer which months (YYMM) are missing between the first month and last month.

For example:

For customer 111

first month is 2201

Last month is 2204

And all months between 2201 and 2204 appear so the new column will get value "No"

 

Customer 222

First month 2203

last month 2210

Month 2205 is missing so new column should get value 2205

 

Customer 444
First month 2205
last month 2212
Months 2206 and 2211  are  missing so new column should get value 2206-2211

What is the way to create the wanted data set?

In the wanted data set each customer will have one row

It seems like you want a new column to say "No" when there are no missing months; it should say "2205" (a character string) for 222 and for 444 you want the character string "2206-2211". What is the benefit of creating missing months as character strings? What would you do with these character strings once you have them? Please explain how these character strings will be used in future analysis or reporting.

 

I say all of this because it seems to me that there are better ways of getting from here to there (better in the sense of easier programming) that we can help you with, if only we knew where you are trying to go. This seems like a textbook example of the XY Problem.

--
Paige Miller
Ksharp
Super User
data have;
input CustID YYMM;
cards;
111 2201
111 2202
111 2203
111 2204
222 2203
222 2204
222 2206
222 2207
222 2208
222 2209
222 2210
333 2206
333 2207
444 2205
444 2207
444 2208
444 2209
444 2210
444 2212
;
Run;
data temp;
 set have;
date=mdy(mod(YYMM,100),1,int(YYMM/100));
format date yymmn4.;
drop YYMM;
run;
proc summary data=temp ;
by CustID;
var date;
output out=temp2(drop=_:) min=min max=max;
run;
data temp3;
 set temp2;
do date=min to max;
 if month ne month(date) then output;
 month=month(date);
end;
format date yymmn4.;
drop min max month;
run;
proc sql;
create table want as
select distinct CustID,put(date,yymmn4. -l) as yymm  from (select * from temp3 except select * from temp)
union
select distinct CustID,'No' from temp where CustID not in (select CustID from (select * from temp3 except select * from temp))
;
quit;
data final_want;
 do until(last.CustID);
  set want;
  by CustID;
 length missing_Months_Vector $ 200;
 missing_Months_Vector=catx(',',missing_Months_Vector,yymm);
 end;
 drop yymm;
run;

sas-innovate-white.png

🚨 Early Bird Rate Extended!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.

 

Lock in the best rate now before the price increases on April 1.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1197 views
  • 0 likes
  • 4 in conversation