Solved: How to fill in a missing date for different customer

heretolearnSAS · Posted 06-01-2022 12:16 AM

I would like to ask how can I code to fill in the missing date between the beginning period and the most recent date available for each customer as illustrated below:

assuming below is my data:

Date	Customer ID	Sales	Credit
Jan-17	12345	xx	xx
Feb-17	12345	xx	xx
Mar-17	12345	xx	xx
Jun-17	12345	xx	xx
Jul-17	12345	xx	xx
Aug-17	12345	xx	xx
Sep-17	12345	xx	xx
Oct-17	12345	xx	xx
Nov-17	12345	xx	xx
Dec-17	12345	xx	xx
Jan-17	7891	xx	xx
Feb-17	7891	xx	xx
Mar-17	7891	xx	xx
Apr-17	7891	xx	xx
Jun-17	7891	xx	xx
Jul-17	7891	xx	xx
Sep-17	7891	xx	xx
Oct-17	7891	xx	xx

I want to generate something like this instead:

Date	Customer ID	Sales	Credit
Jan-17	12345	xx	xx
Feb-17	12345	xx	xx
Mar-17	12345	xx	xx
Apr-17	12345	.	.
May-17	12345	.	.
Jun-17	12345	xx	xx
Jul-17	12345	xx	xx
Aug-17	12345	xx	xx
Sep-17	12345	xx	xx
Oct-17	12345	xx	xx
Nov-17	12345	xx	xx
Dec-17	12345	xx	xx
Jan-17	7891	xx	xx
Feb-17	7891	xx	xx
Mar-17	7891	xx	xx
Apr-17	7891	xx	xx
May-17	7891	.	.
Jun-17	7891	xx	xx
Jul-17	7891	xx	xx
Aug-17	7891	.	.
Sep-17	7891	xx	xx
Oct-17	7891	xx	xx

As you can see from the data, the beginning date for customer 12345 is starting from Jan 2017 and the most recent date is Dec 2017, however, there are missing dates in between the beginning date and the latest date (April and May 2017). While for customer 7891, the beginning date is starting from Jan 2017 and the most recent date is Oct 2017, but there are also missing dates in between the beginning date and the latest date (May and Aug 2017). Hence, in this case, I just want to fill in the missing dates between their beginning date and their latest date available. In other ways, meaning that I want customer 12345 to have date from Jan 2017 to Dec 2017 while customer 7891 to have date from Jan 2017 to Oct 2017 only, with the missing date's and its data fields imputed with missing value ".". I tried to use proc expand initially (tested on few customers and it succeed), however, proc expand does not allow for big data (around 40millions), and it will show workspace error/failed. Hence, is there any alternative way to fill in the missing date in between the beginning date and most recent date available for a customer (SAS)?

Ksharp · Posted 06-01-2022 01:16 AM

data have;
infile cards expandtabs truncover;
input Date : monyy. CustomerID	Sales $ Credit $;
format Date  monyy.;
cards;
Jan-17	12345	xx	xx
Feb-17	12345	xx	xx
Mar-17	12345	xx	xx
Jun-17	12345	xx	xx
Jul-17	12345	xx	xx
Aug-17	12345	xx	xx
Sep-17	12345	xx	xx
Oct-17	12345	xx	xx
Nov-17	12345	xx	xx
Dec-17	12345	xx	xx
Jan-17	7891	xx	xx
Feb-17	7891	xx	xx
Mar-17	7891	xx	xx
Apr-17	7891	xx	xx
Jun-17	7891	xx	xx
Jul-17	7891	xx	xx
Sep-17	7891	xx	xx
Oct-17	7891	xx	xx
;

data want;
 merge have have(firstobs=2 keep=Date CustomerID rename=(Date=_Date CustomerID=_CustomerID));
 output;
 if CustomerID=_CustomerID then do;
  do i=1 to intck('month',Date,_Date)-1;
   call missing(Sales, Credit);
   Date=intnx('month',Date,1);
   output;
  end;
 end;
 drop i _:;
 run;

View solution in original post

Ksharp · Posted 06-01-2022 01:16 AM

data have;
infile cards expandtabs truncover;
input Date : monyy. CustomerID	Sales $ Credit $;
format Date  monyy.;
cards;
Jan-17	12345	xx	xx
Feb-17	12345	xx	xx
Mar-17	12345	xx	xx
Jun-17	12345	xx	xx
Jul-17	12345	xx	xx
Aug-17	12345	xx	xx
Sep-17	12345	xx	xx
Oct-17	12345	xx	xx
Nov-17	12345	xx	xx
Dec-17	12345	xx	xx
Jan-17	7891	xx	xx
Feb-17	7891	xx	xx
Mar-17	7891	xx	xx
Apr-17	7891	xx	xx
Jun-17	7891	xx	xx
Jul-17	7891	xx	xx
Sep-17	7891	xx	xx
Oct-17	7891	xx	xx
;

data want;
 merge have have(firstobs=2 keep=Date CustomerID rename=(Date=_Date CustomerID=_CustomerID));
 output;
 if CustomerID=_CustomerID then do;
  do i=1 to intck('month',Date,_Date)-1;
   call missing(Sales, Credit);
   Date=intnx('month',Date,1);
   output;
  end;
 end;
 drop i _:;
 run;

heretolearnSAS · Posted 06-01-2022 03:06 AM

Hi Ksharp, thanks for your prompt response. If let's say I have 50 data fields (other than Sales and Credit listed in the examples only), how could I further amend the code to call missing of those 50 data fields? is there any efficient way than listing the column name one by one down?

let's assume there is column 1 to column 50 (data fields for each customer at each observation point)

Ksharp · Posted 06-01-2022 04:02 AM

If these variables are set next to each other , you can use mkeintz suggested:
call missing(of a -- x);

a is the first variable,
x is the last variable,
a -- x represent all the variables between a and x.

Or if your variable name like col1 col2 ..... col100, you could try this:
call missing(of col1-col100);

mkeintz · Posted 06-01-2022 01:30 AM

You want to fill in interior "holes" in a time series. You can do this with a "lookahead merge":

data want (drop=_: nxt_:);
  merge have (keep=id date)
        have (firstobs=2 keep=date rename=(date=nxt_date)) ;

  retain _sentinel1 .;
  set have;
  by id;
  retain _sentinel2 .;
  output;

  if last.id=0 and intck('month',date,nxt_date>1);

  call missing(of _sentinel1 -- _sentinel2);

  do while (intck('month',date,nxt_date)>1);
    date=intnx('month',date,1);
    output;
  end;
run;

The reason for the _sentinel variables is to provide left and right endpoints for all the variables to be reset to missing when filling in the holes. Instead of listing those variables, you can just use the variable listing syntax (of sentinel1 -- _sentinel2). This would be all the variables brought in by the SET statement, EXCEPT variables named in the preceding MERGE statement.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

How to fill in a missing date for different customer

Re: How to fill in a missing date for different customer

Re: How to fill in a missing date for different customer

Re: How to fill in a missing date for different customer

Re: How to fill in a missing date for different customer

Re: How to fill in a missing date for different customer

Catch up on SAS Innovate 2026

How to fill in a missing date for different customer

Re: How to fill in a missing date for different customer

Re: How to fill in a missing date for different customer

Re: How to fill in a missing date for different customer

Re: How to fill in a missing date for different customer

Re: How to fill in a missing date for different customer

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away