BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
heretolearnSAS
Calcite | Level 5

I would like to ask how can I code to fill in the missing date between the beginning period and the most recent date available for each customer as illustrated below:

 

assuming below is my data:

DateCustomer IDSalesCredit
Jan-1712345xxxx
Feb-1712345xxxx
Mar-1712345xxxx
Jun-1712345xxxx
Jul-1712345xxxx
Aug-1712345xxxx
Sep-1712345xxxx
Oct-1712345xxxx
Nov-1712345xxxx
Dec-1712345xxxx
Jan-177891xxxx
Feb-177891xxxx
Mar-177891xxxx
Apr-177891xxxx
Jun-177891xxxx
Jul-177891xxxx
Sep-177891xxxx
Oct-177891xxxx

 

I want to generate something like this instead:

DateCustomer IDSalesCredit
Jan-1712345xxxx
Feb-1712345xxxx
Mar-1712345xxxx
Apr-1712345..
May-1712345..
Jun-1712345xxxx
Jul-1712345xxxx
Aug-1712345xxxx
Sep-1712345xxxx
Oct-1712345xxxx
Nov-1712345xxxx
Dec-1712345xxxx
Jan-177891xxxx
Feb-177891xxxx
Mar-177891xxxx
Apr-177891xxxx
May-177891..
Jun-177891xxxx
Jul-177891xxxx
Aug-177891..
Sep-177891xxxx
Oct-177891xxxx

 

As you can see from the data, the beginning date for customer 12345 is starting from Jan 2017 and the most recent date is Dec 2017, however, there are missing dates in between the beginning date and the latest date (April and May 2017). While for customer 7891, the beginning date is starting from Jan 2017 and the most recent date is Oct 2017, but there are also missing dates in between the beginning date and the latest date (May and Aug 2017). Hence, in this case, I just want to fill in the missing dates between their beginning date and their latest date available. In other ways, meaning that I want customer 12345 to have date from Jan 2017 to Dec 2017 while customer 7891 to have date from Jan 2017 to Oct 2017 only, with the missing date's and its data fields imputed with missing value ".".  I tried to use proc expand initially (tested on few customers and it succeed), however, proc expand does not allow for big data (around 40millions), and it will show workspace error/failed. Hence, is there any alternative way to fill in the missing date in between the beginning date and most recent date available for a customer (SAS)? 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data have;
infile cards expandtabs truncover;
input Date : monyy. CustomerID	Sales $ Credit $;
format Date  monyy.;
cards;
Jan-17	12345	xx	xx
Feb-17	12345	xx	xx
Mar-17	12345	xx	xx
Jun-17	12345	xx	xx
Jul-17	12345	xx	xx
Aug-17	12345	xx	xx
Sep-17	12345	xx	xx
Oct-17	12345	xx	xx
Nov-17	12345	xx	xx
Dec-17	12345	xx	xx
Jan-17	7891	xx	xx
Feb-17	7891	xx	xx
Mar-17	7891	xx	xx
Apr-17	7891	xx	xx
Jun-17	7891	xx	xx
Jul-17	7891	xx	xx
Sep-17	7891	xx	xx
Oct-17	7891	xx	xx
;

data want;
 merge have have(firstobs=2 keep=Date CustomerID rename=(Date=_Date CustomerID=_CustomerID));
 output;
 if CustomerID=_CustomerID then do;
  do i=1 to intck('month',Date,_Date)-1;
   call missing(Sales, Credit);
   Date=intnx('month',Date,1);
   output;
  end;
 end;
 drop i _:;
 run;

View solution in original post

4 REPLIES 4
Ksharp
Super User
data have;
infile cards expandtabs truncover;
input Date : monyy. CustomerID	Sales $ Credit $;
format Date  monyy.;
cards;
Jan-17	12345	xx	xx
Feb-17	12345	xx	xx
Mar-17	12345	xx	xx
Jun-17	12345	xx	xx
Jul-17	12345	xx	xx
Aug-17	12345	xx	xx
Sep-17	12345	xx	xx
Oct-17	12345	xx	xx
Nov-17	12345	xx	xx
Dec-17	12345	xx	xx
Jan-17	7891	xx	xx
Feb-17	7891	xx	xx
Mar-17	7891	xx	xx
Apr-17	7891	xx	xx
Jun-17	7891	xx	xx
Jul-17	7891	xx	xx
Sep-17	7891	xx	xx
Oct-17	7891	xx	xx
;

data want;
 merge have have(firstobs=2 keep=Date CustomerID rename=(Date=_Date CustomerID=_CustomerID));
 output;
 if CustomerID=_CustomerID then do;
  do i=1 to intck('month',Date,_Date)-1;
   call missing(Sales, Credit);
   Date=intnx('month',Date,1);
   output;
  end;
 end;
 drop i _:;
 run;
heretolearnSAS
Calcite | Level 5

Hi Ksharp, thanks for your prompt response. If let's say I have 50 data fields (other than Sales and Credit listed in the examples only), how could I further amend the code to call missing of those 50 data fields? is there any efficient way than listing the column name one by one down?

 

let's assume there is column 1 to column 50 (data fields for each customer at each observation point)

 

 

Ksharp
Super User
If these variables are set next to each other , you can use mkeintz suggested:
call missing(of a -- x);

a is the first variable,
x is the last variable,
a -- x represent all the variables between a and x.

Or if your variable name like col1 col2 ..... col100, you could try this:
call missing(of col1-col100);
mkeintz
PROC Star

You want to fill in interior "holes" in a time series.  You can do this with a "lookahead merge":

 

data want (drop=_: nxt_:);
  merge have (keep=id date)
        have (firstobs=2 keep=date rename=(date=nxt_date)) ;

  retain _sentinel1 .;
  set have;
  by id;
  retain _sentinel2 .;
  output;

  if last.id=0 and intck('month',date,nxt_date>1);

  call missing(of _sentinel1 -- _sentinel2);

  do while (intck('month',date,nxt_date)>1);
    date=intnx('month',date,1);
    output;
  end;
run;

The reason for the _sentinel variables is to provide left and right endpoints for all the variables to be reset to missing when filling in the holes.  Instead of listing those variables, you can just use the variable listing syntax  (of sentinel1 -- _sentinel2).  This would be all the variables brought in by the SET statement, EXCEPT variables named in the preceding MERGE statement.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1160 views
  • 0 likes
  • 3 in conversation