Solved: Populating blank cells with previous date

TheNovice · Posted 09-04-2020 01:01 PM

I have been given a dataset where the data columns are all blank except for the first record like below:

The dataset is huge so i need to do this efficiently. I need the output where the blank dates will populate with the first date until the next date starts and so on. The list is already sorted by the dates.

I tried the following:

data want;
set test;
format newdate date9.;
      retain lag_date;
      lag_date = cal_date;

     if flag =1 then newdate = lag_date;
     drop lag_date;
run;

appreciate any corrections

Cal Date	Resource Name
01/02/2019	B
.	B
.	B
01/03/2019	B

to

Cal Date	Resource Name
01/02/2019	B
01/02/2019	B
01/02/2019	B
01/03/2019	B

mkeintz · Posted 09-04-2020 04:47 PM

@TheNovice wrote:

I have been given a dataset where the data columns are all blank except for the first record like below:

The dataset is huge so i need to do this efficiently. I need the output where the blank dates will populate with the first date until the next date starts and so on. The list is already sorted by the dates.

I tried the following:

data want;
set test;
format newdate date9.;
      retain lag_date;
      lag_date = cal_date;

     if flag =1 then newdate = lag_date;
     drop lag_date;
run;

appreciate any corrections

Cal Date Resource Name

01/02/2019 B

. B

. B

01/03/2019 B

to

Cal Date Resource Name

01/02/2019 B

01/02/2019 B

01/02/2019 B

01/03/2019 B

Retain plus a pair of coalesce functions solves this problem:

data have;
  input Cal_Date :mmddyy10.	Resource_Name :$1. ;
  format cal_date date9. ;
datalines;
01/02/2019	B
.	B
.	B
01/03/2019	B
run;

data want (drop=_:);
  set have;
  retain _cd;
  _cd=coalesce(cal_date,_cd);
  cal_date=coalesce(cal_date,_cd);
run;

The retain _cd is the way to carry a value to future observations.

The coalesce function says to take the left-most non-missing value. So the "trick" is to put the cal_date value into the retained _CD variable if cal_date is not missing. Then cal_date keeps its current value only if it is non-missing, otherwise the (carried forward) non-missing _CD.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

mkeintz · Posted 09-04-2020 04:47 PM

@TheNovice wrote:

I have been given a dataset where the data columns are all blank except for the first record like below:

The dataset is huge so i need to do this efficiently. I need the output where the blank dates will populate with the first date until the next date starts and so on. The list is already sorted by the dates.

I tried the following:

data want;
set test;
format newdate date9.;
      retain lag_date;
      lag_date = cal_date;

     if flag =1 then newdate = lag_date;
     drop lag_date;
run;

appreciate any corrections

Cal Date Resource Name

01/02/2019 B

. B

. B

01/03/2019 B

to

Cal Date Resource Name

01/02/2019 B

01/02/2019 B

01/02/2019 B

01/03/2019 B

Retain plus a pair of coalesce functions solves this problem:

data have;
  input Cal_Date :mmddyy10.	Resource_Name :$1. ;
  format cal_date date9. ;
datalines;
01/02/2019	B
.	B
.	B
01/03/2019	B
run;

data want (drop=_:);
  set have;
  retain _cd;
  _cd=coalesce(cal_date,_cd);
  cal_date=coalesce(cal_date,_cd);
run;

The retain _cd is the way to carry a value to future observations.

The coalesce function says to take the left-most non-missing value. So the "trick" is to put the cal_date value into the retained _CD variable if cal_date is not missing. Then cal_date keeps its current value only if it is non-missing, otherwise the (carried forward) non-missing _CD.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

TheNovice · Posted 09-08-2020 07:44 AM

Thank you so much for the explanation. this solution worked perfectly as well 🙂

Populating blank cells with previous date

Re: Populating blank cells with previous date

Re: Populating blank cells with previous date

Re: Populating blank cells with previous date

Populating blank cells with previous date

Re: Populating blank cells with previous date

Re: Populating blank cells with previous date

Re: Populating blank cells with previous date

The 2025 SAS Hackathon has begun!

SAS Training: Just a Click Away