BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
TheNovice
Quartz | Level 8

I have been given a dataset where the data columns are all blank except for the first record like below:

The dataset is huge so i need to do this efficiently.  I need the output where the blank dates will populate with the first date until the next date starts and so on. The list is already sorted by the dates.

 

I tried the following:

 

data want;
 set test;
 format newdate date9.;
      retain lag_date;
      lag_date = cal_date;
 
     if flag =1 then newdate = lag_date;
     drop lag_date;
run;

 

appreciate any corrections

Cal DateResource Name
01/02/2019B
.B
.B
01/03/2019B

 

to

 

Cal DateResource Name
01/02/2019B
01/02/2019B
01/02/2019B
01/03/2019B
1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

@TheNovice wrote:

I have been given a dataset where the data columns are all blank except for the first record like below:

The dataset is huge so i need to do this efficiently.  I need the output where the blank dates will populate with the first date until the next date starts and so on. The list is already sorted by the dates.

 

I tried the following:

 

data want;
 set test;
 format newdate date9.;
      retain lag_date;
      lag_date = cal_date;
 
     if flag =1 then newdate = lag_date;
     drop lag_date;
run;

 

appreciate any corrections

Cal Date Resource Name
01/02/2019 B
. B
. B
01/03/2019 B

 

to

 

Cal Date Resource Name
01/02/2019 B
01/02/2019 B
01/02/2019 B
01/03/2019 B

Retain plus a pair of coalesce functions solves this problem:

 

data have;
  input Cal_Date :mmddyy10.	Resource_Name :$1. ;
  format cal_date date9. ;
datalines;
01/02/2019	B
.	B
.	B
01/03/2019	B
run;

data want (drop=_:);
  set have;
  retain _cd;
  _cd=coalesce(cal_date,_cd);
  cal_date=coalesce(cal_date,_cd);
run;

The retain _cd is the way to carry a value to future observations.

 

The coalesce function says to take the left-most non-missing value.  So the "trick" is to put the cal_date value into the retained _CD variable if cal_date is not missing.   Then cal_date keeps its current value only if it is non-missing, otherwise the (carried forward) non-missing _CD.

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

2 REPLIES 2
mkeintz
PROC Star

@TheNovice wrote:

I have been given a dataset where the data columns are all blank except for the first record like below:

The dataset is huge so i need to do this efficiently.  I need the output where the blank dates will populate with the first date until the next date starts and so on. The list is already sorted by the dates.

 

I tried the following:

 

data want;
 set test;
 format newdate date9.;
      retain lag_date;
      lag_date = cal_date;
 
     if flag =1 then newdate = lag_date;
     drop lag_date;
run;

 

appreciate any corrections

Cal Date Resource Name
01/02/2019 B
. B
. B
01/03/2019 B

 

to

 

Cal Date Resource Name
01/02/2019 B
01/02/2019 B
01/02/2019 B
01/03/2019 B

Retain plus a pair of coalesce functions solves this problem:

 

data have;
  input Cal_Date :mmddyy10.	Resource_Name :$1. ;
  format cal_date date9. ;
datalines;
01/02/2019	B
.	B
.	B
01/03/2019	B
run;

data want (drop=_:);
  set have;
  retain _cd;
  _cd=coalesce(cal_date,_cd);
  cal_date=coalesce(cal_date,_cd);
run;

The retain _cd is the way to carry a value to future observations.

 

The coalesce function says to take the left-most non-missing value.  So the "trick" is to put the cal_date value into the retained _CD variable if cal_date is not missing.   Then cal_date keeps its current value only if it is non-missing, otherwise the (carried forward) non-missing _CD.

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
TheNovice
Quartz | Level 8
Thank you so much for the explanation. this solution worked perfectly as well 🙂

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 691 views
  • 0 likes
  • 2 in conversation