BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
TheNovice
Quartz | Level 8

I have been given a dataset where the data columns are all blank except for the first record like below:

The dataset is huge so i need to do this efficiently.  I need the output where the blank dates will populate with the first date until the next date starts and so on. The list is already sorted by the dates.

 

I tried the following:

 

data want;
 set test;
 format newdate date9.;
      retain lag_date;
      lag_date = cal_date;
 
     if flag =1 then newdate = lag_date;
     drop lag_date;
run;

 

appreciate any corrections

Cal DateResource Name
01/02/2019B
.B
.B
01/03/2019B

 

to

 

Cal DateResource Name
01/02/2019B
01/02/2019B
01/02/2019B
01/03/2019B
1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

@TheNovice wrote:

I have been given a dataset where the data columns are all blank except for the first record like below:

The dataset is huge so i need to do this efficiently.  I need the output where the blank dates will populate with the first date until the next date starts and so on. The list is already sorted by the dates.

 

I tried the following:

 

data want;
 set test;
 format newdate date9.;
      retain lag_date;
      lag_date = cal_date;
 
     if flag =1 then newdate = lag_date;
     drop lag_date;
run;

 

appreciate any corrections

Cal Date Resource Name
01/02/2019 B
. B
. B
01/03/2019 B

 

to

 

Cal Date Resource Name
01/02/2019 B
01/02/2019 B
01/02/2019 B
01/03/2019 B

Retain plus a pair of coalesce functions solves this problem:

 

data have;
  input Cal_Date :mmddyy10.	Resource_Name :$1. ;
  format cal_date date9. ;
datalines;
01/02/2019	B
.	B
.	B
01/03/2019	B
run;

data want (drop=_:);
  set have;
  retain _cd;
  _cd=coalesce(cal_date,_cd);
  cal_date=coalesce(cal_date,_cd);
run;

The retain _cd is the way to carry a value to future observations.

 

The coalesce function says to take the left-most non-missing value.  So the "trick" is to put the cal_date value into the retained _CD variable if cal_date is not missing.   Then cal_date keeps its current value only if it is non-missing, otherwise the (carried forward) non-missing _CD.

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

2 REPLIES 2
mkeintz
PROC Star

@TheNovice wrote:

I have been given a dataset where the data columns are all blank except for the first record like below:

The dataset is huge so i need to do this efficiently.  I need the output where the blank dates will populate with the first date until the next date starts and so on. The list is already sorted by the dates.

 

I tried the following:

 

data want;
 set test;
 format newdate date9.;
      retain lag_date;
      lag_date = cal_date;
 
     if flag =1 then newdate = lag_date;
     drop lag_date;
run;

 

appreciate any corrections

Cal Date Resource Name
01/02/2019 B
. B
. B
01/03/2019 B

 

to

 

Cal Date Resource Name
01/02/2019 B
01/02/2019 B
01/02/2019 B
01/03/2019 B

Retain plus a pair of coalesce functions solves this problem:

 

data have;
  input Cal_Date :mmddyy10.	Resource_Name :$1. ;
  format cal_date date9. ;
datalines;
01/02/2019	B
.	B
.	B
01/03/2019	B
run;

data want (drop=_:);
  set have;
  retain _cd;
  _cd=coalesce(cal_date,_cd);
  cal_date=coalesce(cal_date,_cd);
run;

The retain _cd is the way to carry a value to future observations.

 

The coalesce function says to take the left-most non-missing value.  So the "trick" is to put the cal_date value into the retained _CD variable if cal_date is not missing.   Then cal_date keeps its current value only if it is non-missing, otherwise the (carried forward) non-missing _CD.

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
TheNovice
Quartz | Level 8
Thank you so much for the explanation. this solution worked perfectly as well 🙂

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 795 views
  • 0 likes
  • 2 in conversation