I have been given a dataset where the data columns are all blank except for the first record like below:
The dataset is huge so i need to do this efficiently. I need the output where the blank dates will populate with the first date until the next date starts and so on. The list is already sorted by the dates.
I tried the following:
data want;
 set test;
 format newdate date9.;
      retain lag_date;
      lag_date = cal_date;
 
     if flag =1 then newdate = lag_date;
     drop lag_date;
run;
appreciate any corrections
| Cal Date | Resource Name | 
| 01/02/2019 | B | 
| . | B | 
| . | B | 
| 01/03/2019 | B | 
to
| Cal Date | Resource Name | 
| 01/02/2019 | B | 
| 01/02/2019 | B | 
| 01/02/2019 | B | 
| 01/03/2019 | B | 
@TheNovice wrote:
I have been given a dataset where the data columns are all blank except for the first record like below:
The dataset is huge so i need to do this efficiently. I need the output where the blank dates will populate with the first date until the next date starts and so on. The list is already sorted by the dates.
I tried the following:
data want;
set test;
format newdate date9.;
retain lag_date;
lag_date = cal_date;
if flag =1 then newdate = lag_date;
drop lag_date;
run;
appreciate any corrections
Cal Date Resource Name 01/02/2019 B . B . B 01/03/2019 B 
to
Cal Date Resource Name 01/02/2019 B 01/02/2019 B 01/02/2019 B 01/03/2019 B 
Retain plus a pair of coalesce functions solves this problem:
data have;
  input Cal_Date :mmddyy10.	Resource_Name :$1. ;
  format cal_date date9. ;
datalines;
01/02/2019	B
.	B
.	B
01/03/2019	B
run;
data want (drop=_:);
  set have;
  retain _cd;
  _cd=coalesce(cal_date,_cd);
  cal_date=coalesce(cal_date,_cd);
run;
The retain _cd is the way to carry a value to future observations.
The coalesce function says to take the left-most non-missing value. So the "trick" is to put the cal_date value into the retained _CD variable if cal_date is not missing. Then cal_date keeps its current value only if it is non-missing, otherwise the (carried forward) non-missing _CD.
@TheNovice wrote:
I have been given a dataset where the data columns are all blank except for the first record like below:
The dataset is huge so i need to do this efficiently. I need the output where the blank dates will populate with the first date until the next date starts and so on. The list is already sorted by the dates.
I tried the following:
data want;
set test;
format newdate date9.;
retain lag_date;
lag_date = cal_date;
if flag =1 then newdate = lag_date;
drop lag_date;
run;
appreciate any corrections
Cal Date Resource Name 01/02/2019 B . B . B 01/03/2019 B 
to
Cal Date Resource Name 01/02/2019 B 01/02/2019 B 01/02/2019 B 01/03/2019 B 
Retain plus a pair of coalesce functions solves this problem:
data have;
  input Cal_Date :mmddyy10.	Resource_Name :$1. ;
  format cal_date date9. ;
datalines;
01/02/2019	B
.	B
.	B
01/03/2019	B
run;
data want (drop=_:);
  set have;
  retain _cd;
  _cd=coalesce(cal_date,_cd);
  cal_date=coalesce(cal_date,_cd);
run;
The retain _cd is the way to carry a value to future observations.
The coalesce function says to take the left-most non-missing value. So the "trick" is to put the cal_date value into the retained _CD variable if cal_date is not missing. Then cal_date keeps its current value only if it is non-missing, otherwise the (carried forward) non-missing _CD.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.