I have been given a dataset where the data columns are all blank except for the first record like below:
The dataset is huge so i need to do this efficiently. I need the output where the blank dates will populate with the first date until the next date starts and so on. The list is already sorted by the dates.
I tried the following:
data want;
set test;
format newdate date9.;
retain lag_date;
lag_date = cal_date;
if flag =1 then newdate = lag_date;
drop lag_date;
run;
appreciate any corrections
Cal Date | Resource Name |
01/02/2019 | B |
. | B |
. | B |
01/03/2019 | B |
to
Cal Date | Resource Name |
01/02/2019 | B |
01/02/2019 | B |
01/02/2019 | B |
01/03/2019 | B |
@TheNovice wrote:
I have been given a dataset where the data columns are all blank except for the first record like below:
The dataset is huge so i need to do this efficiently. I need the output where the blank dates will populate with the first date until the next date starts and so on. The list is already sorted by the dates.
I tried the following:
data want;
set test;
format newdate date9.;
retain lag_date;
lag_date = cal_date;
if flag =1 then newdate = lag_date;
drop lag_date;
run;
appreciate any corrections
Cal Date Resource Name 01/02/2019 B . B . B 01/03/2019 B
to
Cal Date Resource Name 01/02/2019 B 01/02/2019 B 01/02/2019 B 01/03/2019 B
Retain plus a pair of coalesce functions solves this problem:
data have;
input Cal_Date :mmddyy10. Resource_Name :$1. ;
format cal_date date9. ;
datalines;
01/02/2019 B
. B
. B
01/03/2019 B
run;
data want (drop=_:);
set have;
retain _cd;
_cd=coalesce(cal_date,_cd);
cal_date=coalesce(cal_date,_cd);
run;
The retain _cd is the way to carry a value to future observations.
The coalesce function says to take the left-most non-missing value. So the "trick" is to put the cal_date value into the retained _CD variable if cal_date is not missing. Then cal_date keeps its current value only if it is non-missing, otherwise the (carried forward) non-missing _CD.
@TheNovice wrote:
I have been given a dataset where the data columns are all blank except for the first record like below:
The dataset is huge so i need to do this efficiently. I need the output where the blank dates will populate with the first date until the next date starts and so on. The list is already sorted by the dates.
I tried the following:
data want;
set test;
format newdate date9.;
retain lag_date;
lag_date = cal_date;
if flag =1 then newdate = lag_date;
drop lag_date;
run;
appreciate any corrections
Cal Date Resource Name 01/02/2019 B . B . B 01/03/2019 B
to
Cal Date Resource Name 01/02/2019 B 01/02/2019 B 01/02/2019 B 01/03/2019 B
Retain plus a pair of coalesce functions solves this problem:
data have;
input Cal_Date :mmddyy10. Resource_Name :$1. ;
format cal_date date9. ;
datalines;
01/02/2019 B
. B
. B
01/03/2019 B
run;
data want (drop=_:);
set have;
retain _cd;
_cd=coalesce(cal_date,_cd);
cal_date=coalesce(cal_date,_cd);
run;
The retain _cd is the way to carry a value to future observations.
The coalesce function says to take the left-most non-missing value. So the "trick" is to put the cal_date value into the retained _CD variable if cal_date is not missing. Then cal_date keeps its current value only if it is non-missing, otherwise the (carried forward) non-missing _CD.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.