I have been given a dataset where the data columns are all blank except for the first record like below:
The dataset is huge so i need to do this efficiently. I need the output where the blank dates will populate with the first date until the next date starts and so on. The list is already sorted by the dates.
I tried the following:
data want;
set test;
format newdate date9.;
retain lag_date;
lag_date = cal_date;
if flag =1 then newdate = lag_date;
drop lag_date;
run;
appreciate any corrections
| Cal Date | Resource Name |
| 01/02/2019 | B |
| . | B |
| . | B |
| 01/03/2019 | B |
to
| Cal Date | Resource Name |
| 01/02/2019 | B |
| 01/02/2019 | B |
| 01/02/2019 | B |
| 01/03/2019 | B |
@TheNovice wrote:
I have been given a dataset where the data columns are all blank except for the first record like below:
The dataset is huge so i need to do this efficiently. I need the output where the blank dates will populate with the first date until the next date starts and so on. The list is already sorted by the dates.
I tried the following:
data want;
set test;
format newdate date9.;
retain lag_date;
lag_date = cal_date;
if flag =1 then newdate = lag_date;
drop lag_date;
run;
appreciate any corrections
Cal Date Resource Name 01/02/2019 B . B . B 01/03/2019 B
to
Cal Date Resource Name 01/02/2019 B 01/02/2019 B 01/02/2019 B 01/03/2019 B
Retain plus a pair of coalesce functions solves this problem:
data have;
input Cal_Date :mmddyy10. Resource_Name :$1. ;
format cal_date date9. ;
datalines;
01/02/2019 B
. B
. B
01/03/2019 B
run;
data want (drop=_:);
set have;
retain _cd;
_cd=coalesce(cal_date,_cd);
cal_date=coalesce(cal_date,_cd);
run;
The retain _cd is the way to carry a value to future observations.
The coalesce function says to take the left-most non-missing value. So the "trick" is to put the cal_date value into the retained _CD variable if cal_date is not missing. Then cal_date keeps its current value only if it is non-missing, otherwise the (carried forward) non-missing _CD.
@TheNovice wrote:
I have been given a dataset where the data columns are all blank except for the first record like below:
The dataset is huge so i need to do this efficiently. I need the output where the blank dates will populate with the first date until the next date starts and so on. The list is already sorted by the dates.
I tried the following:
data want;
set test;
format newdate date9.;
retain lag_date;
lag_date = cal_date;
if flag =1 then newdate = lag_date;
drop lag_date;
run;
appreciate any corrections
Cal Date Resource Name 01/02/2019 B . B . B 01/03/2019 B
to
Cal Date Resource Name 01/02/2019 B 01/02/2019 B 01/02/2019 B 01/03/2019 B
Retain plus a pair of coalesce functions solves this problem:
data have;
input Cal_Date :mmddyy10. Resource_Name :$1. ;
format cal_date date9. ;
datalines;
01/02/2019 B
. B
. B
01/03/2019 B
run;
data want (drop=_:);
set have;
retain _cd;
_cd=coalesce(cal_date,_cd);
cal_date=coalesce(cal_date,_cd);
run;
The retain _cd is the way to carry a value to future observations.
The coalesce function says to take the left-most non-missing value. So the "trick" is to put the cal_date value into the retained _CD variable if cal_date is not missing. Then cal_date keeps its current value only if it is non-missing, otherwise the (carried forward) non-missing _CD.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and save with the early bird rate—just $795!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.