Hello,
I have a data set with multiple rows per customer ID, and each row relates to a specific date.
I would like to perform two tasks:
Task 1:
Add rows for the dates that are missing.
Task 2:
For each customer, fill the variables in the added rows (everything except the date) with the values from the previous non-missing row.
Here is my sample data and my attempt so far:
data tbl1;
format date ddmmyy10.;
input ID date :date9. Y;
cards;
1 01DEC2023 10
1 04DEC2023 20
1 05DEC2023 30
2 02DEC2023 40
2 04DEC2023 50
2 05DEC2023 60
;
Run;
proc sql noprint;
select min(date) as FROMDate into :FROMDate
from tbl1
;
quit;
proc sql noprint;
select max(date) as TillDate into :TillDate
from tbl1
;
quit;
data dates(Keep=date) ;
date=&FROMDate.;
end_date=&TillDate.;
format date end_date date9.;
do while (date<=end_date);
output;
date=intnx('day', date, 1, 's');
end;
format date ddmmyy10.;
run;
First, I don't see a reason to use macro variables or SQL here. Neither is a good fit when you have to work row by row to get the desired answer. The solution is to write a DATA step that checks for missing dates and, where it finds a gap, loops through the missing dates to create the rows.
Here is my solution:
data tbl1;
format date ddmmyy10.;
input ID date :date9. Y;
cards;
1 01DEC2023 10
1 04DEC2023 20
1 05DEC2023 30
2 02DEC2023 40
2 04DEC2023 50
2 05DEC2023 60
;
data want;
set tbl1;
previd=lag(id);
prevdate=lag(date);
prevy=lag(y);
/* Check for missing dates */
if prevdate^=(date-1) and id=previd then do;
/* Loop over missing dates */
do iter=1 to (date-prevdate);
newdate=prevdate+iter;
if newdate<date then newy=prevy;
else newy=y;
output;
end;
end;
else do; /* No missing dates found */
newdate=date;
newy=y;
output;
end;
format newdate date9.;
drop prev: iter;
run;
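The step above keeps the original DATE and Y next to NEWDATE and NEWY. If only the filled-in series is wanted under the original variable names, one possible follow-up is sketched below. It assumes the WANT data set created above, and the name WANT_FINAL is made up for illustration.

```sas
/* Hypothetical follow-up: keep only the filled-in values,  */
/* renamed back to the original variable names.             */
data want_final;
    set want(drop=date y rename=(newdate=date newy=y));
run;

proc print data=want_final;
run;
```

DROP= is applied before RENAME=, so the original DATE and Y are removed first and the new variables can take over their names.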
If you have SAS/ETS in your license, you can use PROC EXPAND:
data tbl1;
format date ddmmyy10.;
input ID date :date9. Y;
cards;
1 01DEC2023 10
1 04DEC2023 20
1 05DEC2023 30
2 02DEC2023 40
2 04DEC2023 50
2 05DEC2023 60
;
Run;
proc expand data=tbl1
out=tbl2
to=day method=step
plots=(input output);
by ID;
id date;
run;
/* end of program */
Koen
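Whether SAS/ETS is licensed can be checked before trying PROC EXPAND. A minimal sketch: PROC SETINIT writes the licensed products and their expiration dates to the log, so SAS/ETS should appear in that list if it is available.

```sas
/* List the licensed products in the SAS log */
proc setinit;
run;
```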
Just build up a skeleton or empty dataset with just the ID and DATE values for all of the days you want. (No need to get fancy to increment dates by days. Simple arithmetic will work since dates are stored as number of days.)
data empty;
set have;
by id;
if first.id then start=date;
retain start;
if last.id;
end=date;
do date=start to end;
output;
end;
keep id date;
run;
Then you can combine them and use some simple last observation carried forward logic to replace the missing values.
data want;
merge empty have;
by id date;
if not first.id then y=coalesce(y,oldy);
oldy=y;
retain oldy;
drop oldy;
run;
Result:
OBS  ID  date        Y
1    1   2023-12-01  10
2    1   2023-12-02  10
3    1   2023-12-03  10
4    1   2023-12-04  20
5    1   2023-12-05  30
6    2   2023-12-02  40
7    2   2023-12-03  40
8    2   2023-12-04  50
9    2   2023-12-05  60
PS Don't display dates in either DMY or MDY order. It took me a couple of seconds to figure out why you would want to insert dates in the middle of the months of Feb and March.
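Two points from this answer can be checked in a short DATA step: SAS dates are plain day counts, so adding 1 moves to the next day, and the display format decides how easy a date is to misread. This is only an illustrative sketch; the variable names D and SAME are made up for the example.

```sas
data _null_;
    d = '04DEC2023'd;

    /* Adding 1 gives the same result as INTNX with the DAY interval */
    same = (d + 1 = intnx('day', d, 1));
    put same=;                /* same=1 */

    /* One date value, four display formats */
    put d= ddmmyy10.;         /* d=04/12/2023 (day first)   */
    put d= mmddyy10.;         /* d=12/04/2023 (month first) */
    put d= date9.;            /* d=04DEC2023                */
    put d= yymmdd10.;         /* d=2023-12-04               */
run;
```

The first two lines of output are ambiguous without knowing the format, while DATE9. and YYMMDD10. can only be read one way.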
If SAS/ETS is licensed, I'd use PROC EXPAND as already proposed. Otherwise, here is another code variant that creates the additional rows.
data have;
format date date9.;
input ID date :date9. Y;
cards;
1 01DEC2023 10
1 04DEC2023 20
1 05DEC2023 30
2 02DEC2023 40
2 04DEC2023 50
2 05DEC2023 60
;
data want;
set have;
by id date;
output;
if not last.id then
do;
/* read ahead: If next date not date+1 create additional rows */
i=_n_+1;
set have(keep=date rename=(date=_date)) point=i;
do i= 1 to (_date-date-1);
date+1;
output;
end;
end;
drop _date;
run;
proc print data=want;
run;
data tbl1;
format date ddmmyy10.;
input ID date :date9. Y;
cards;
1 01DEC2023 10
1 04DEC2023 20
1 05DEC2023 30
2 02DEC2023 40
2 04DEC2023 50
2 05DEC2023 60
;
Run;
data want;
merge tbl1 tbl1(firstobs=2 keep=id date rename=(id=_id date=_date));
output;
if id=_id then do;
do date=date+1 to _date-1;
output;
end;
end;
drop _id _date;
run;
It looks to me like your code (unlike your sample desired output) intends to carry Y values forward through every missing date up to the global maximum &TILLDATE. Your generation of a global &FROMDATE also implies that you may want to create observations with missing values of Y from &FROMDATE up to the day before each ID's first observed date. Otherwise there is no need for &FROMDATE.
If you do NOT need to create dates back to &FROMDATE, then:
proc sql noprint;
select max(date) into :max_date from tbl1;
quit;
data want (drop=_nxt_date);
set tbl1 (keep=id);
by id;
merge tbl1
tbl1 (firstobs=2 keep=date rename=(date=_nxt_date));
if last.id then _nxt_date=&max_date+1;
do date=date to _nxt_date-1;
output;
end;
run;
But if you do, then you need an extra conditional step for first.id:
proc sql noprint;
select min(date), max(date) into :min_date,:max_date
from tbl1;
quit;
data want (drop=_:);
set tbl1 (keep=id date rename=(date=_date));
by id;
if first.id=1 and _date>&min_date then do date= &min_date to _date-1;
output;
end;
merge tbl1
tbl1 (firstobs=2 keep=date rename=(date=_nxt_date));
if last.id=1 then _nxt_date=&max_date+1;
do date=date to _nxt_date-1;
output;
end;
call missing(of _all_);
run;