Greetings,
I have some missing data that I need to fill with the latest available row, IF the ID and the dates are the same. My example is as follows:
This is what I have:
ID | Date | Amount |
123456 | 06/30/2017 | 102.55 |
123456 | 07/31/2017 | 102.55 |
123456 | 07/31/2017 | . |
123456 | 08/31/2017 | . |
123456 | 09/30/2017 | . |
123456 | 10/31/2017 | . |
95421 | 01/31/2018 | 44 |
95421 | 02/28/2018 | 44 |
95421 | 03/31/2018 | 44 |
95421 | 03/31/2018 | . |
95421 | 04/30/2018 | . |
95421 | 05/31/2018 | . |
This is what I need:
ID | Date | Amount |
123456 | 06/30/2017 | 102.55 |
123456 | 07/31/2017 | 102.55 |
123456 | 07/31/2017 | 102.55 |
123456 | 08/31/2017 | 102.55 |
123456 | 09/30/2017 | 102.55 |
123456 | 10/31/2017 | 102.55 |
95421 | 01/31/2018 | 44 |
95421 | 02/28/2018 | 44 |
95421 | 03/31/2018 | 44 |
95421 | 03/31/2018 | 44 |
95421 | 04/30/2018 | 44 |
95421 | 05/31/2018 | 44 |
Thanks
data have;
input ID Date :mmddyy10. Amount;
cards;
123456 06/30/2017 102.55
123456 07/31/2017 102.55
123456 07/31/2017 .
123456 08/31/2017 .
123456 09/30/2017 .
123456 10/31/2017 .
95421 01/31/2018 44
95421 02/28/2018 44
95421 03/31/2018 44
95421 03/31/2018 .
95421 04/30/2018 .
95421 05/31/2018 .
;
data want;
set have;
by id notsorted;
retain _a;
if first.id then call missing(_a);
if not missing(amount) then _a=amount;
else amount=_a;
drop _a;
run;
data have;
input ID Date :mmddyy10. Amount;
cards;
123456 06/30/2017 102.55
123456 07/31/2017 102.55
123456 07/31/2017 .
123456 08/31/2017 .
123456 09/30/2017 .
123456 10/31/2017 .
95421 01/31/2018 44
95421 02/28/2018 44
95421 03/31/2018 44
95421 03/31/2018 .
95421 04/30/2018 .
95421 05/31/2018 .
;
data want;
set have;
by id notsorted;
retain _a;
if first.id then call missing(_a);
if not missing(amount) then _a=amount;
else amount=_a;
drop _a;
run;
or
proc sort data=have out=_have;
by id;
run;
data want;
update _have(obs=0) _have;
by id ;
output;
run;
Thanks! My data has several columns, but I am just posting the relevant ones.
Ok, just take the 1st code. I don't see any problem with that because we are only playing with amount column. I believe that's what your question is right? to fill in missing values for amount column right?
That makes a big difference in choosing the right solution.
The DATA step RETAIN solution requires you to type in the specific variable names that you want to carry over.
The UPDATE doesn't even require that you know the variable names. However, it carries over ALL variables (even the ones that you didn't mention because they are irrelevant here).
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.