Hi Team,
I need help with the programming for a date imputation. The raw data is below:
data vy;
input id date $ 10.;
cards;
101 21aug2020
102 ukfeb2016
103 ukaug2019
104 ukunk2020
105 07aug2018
106 ukdec2020
107 ununkunkk
108 ukfeb2019
;
I need to impute the dates according to these two rules:
1. If both month and day are missing, then set to December 31.
2. If only day is missing, then set to last day of the month.
data vy1;
  set vy;
  /* Separate date into day, month, year */
  dayc=substr(date,1,2);
  monthc=substr(date,3,3);
  yearc=substr(date,6,4);
  if yearc ne "unkk" then do; /* One row has the year missing; it cannot be imputed */
    /* both month and day missing */
    if dayc="uk" and monthc="unk" then do;
      dayi="31";
      monthi="DEC";
    end;
    /* only day missing */
    if dayc="uk" and monthc ne "unk" then do;
      myrc=cats(monthc)||cats(yearc);
      myn=input(myrc, anydtdte7.);
      lastday=intnx('month',myn,0,'E');
    end;
    /* get imputed date */
    if dayi ne "" and monthi ne "" then
      date_impc=cats(dayi)||cats(monthi)||cats(yearc);
    if lastday ne . then date_impc=put(lastday, date9.);
    format date_imp date9.;
    date_imp=input(date_impc, date9.);
  end;
run;
I already posted an answer to this in the thread that has been deleted in the meantime:
data want;
  set have (rename=(date=_date));
  if substr(_date,6) = "unkk"
  then date =.;
  else do;
    if substr(_date,3,3) = "unk" then substr(_date,3,3) = "dec";
    if substr(_date,1,2) = "uk"
    then do;
      substr(_date,1,2) = "01";
      date = input(_date,date9.);
      date = intnx('month',date,0,'e');
    end;
    else date = input(_date,date9.);
  end;
  format date yymmdd10.; * always use the ISO format for clarity;
  drop _date;
run;
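The step above reads from a data set called have, while the question builds one called vy. As a rough sketch of the same idea applied to the question's data, the variant below reads vy directly and prints the result so the imputed values can be checked by eye. The names vy_imputed and day_missing are made up for this illustration, and the code assumes every date value is laid out as ddmmmyyyy in the first nine characters.

```sas
/* Sketch only: assumes the question's data set VY exists in WORK and  */
/* that DATE is a character variable laid out as ddmmmyyyy.            */
data vy_imputed;                        /* hypothetical output name */
  set vy (rename=(date=_date));
  if substr(_date,6) = "unkk" then date = .;   /* year unknown: leave missing */
  else do;
    /* remember whether the day was unknown before overwriting it */
    day_missing = (substr(_date,1,2) = "uk");
    /* unknown month is treated as December */
    if substr(_date,3,3) = "unk" then substr(_date,3,3) = "dec";
    /* use day 01 as a placeholder so the DATE9. informat can read the value */
    if day_missing then substr(_date,1,2) = "01";
    date = input(_date, date9.);
    /* move the placeholder day to the last day of that month */
    if day_missing then date = intnx('month', date, 0, 'e');
  end;
  format date yymmdd10.;
  drop _date day_missing;
run;

/* list the imputed dates for a visual check */
proc print data=vy_imputed noobs;
run;
```

With the sample rows, id 102 (ukfeb2016) should come out as 2016-02-29 and id 104 (ukunk2020) as 2020-12-31, while id 107 stays missing because its year is unknown. Complete dates such as 21aug2020 pass through unchanged.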