Obsidian | Level 7

end date imputation

Hi Team,

I need help in programming part for date imputation. Here is the raw data below:

Raw data:

data vy;
input id date \$ 10.;
cards;
101 21aug2020
102 ukfeb2016
103 ukaug2019
104 ukunk2020
105 07aug2018
106 ukdec2020
107 ununkunkk
108 ukfeb2019
;

I need to do date imputation based on two points as below:

1.  If both month and day are missing, then set to December 31.
2. If only day is missing, then set to last day of the month.

1 ACCEPTED SOLUTION

Accepted Solutions

Re: end date imputation

All you really need to do is

If the first character is not 'u' then it's a normal date9. string
else if the third character (where the month begins) is not 'u' then you only need to impute the leading '01'
else if the sixth character (where the year begins) is a number, then impute the leading '01dec'
otherwise the date value is set to missing
&nbsp;
data vy;
input id date \$ 10.;
cards;
101 21aug2020
102 ukfeb2016
103 ukaug2019
104 ukunk2020
105 07aug2018
106 ukdec2020
107 ununkunkk
108 ukfeb2019
;
data want;
set vy;
format datvalue date9.;
if char(date,1) ^= 'u' then datvalue=input(date,date9.);
else if char(date,3)^='u' then datvalue=input('01'||substr(date,3,7),date9.);
else if '0'&lt;=char(date,6)&lt;='9' then datvalue=input('01dec'||substr(date,6),date9.);
run;
&nbsp;
&nbsp;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
5 REPLIES 5
Obsidian | Level 7

Re: end date imputation

data vy1;

set vy;

/*Seperate date into day, month, year */

dayc=substr(date,1,2);

monthc=substr(date,3,3);

yearc=substr(date,6,4);

if yearc ne "unkk" then do; /*One row has year missing, not able to impute */

/*both month and day missing*/

if dayc="uk" and monthc="unk" then do;

dayi="31";

monthi="DEC";

end;

/*only day missing*/

if dayc="uk" and monthc ne"unk" then do;

myrc=cats(monthc)||cats(yearc);

myn=input(myrc, anydtdte7.);

lastday=intnx('month',myn,0,'E');

end;

/*get imputed date*/

if dayi ne "" and monthi ne "" then

date_impc=cats(dayi)||cats(monthi)||cats(yearc);

if lastday ne . then date_impc=put(lastday, date9.);

format date_imp date9.;

date_imp=input(date_impc, date9.);

end;

run;

Super User

Re: end date imputation

If month, day & year are missing( ID 107) does it follow Rule #1?
Obsidian | Level 7

Re: end date imputation

I don’t believe so. ‘if yearc ne “unkk” then do;’ statement removes any year with missing from below logic

Re: end date imputation

All you really need to do is

If the first character is not 'u' then it's a normal date9. string
else if the third character (where the month begins) is not 'u' then you only need to impute the leading '01'
else if the sixth character (where the year begins) is a number, then impute the leading '01dec'
otherwise the date value is set to missing
&nbsp;
data vy;
input id date \$ 10.;
cards;
101 21aug2020
102 ukfeb2016
103 ukaug2019
104 ukunk2020
105 07aug2018
106 ukdec2020
107 ununkunkk
108 ukfeb2019
;
data want;
set vy;
format datvalue date9.;
if char(date,1) ^= 'u' then datvalue=input(date,date9.);
else if char(date,3)^='u' then datvalue=input('01'||substr(date,3,7),date9.);
else if '0'&lt;=char(date,6)&lt;='9' then datvalue=input('01dec'||substr(date,6),date9.);
run;
&nbsp;
&nbsp;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Super User

Re: end date imputation

I already posted an answer to this in the thread that has been deleted in the meantime:

``````data want;
set have (rename=(date=_date));
if substr(_date,6) = "unkk"
then date =.;
else do;
if substr(_date,3,3) = "unk" then substr(_date,3,3) = "dec";
if substr(_date,1,2) = "uk"
then do;
substr(_date,1,2) = "01";
date = input(_date,date9.);
date = intnx('month',date,0,'e');
end;
else date = input(_date,date9.);
end;
format date yymmdd10.; * always use the ISO format for clarity;
drop _date;
run;``````
Discussion stats
• 5 replies
• 225 views
• 2 likes
• 5 in conversation