BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
112211
Obsidian | Level 7

Hi Team,

 

I need help in programming part for date imputation. Here is the raw data below:

 

Raw data:


data vy;
input id date $ 10.;
cards;
101 21aug2020
102 ukfeb2016
103 ukaug2019
104 ukunk2020
105 07aug2018
106 ukdec2020
107 ununkunkk
108 ukfeb2019
;

I need to do date imputation based on two points as below:

    1.  If both month and day are missing, then set to December 31.
     2. If only day is missing, then set to last day of the month.

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star
All you really need to do is

If the first character is not 'u' then it's a normal date9. string
else if the third character (where the month begins) is not 'u' then you only need to impute the leading '01'
else if the sixth character (where the year begins) is a number, then impute the leading '01dec'
otherwise the date value is set to missing
 
data vy;
input id date $ 10.;
cards;
101 21aug2020
102 ukfeb2016
103 ukaug2019
104 ukunk2020
105 07aug2018
106 ukdec2020
107 ununkunkk
108 ukfeb2019
;
data want;
set vy;
format datvalue date9.;
if char(date,1) ^= 'u' then datvalue=input(date,date9.);
else if char(date,3)^='u' then datvalue=input('01'||substr(date,3,7),date9.);
else if '0'<=char(date,6)<='9' then datvalue=input('01dec'||substr(date,6),date9.);
run;
 
 
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

5 REPLIES 5
Seadrago
Obsidian | Level 7

data vy1;

  set vy;

/*Seperate date into day, month, year */

  dayc=substr(date,1,2);

  monthc=substr(date,3,3);

  yearc=substr(date,6,4);

 

  if yearc ne "unkk" then do; /*One row has year missing, not able to impute */

  /*both month and day missing*/

  if dayc="uk" and monthc="unk" then do;

     dayi="31";

     monthi="DEC";

  end;

 

  /*only day missing*/

  if dayc="uk" and monthc ne"unk" then do;

     myrc=cats(monthc)||cats(yearc);

     myn=input(myrc, anydtdte7.);

     lastday=intnx('month',myn,0,'E');

  end;

 

   /*get imputed date*/

  if dayi ne "" and monthi ne "" then

  date_impc=cats(dayi)||cats(monthi)||cats(yearc);

  if lastday ne . then date_impc=put(lastday, date9.);

 

  format date_imp date9.;

  date_imp=input(date_impc, date9.);

  end;

run;

Reeza
Super User
If month, day & year are missing( ID 107) does it follow Rule #1?
Seadrago
Obsidian | Level 7
I don’t believe so. ‘if yearc ne “unkk” then do;’ statement removes any year with missing from below logic
mkeintz
PROC Star
All you really need to do is

If the first character is not 'u' then it's a normal date9. string
else if the third character (where the month begins) is not 'u' then you only need to impute the leading '01'
else if the sixth character (where the year begins) is a number, then impute the leading '01dec'
otherwise the date value is set to missing
 
data vy;
input id date $ 10.;
cards;
101 21aug2020
102 ukfeb2016
103 ukaug2019
104 ukunk2020
105 07aug2018
106 ukdec2020
107 ununkunkk
108 ukfeb2019
;
data want;
set vy;
format datvalue date9.;
if char(date,1) ^= 'u' then datvalue=input(date,date9.);
else if char(date,3)^='u' then datvalue=input('01'||substr(date,3,7),date9.);
else if '0'<=char(date,6)<='9' then datvalue=input('01dec'||substr(date,6),date9.);
run;
 
 
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Kurt_Bremser
Super User

I already posted an answer to this in the thread that has been deleted in the meantime:

data want;
set have (rename=(date=_date));
if substr(_date,6) = "unkk"
then date =.;
else do;
  if substr(_date,3,3) = "unk" then substr(_date,3,3) = "dec";
  if substr(_date,1,2) = "uk"
  then do;
    substr(_date,1,2) = "01";
    date = input(_date,date9.);
    date = intnx('month',date,0,'e');
  end;
  else date = input(_date,date9.);
end;
format date yymmdd10.; * always use the ISO format for clarity;
drop _date;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1057 views
  • 2 likes
  • 5 in conversation