Hi all! I know there's a thread on working with incomplete character dates but my situation is slightly different.
I have a date of diagnosis variable formatted as YYYYMMDD, and it's a character variable (length=8). I would like to change it to a SAS date variable; however some dates contain missing values. For example, the DD portion will be coded as "99" if the day of diagnosis is unknown. So my data look like:
ID Date of Diagnosis
1 20000314
2 20120523
3 19981299
4 19930118
5 20090899
6 19990799
*fictitious date values for exemplary purposes; my data contain over thousands of people*
This date conversion code would return the dates 3, 5 and 6 as missing (".")
date_new=input(date_old, yymmdd8.);
format date_new yymmdd8.;
Is there a way I can tell SAS to only read the first 6 characters of the variable (YYYYMM portion) and convert it to a SAS date formatted as MONYY. or YYMON.?
Alternatively, before converting, is there a way I can tell SAS to replace any YYYYMM99 dates for the day of diagnosis variable with YYYYMM15 (set to the middle of that month)? I am asking this second question because there will be scenarios where MM(month) is 99 too (i.e. 20059914) and if I can learn how to deal with the DD(day) portion then I can apply it to missing month dates.
I am using SAS Enterprise 7.1.
Thank you for your help!
if substr(date_old,7,2) = '99' then substr(date_old,7,2) = '15';
date_new = input(date_old,yymmdd8.);
if substr(date_old,7,2) = '99' then substr(date_old,7,2) = '15';
date_new = input(date_old,yymmdd8.);
Lots of flexibility with regular expressions:
data test;
input ID DateStr $;
datalines;
1 20000314
2 20120523
3 19981299
4 19930118
5 20090899
6 19990799
7 19999907
;
data want;
set test;
dateStr = prxchange("s/(\d{6})99/\1\Q15/",1,dateStr);
dateStr = prxchange("s/(\d{4})99\d\d/\1\Q0701/",1,dateStr);
date = input(dateStr, yymmdd8.);
format date yymmdd10.;
run;
proc print; run;
Can I ask how you would write the syntax if there are multiple date variables please?
Thanks
S
Just repeat the 4 lines of code for every variable. Or use arrays and a do loop if that's more convenient.
You can't set a format conditionally, or to put it another way: you cannot show different formats for the same variable in different rows.
I can see two solutions. You can either repair the date, and set the day to 15 if it is out of range:
Data want;
set have;
date_new=input(date_old,?? yymmdd8.); /* ?? Means do not raise a log error */
if missing(date_new) then do;
date_new=input(substr(date_old,1,6)!!'15',yymmdd8.);
date_corrected=1;
end;
else
date_corrected=0;
run;
Or you can have two variables, one that shows the date (when valid), and one that shows the month (which is presumably always valid):
data want;
set have;
date_new=input(date_old,?? yymmdd8.);
month_date=input(date_old,yymmn6.);
format month_date yymon7.;
run;
The month_date will then always be the first of the month, if date_new is missing it means that the day was invalid.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.