Hello All,
I have been cleaning a dataset with a very messy date variable. Having trouble in converting the variable into a uniform date format. Here is what I have:
ID date
1 SEPT01/2001
2 9-Apr-10
3 20-JAN-18
4 Jan 7-2010
5 oct1/2018
6 NOV18 2010
7 FEB. 2, 2016
8 6/1/2010
9 2013/02/01 lost to fup
Many thanks!
Is 6/1/2010 - 6 Jan or 1 Jun?
Is 2013/02/01 - 1 Feb or 2 Jan?
Let me guess this has come from a spreadsheet which will accept any old rubbish...
I guess you will have to go through some "trial and error" process to read the data into SAS as desired.
Below some approach you could take. Any string that you can't convert into a SAS Date value will end up in table investigate. You then need to further add to your code or fix the data directly in your source.
data have;
infile datalines dsd dlm='|' truncover;
input id string $30.;
datalines;
1|SEPT01/2001
2|9-Apr-10
3|20-JAN-18
4|Jan 7-2010
5|oct1/2018
6|NOV18 2010
7|FEB. 2, 2016
8|6/1/2010
9|2013/02/01 lost to fup
;
%let sv_datestye=%sysfunc(getoption(datestyle,,keyexpand));
options datestyle=mdy;
data want investigate;
set have;
length sas_date 8;
format sas_date date9.;
if _n_=0 then _string2=string;
if missing(string) then
do;
output want;
return;
end;
sas_date=input(string, ?? anydtdte.);
if not missing(sas_date) then
do;
output want;
return;
end;
_string2=prxchange('s/^([a-z]+)(\d+)[^\d](\d+)/\2\1\3/oi',-1,strip(string));
sas_date=input(compress(_string2), ?? anydtdte.);
if not missing(sas_date) then
do;
output want;
return;
end;
/*** and here more string manipulations until no rows remain in Investigate ***/
else output investigate;
run;
options &sv_datestye;
Your task is logically impossible.
At least a date like this
20-JAN-18
is ambiguous (could be 2018-01-20 or 2020-01-18)
<RANT>Anybody who still uses 2-digit years is an idiot with terminal brain damage. Such poor creatures have no place near a computer except under adult supervision.</RANT>
Summing up what others already said: 4 of 9 strings can't translated into dates, because the available information is not sufficient to decide which number is day and which is year (IDs 2, 3) or day and month (IDs 8, 9). So either you have the knowledge, so that it can be coded, or the mission will fail.
Hi guys. I really appreciated that you guys are trying to help me with this problem. It's quite messy but it's what I have to deal with. When I posted this question, I didn't realize I provided less information than my dataset had given to me. Anyways, I wrote some code that worked fine for me. For anyone who's interested, that's what I used:
reformat_date = input(compress(date, '.'), anydtdte13.);
reformat_date_2 = input(date, yymmdd10,);
format reformat_date reformat_date_2 date9.;
It didn't convert all dates but it did most of the work. I did other modifications too.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.