BookmarkSubscribeRSS Feed
di_niu0
Obsidian | Level 7

Hello All, 

 

I have been cleaning a dataset with a very messy date variable. Having trouble in converting the variable into a uniform date format. Here is what I have:

 

ID      date

1       SEPT01/2001

2       9-Apr-10

3       20-JAN-18

4       Jan 7-2010

5       oct1/2018

6       NOV18 2010

7       FEB. 2, 2016

8       6/1/2010

9       2013/02/01 lost to fup

 

Many thanks! 

 

 

 

 

 

 

 

7 REPLIES 7
SASKiwi
PROC Star

Is 6/1/2010  - 6 Jan or 1 Jun? 

Is 2013/02/01 - 1 Feb or 2 Jan?

 

Let me guess this has come from a spreadsheet which will accept any old rubbish...

di_niu0
Obsidian | Level 7
I looked at other values. It should be June 1, 2010.
Patrick
Opal | Level 21

I guess you will have to go through some "trial and error" process to read the data into SAS as desired. 

Below some approach you could take. Any string that you can't convert into a SAS Date value will end up in table investigate. You then need to further add to your code or fix the data directly in your source.

data have;
  infile datalines dsd dlm='|' truncover;
  input id string $30.;
  datalines;
1|SEPT01/2001
2|9-Apr-10
3|20-JAN-18
4|Jan 7-2010
5|oct1/2018
6|NOV18 2010
7|FEB. 2, 2016
8|6/1/2010
9|2013/02/01 lost to fup
;

%let sv_datestye=%sysfunc(getoption(datestyle,,keyexpand));
options datestyle=mdy;

data want investigate;
  set have;
  length sas_date 8;
  format sas_date date9.;

  if _n_=0 then _string2=string;

  if missing(string) then 
    do;
      output want;
      return;
    end;

  sas_date=input(string, ?? anydtdte.);
  if not missing(sas_date) then
    do;
      output want;
      return;
    end;

  _string2=prxchange('s/^([a-z]+)(\d+)[^\d](\d+)/\2\1\3/oi',-1,strip(string));
  sas_date=input(compress(_string2), ?? anydtdte.);
  if not missing(sas_date) then
    do;
      output want;
      return;
    end;

  /*** and here more string manipulations until no rows remain in Investigate ***/

  else output investigate;

run;

options &sv_datestye;
Kurt_Bremser
Super User

Your task is logically impossible.

At least a date like this

20-JAN-18

is ambiguous (could be 2018-01-20 or 2020-01-18)

 

<RANT>Anybody who still uses 2-digit years is an idiot with terminal brain damage. Such poor creatures have no place near a computer except under adult supervision.</RANT>

di_niu0
Obsidian | Level 7
That's what I get. It's quit difficult to interpret the exact time. However, when I look at other values in the dataset, it should be interpreted as Jan 20, 2018.
andreas_lds
Jade | Level 19

Summing up what others already said: 4 of 9 strings can't translated into dates, because the available information is not sufficient to decide which number is day and which is year (IDs 2, 3) or day and month (IDs 8, 9). So either you have the knowledge, so that it can be coded, or the mission will fail.

di_niu0
Obsidian | Level 7

Hi guys. I really appreciated that you guys are trying to help me with this problem. It's quite messy but it's what I have to deal with. When I posted this question, I didn't realize I provided less information than my dataset had given to me. Anyways, I wrote some code that worked fine for me. For anyone who's interested, that's what I used:

 

reformat_date = input(compress(date, '.'), anydtdte13.);

reformat_date_2 = input(date, yymmdd10,);

format reformat_date reformat_date_2 date9.;

 

It didn't convert all dates but it did most of the work. I did other modifications too. 

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1036 views
  • 5 likes
  • 5 in conversation