DATA Step, Macro, Functions and more

How to convert date to character dates when source date contain unknown characters

Reply
Super Contributor
Posts: 272

How to convert date to character dates when source date contain unknown characters

Dear,

 

My data contains date variable with the following values. I am using the following code to get the output. I got the output except for a few OBS containing 'jun' as the month because I am trying to remove 'un' charcters using tranwrd function which removes 'un' in 'jun' month. Please help me simply the code. Thanks

 

 

date

 

//2011
00/UNK/UNK
01/Dec/2014
02/UNK/2014
18/jun/2008
UN/mar/2023
Un/apr/2013
uN/sep/2011

un/oct/2009
Un/UNK/2003
UN/UNK/2002
uN/UNK/2015
Un/UNK/2006

 

 

code;

date1=tranwrd(date_raw,'/','');
date2=tranwrd(date1,'UNK','');
date3=tranwrd(date2,'un','');
date4=tranwrd(date3,'UN','');
date5=tranwrd(date4,'Un','');
date6=compress(tranwrd(date5,'uN',''));

 

if length(date6)=9 then STDTC2= put(input(date6,date9.),is8601da.);

if length(date6)=7 then STDTC3= put(input(date6,monyy7.),yymmd7.);
if length(date6)=4 then STDTC4= put(input(date6,best.),10.);

 

if stdtc2 ^='' then isdate=stdtc2;

else if stdtc3 ^='' then isdate=stdtc3;

else if stdtc4 ^='' then isdate=stdtc4;

 

output needed;

2011

blank

2014-12-01
2014--02
2008-06-18
2023-03
2013-04
2011-09

2009-10
2003
2002
2015
2006

Trusted Advisor
Posts: 1,554

Re: How to convert date to character dates when source date contain unknown characters

Posted in reply to knveraraju91

Add to your code at the beginning:

 

 if input(datex, ??  ddmmyy10.)  ne . 

   then date_out = put(input(datex, ddmmyy10.),  yymmdd10.);

else do;

   ... your code ...

end;

 

The only "problem" that may arouse is that date_out will be in dormat of: yyyy/mm/dd instead yyyy-mm-dd

and that can be corrected by date_out = translate(date_out, '-' , '/' );

Super Contributor
Posts: 272

Re: How to convert date to character dates when source date contain unknown characters

Thank you for help. If I use my code where you mentioned to use. Does it work for values 'Un/jun/2012' . My code removes 'un' in jun. Please help.  

Trusted Advisor
Posts: 1,554

Re: How to convert date to character dates when source date contain unknown characters

[ Edited ]
Posted in reply to knveraraju91

The expression  input(datex, ??  ddmmyy10.) checks for valid date and

assigns a number that is the days passed since 01-01-1960 otherwise it

assingns a missing value.

 

As  'Un/jun/2012' is not a valid date, it will execute the ELSE part of the IF statement.

In oredr to overcome JUN month, I would prefer a differnt code than yours assuming that

date_raw is always in a format of day/month/year .

I havn't run this code. You may try it:

 

%LET MM =  'JAN' ' FEB' 'MAR' 'APR' 'MAY' 'JUN' 'JUL' 'AUG' 'SEP' 'OCT' 'NOV' 'DEC' ;

 

DATA WANT;

  SET HAVE;

 

 array mx $ &mm;

 

 if input(datex, ??  ddmmyy10.)  ne .                         /* valid date given */

   then date_out = put(input(datex, ddmmyy10.),  yymmdd10.);

else do;

    month = scan(date_raw,2);

    year = scan(date_raw,3);

    if   length(trim(month)) = 2 and                    /* 2 digits month */

         1 le input(month, best2.) le 12 and

         input(year, best4.) ge 1960                    /* check for reasonable year */

    then date_out = catx( '-', year,month);   

    else if  length(trim(month)) = 3 and           /* 3 characters month */

           upcase(month) in (&mm) and 

           input(year, best4.) ge 1960  

           then do;

                 do i=1 to 12;

                     if mx(i) = upcase(trim(month)) then do;

                       month = put(i , z2.);     /* convert month mmm to its number, 2 digits */

                       date_out = catx( '-', year, trim(month)); 

                        leave;

                    end; 

                end; end;

   else if  input(year, best4.) ge 1960  

   then date_out = trim(year);

end;  

run;

 

 

 

 

Respected Advisor
Posts: 4,173

Re: How to convert date to character dates when source date contain unknown characters

[ Edited ]
Posted in reply to knveraraju91

The mdy() function is quite helpful here. See code below.

data have;
infile datalines truncover;
input have_str $11.;
datalines;
//2011
00/UNK/UNK
01/Dec/2014
02/UNK/2014
18/jun/2008
UN/mar/2023
Un/apr/2013
uN/sep/2011
un/oct/2009
Un/UNK/2003
UN/UNK/2002
uN/UNK/2015
Un/UNK/2006
  
Un/##/2006
;
run;

options missing=' ';
data want;
  set have;
  length want_str $11.;

  /* valid day, month and year? */
  want_str=put(input(have_str, ?? date11.),yymmdd10.);

  if missing(want_str) and not missing(have_str) then
    do;
      length _day $2 _month $3 _year $4;
      _day  =scan(have_str,1,'/','m');
      _month=put(month(input(scan(have_str,2,'/','m')||'2000', ?? monyy.)),z2.);
      _year =scan(have_str,3,'/','m');

      /* valid day and year? */
      if mdy('01',_day,_year) then want_str=_year||'--'||_day;

      /* valid month and year? */
      else if mdy(_month,'01',_year) then want_str=_year||'-'||_month;

      /* valid year? */
      else if mdy('01','01',_year) then want_str=_year;

    end;
run;

 

Super Contributor
Posts: 272

Re: How to convert date to character dates when source date contain unknown characters

Thank you for the help. I got the output I need. But I am getting warning message as "invalid argument  for OBS where _day='un' 'UN' and _year='UNK'. I added a few additional lines to the code. Is there any other option to simplyfy it. Thanks.

 

code;

options missing=' ';
data want;
  set have;
  length want_str $11.;

  /* valid day, month and year? */
  want_str1=put(input(have_str, ?? date11.),yymmdd10.);

  if missing(want_str) and not missing(have_str) then
    do;
      length _day $2 _month $3 _year $4;
      _day  =scan(have_str,1,'/','m');
      _month=put(month(input(scan(have_str,2,'/','m')||'2000', ?? monyy.)),z2.);
      _year =scan(have_str,3,'/','m');

/* valid day and year? */
if _day not in ('un' 'UN' 'uN' 'Un') and _year ^='UNK' then do;
if mdy('01',_day,_year) then want_str2=_year||'--'||_day;
end;

/* valid month and year? */
if _year ^='UNK' then do;

if mdy(_month,'01',_year) then want_str3=_year||'-'||_month;
end;
/* valid year? */
if _year^='UNK' then do;
if mdy('01','01',_year) then want_str4=_year;
end;

end;

if want_str1 ^='' then wantstr=want_str1;

else if want_str2 ^='' then wantstr=want_str2;

else if want_str3 ^='' then wantstr=want_str3;

else if want_str4 ^='' then wantstr=want_str4;

 


run;

Respected Advisor
Posts: 4,173

Re: How to convert date to character dates when source date contain unknown characters

Posted in reply to knveraraju91

I should have checked the log.... mdy() requires numerical variables. Below a code version which creates a bit less Notes in the log about conversion from character to numerical.

data have;
infile datalines truncover;
input have_str $11.;
datalines;
//2011
00/UNK/UNK
01/Dec/2014
02/UNK/2014
18/jun/2008
UN/mar/2023
Un/apr/2013
uN/sep/2011
un/oct/2009
Un/uNk/2003
 
Un/##/2006
99/999/2006
;
run;

options missing=' ';
data want;
  set have;
  length want_str $11.;

  /* valid day, month and year? */
  want_str=put(input(have_str, ?? date11.),yymmdd10.);

  if missing(want_str) and not missing(have_str) then
    do;
      length _day _month _year 8;
      _day  =input(scan(have_str,1,'/','m'), ?? best32.);
      _month=month(input(scan(have_str,2,'/','m')||'2000', ?? monyy.));
      _year =input(scan(have_str,3,'/','m'),?? best32.);

      if _day<1 or _day>31 then call missing(_day);
      if _month<1 or _month>12 then call missing(_month);

      /* valid day and year? */
      if mdy(01,_day,_year) then want_str=put(_year,z4.)||'--'||put(_day,z2.);

      /* valid month and year? */
      else if mdy(_month,01,_year) then want_str=put(_year,z4.)||'-'||put(_month,z2.);

      /* valid year? */
      else if mdy(01,01,_year) then want_str=put(_year,z4.);

    end;
run;

If you're using my code version then you don't need to add all these checks like if _day not in ('un' 'UN' 'uN' 'Un') and _year ^='UNK' then do; The code I've posted works fine without it and _day will either contain a number or a missing.

 

If the number in _day is a valid day then the mdy() function will return a value (if the 2 other parameters are valid as well), else it will return a missing. So that's the check.

 

As you can see this code version doesn't only deal with "UN/UNK" but with any string which doesn't represent a valid day, month or year.

Super Contributor
Posts: 272

How to read date containing unknown characters

Posted in reply to knveraraju91

Dear,

 

My data contains date variable with the following values. I am using the following the code. I am not able to run the code. Please help. It giving me error message.Thanks

 

 

datex

 

//2011
00/UNK/UNK
01/Dec/2014
02/UNK/2014
18/jun/2008
UN/mar/2023
Un/apr/2013
uN/sep/2011

un/oct/2009
Un/UNK/2003
UN/UNK/2002
uN/UNK/2015
Un/UNK/2006

un/jun/2002

 

output needed;

2011

blank

2014-12-01
2014--02
2008-06-18
2023-03
2013-04
2011-09

2009-10
2003
2002
2015
2006

 

code;(Some one posted this code);

 

%LET MM =  'JAN' ' FEB' 'MAR' 'APR' 'MAY' 'JUN' 'JUL' 'AUG' 'SEP' 'OCT' 'NOV' 'DEC' ;

 

DATA WANT;

  SET HAVE;

 

 array mx $ &mm;

 

 if input(datex, ??  ddmmyy10.)  ne .                         /* valid date given */

   then date_out = put(input(datex, ddmmyy10.),  yymmdd10.);

else do;

    month = scan(date_raw,2);

    year = scan(date_raw,3);

    if   length(trim(month)) = 2 and                    /* 2 digits month */

         1 le input(month, best2.) le 12 and

         input(year, best4.) ge 1960                    /* check for reasonable year */

    then date_out = catx( '-', year,month);   

    else if  length(trim(month)) = 3 and           /* 3 characters month */

           upcase(month) in (&mm) and 

           input(year, best4.) ge 1960  

           then do;

                 do i=1 to 12;

                     if mx(i) = upcase(trim(month)) then do;

                       month = put(i , z2.);     /* convert month mmm to its number, 2 digits */

                       date_out = catx( '-', year, trim(month)); 

                        leave;

                    end; 

                end; end;

   else if  input(year, best4.) ge 1960  

   then date_out = trim(year);

end;  

run;

 

 

Respected Advisor
Posts: 4,173

Re: How to read date containing unknown characters

Posted in reply to knveraraju91

Please don't post the same question twice.

 

Here the link to the thread with replies: 

https://communities.sas.com/t5/Base-SAS-Programming/How-to-convert-date-to-character-dates-when-sour...

 

 

Trusted Advisor
Posts: 1,554

Re: How to read date containing unknown characters

[ Edited ]

After hard work I simplified the code as:

 

data have;
infile datalines;
input datex $11. ;
datalines;
//2011
00/UNK/UNK
01/Dec/2014
02/UNK/2014
18/jun/2008
UN/mar/2023
Un/apr/2013
uN/sep/2011
un/oct/2009
Un/UNK/2003
UN/UNK/2002
uN/UNK/2015
Un/UNK/2006
; run;


%LET MM = 'JAN' ' FEB' 'MAR' 'APR' 'MAY' 'JUN' 'JUL' 'AUG' 'SEP' 'OCT' 'NOV' 'DEC' ;

 

DATA WANT(keep=datex date_out day month year );   /*drop unrelevant variables */
         length date_out $10 day $2 month $3 year $4;
  SET HAVE;
       array mx {12} $ ( &mm );

      day = scan(datex,1,'/','m');
      month = scan(datex,2,'/','m');
      year = scan(datex,3,'/','m');

if length(strip(month)) = 3 and                /* 3 characters month */
     upcase(month) in (&mm) and
     input(year,?? best4.) ge 1960
then do;
    do i=1 to 12;
        if mx(i) = upcase(trim(month)) then do;
           month = put(i , z2.);                  /* convert month mmm to its number, 2 digits */
          leave;
      end;
end; end;

lday     =  compress(day,'0123456789');
lmonth = compress(month,'0123456789');
lyear    = compress(year,'0123456789');

if lday=' ' and lmonth=' ' and lyear=' '
then date_out = catx('-',year,month); else
if lday=' ' and lyear=' ' and lmonth ne ' '
then date_out = cat(year,'--',day); else
if lday ne ' ' and lmonth=' ' and lyear=' '
then date_out = catx('-',year,month); else
if lday ne ' ' and lmonth ne ' ' and lyear=' '
then date_out = year; else date_out = ' ';
run;





Ask a Question
Discussion stats
  • 9 replies
  • 539 views
  • 5 likes
  • 3 in conversation