BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
KAZ
Obsidian | Level 7 KAZ
Obsidian | Level 7

I have a data sources that contains a column (string) that is not consistent with respect to how it stores dates.  Some are in the form mm/dd/yyyy, some are mm/yyyy and some are yyyy.

 

The anydtdte21. informat performs admirably, with the excpetion of just a year given.  In that case, it returns a missing value.

 

How do I create an informat that tries anydtdte21. first, and if it fails assumes the input is in the format YYYY?

 

This code is used as an example

data _a;
    length dtTXT $24.;
    dtTxt = "03/01/2015";     output;
    dtTxt = "02/2015";         output;
    dtTXT = "2015";            output;
run;

data _b;
    set _a;
    format _dt _dt2 date.;

    *** I would like to replace this code ***;
    _dt = input(dtTxt, anydtdte21.);
    if _dt = . then _dt = input("01/01/"||dtTxt, anydtdte21.);

    **** REPLACE anydtdte21. here with a custome informat to achieve the same results as the above code ****;
    _dt2 = input(dtTxt, anydtdte21.);

run;

proc print data=_b; run;

I would like to create an informat that correctly translates all three obs.

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

The B8601DA4. format will read four digit years as the first day of the year.

 

proc format;   
  invalue messyDate
    "/^\d{1,2}\D\d{1,2}\D\d{4}/" (regexp) = [mmddyy10.]    
    "/^\d{4}/" (regexp)       = [B8601DA.]
    other           = [anydtdte32.]
  ;
run;

View solution in original post

9 REPLIES 9
ballardw
Super User

You aren't going to find an informat that will handle all three. I've been somewhat curious about how SAS goes about implementing the Anydtdte and related informats as the tools we have for building informats with Proc Format don't seem possible to build them.

 

If you insist on a single line of code you could create a function that would contain your existing code to return a date value but I doubt the added complexity helps much.

 

Reeza
Super User

If you only have 3 variations you may as well read it in as a character, check the length and use the appropriate format for the length.

 

if length(char_date) = 4 then date=mdy(1, 1,  input(char_date, 4.));
else date=input(char_date, anydtdte.);
art297
Opal | Level 21

I really wish that my/our colleagues would stop saying that something can't be done. That has taken me away from my real work for more hours than I'd care to count.

 

Regardless, here is your custom informat:

data _a;
    length dtTXT $24.;
    dtTxt = "03/01/2015";     output;
    dtTxt = "02/2015";        output;
    dtTXT = "2015";           output;
run;

data infmt;
  retain fmtname "yearonly" type "I" ;
  do start=1900 to 2100;
    label = mdy(1,1,start); 
    output ;
  end ;
run;
proc format cntlin = infmt ; run ;
proc format; invalue dte 1900-3000=yearonly. other=anydtdte21.; run; data _b; set _a; format _dt _dt2 date.; *** I would like to replace this code ***; _dt = input(dtTxt, anydtdte21.); if _dt = . then _dt = input("01/01/"||dtTxt, anydtdte21.); **** REPLACE anydtdte21. here with a custom informat to achieve the same results as the above code ****; _dt2 = input(dtTxt, dte.); run;

Art, CEO, AnalystFinder.com

 

 

Kurt_Bremser
Super User

Actually, the anydtdte format does not perform "admirably", it results in wrong values for your dates:

data test;
input datestring :$21.;
date = input(datestring,anydtdte21.);
format date date9.;
cards;
03/01/2015
12/12/2017
12/10/2017
10/12/2017
10/13/2017
12/2017
2017
;
run;

proc print data=test noobs;
run;

Result:

datestring         date

03/01/2015    03JAN2015
12/12/2017    12DEC2017
12/10/2017    12OCT2017
10/12/2017    10DEC2017
10/13/2017    13OCT2017
12/2017       01DEC2017
2017                  .

As you can see, it only uses MMDDYY when the second digits exceed 12, otherwise you get DDMMYY.

I would never use that format because of such shortcomings. Instead expand shorter strings to your likings with clearly defined default values, and then use one matching format that only accepts one defined sequence. If you allow the computer to do your thinking for you, you'll end up with grief. Computers are dumb.

andreas_lds
Jade | Level 19

idea: Using regex in proc format and a function created with proc fcmp. The function is necessary because in a format defnition only functions with zero or one argument can be called and we need mdy().

 

EDIT: Fixed the format definition, still uses anydtdte - could be replaced by another function.

EDIT2: Fixed bad format of code.

 

data work.have;
    length dtTXT $24.;
    dtTxt = "03/01/2015";     output;
    dtTxt = "02/2015";        output;
    dtTXT = "2015";           output;
run;

proc fcmp outlib=work.funcs.dates;
   function createDate(YearStr $);
      return (mdy(1, 1, left(YearStr)));
   endsub;
run;

options cmplib= work.funcs;


proc format;   
   invalue GetDate
      "/^\d\d\/\d\d\/\d{4}/" (regexp) = [mmddyy10.]
      "/^\d\d\/\d{4}/" (regexp) = [anydtdte.]
      "/^\d{4}/" (regexp) = [createDate()]
      other = 0
   ;
run;

data work.want;
   set have;

   format d Date9.;

   put dtTxt=;

   d = input(dtTxt, GetDate.);
run;

 

 

KAZ
Obsidian | Level 7 KAZ
Obsidian | Level 7

Thanks all for the feedback.  I am trying to avoid using a user defned function, and want to accomplish this with informats.

 

I've combined several of the suggestions and have come up with this:

(Thanks @Kurt_Bremser for pointing out anydtdte's, ummm, "feature").

 

data infmt;
  retain fmtname "yyyy" type "I" ;
  do start=0001 to 9999;
    label = mdy(1,1,start); 
    output ;
  end ;
run;
proc format cntlin = infmt ; run ;

proc format;   
	invalue messyDate
		"/^\d{1,2}\D\d{1,2}\D\d{4}/" (regexp)	= [mmddyy10.]    
		"/^\d{4}/" (regexp) 			= [yyyy.]
		other 					= [anydtdte32.]
	;
run;

data test;
infile datalines truncover dlm="|";
input datestring :$64.; 			format datestring $64.;
formatedDate = input(datestring, messyDate.);	format formatedDate date11.; 
cards;
12/12/2017
12/10/2017
10/12/2017
10/13/2017
03/01/2015
03-01-2015
03.01.2015
03 01 2015 3/1/2015 03/2015 3/2015 2017 1March2015 01March2015 1MAR2015 Mar 01 2015 MAR 01, 2015 March 1, 2015 March 1 2015 Mar 2015 March 2015 March, 2015 ; run; proc print data=test; run;

As you can see, I was a little coy regarding how messy the data are.  This seems to work with the exception of the last two rows, i.e. when the full month is spelled out but no day is given.

Tom
Super User Tom
Super User

The B8601DA4. format will read four digit years as the first day of the year.

 

proc format;   
  invalue messyDate
    "/^\d{1,2}\D\d{1,2}\D\d{4}/" (regexp) = [mmddyy10.]    
    "/^\d{4}/" (regexp)       = [B8601DA.]
    other           = [anydtdte32.]
  ;
run;
mftuchman
Quartz | Level 8

I Did not know you could regex your way through an informat.  I was looking to solve a similar problem, but this is just what I need.   Some of my SAS knowledge needs updating.  I appreciated this example. I would have eventually found it in the docs, but I found it here!

 

KAZ
Obsidian | Level 7 KAZ
Obsidian | Level 7

Updating the informat slightly to include mmddyy regex without the slashes

 

proc format;   
	invalue messyDate
		"Unknown"								= .
		"/^\d{1,2}\D\d{1,2}\D\d{4}/" (regexp)	= [mmddyy10.]  
		"/^\d{1,2}\d{1,2}\d{4}/" (regexp)	= [mmddyy10.]    
		"/^\d{4}/" (regexp) 				= [B8601DA4.]
		other 						= [anydtdte32.]
	;
run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 2157 views
  • 9 likes
  • 8 in conversation