BookmarkSubscribeRSS Feed
n6
Quartz | Level 8 n6
Quartz | Level 8

 

Data has been given to me and there are dates we want to use but they are character variables right now.  In an ideal world they would all be like '03/27/2014' but we're not in an ideal world and some are like '3/27/2014' or /03/27/14' or '%/%/2014' (% is apparently a placeholder for missing) and on and on.  My logical, uncreative first impulse to just plow through and correct all bad ones via hardcoding but part of me thinks that there has to be an easier way.  The ultimate goal is to have them as SAS dates and thus be able to use them in subtraction but it seems the first step it to make them be sensible character variables.  Any thoughts on how to systematically address stuff like this?

9 REPLIES 9
RW9
Diamond | Level 26 RW9
Diamond | Level 26

"Data has been given to me" - how has it been given to you?  Is it a text file you need to read in?  What does the data definition document state, if it doesn't have one how are you supposed to interpret the data?  This: %/% is really terrible, not only does it make the actual data harder to use, it could be mistaken for a mask or something else.  With that setup I don't think there is a simple way other than reading it in as text, and manually processing - yes the data model is that bad.  As for how to do that, again depends on the data definition document without which is this:

03/04/2019

03Apr or 04Mar?  You can get each part simply with scan(), then input into a date.

n6
Quartz | Level 8 n6
Quartz | Level 8

Saying it was given to me was poor wording on my part.  I've been given a project and unfortunately whatever planning the previous person did isn't available to me so I have to start from scratch.  I import the data in a big text file.  I do know that the correct formatting is MM/DD/YYYY so at least I don't have to worry about whether 03/04 means Mar 4 or Apr 3.

 

I don't know what a "mask" is but I think a % was the previous persons way of using a placeholder to say we don't know what the data is.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Format masks, they are used to represent what the data is formatted like, e.g:

http://www.ndsapps.com/webhelp/xephr/examples/format_mask_examples.htm

 

So MM/DD/YYYY is itself a mask.

 

In which case, just read the data in as text and compress(date,'%').  I depends on what you want to do with missings, 2019 is not a valid date, nor is 01-2019, it needs all component parts.  So a simple length check:

data want;
  set have;
  if lengthn(compress(text_date,'%'))=10 then do;
    want_date=input(text_date,mmddyy10.);
  end;
  format want_date date9.;
run;
PeterClemmensen
Tourmaline | Level 20

Questions like this are always hard to answer because we can not see your data. Therefore, we can not see all the different ways that your character dates can differ from forms that can be read by some SAS date informat. Also, what do you want to do with a value of '%/%/2014'?

 

One suggestion could be to use the ANYDTDTE Informat and do something like this

 

data want;
   set have;
   date=input(CharDate, anydtdte12.);
   format date date9.;
run;
n6
Quartz | Level 8 n6
Quartz | Level 8

 

Here are a handful that I started hard coding before I realized this was quixotic.    I know that on some of them I don't even change anything.  Also, I note that some have - instead of /.  Maybe that's okay in SAS.

 

VisitDate = '%' then VisitDate = ' '
VisitDate = '%/%/%' then VisitDate = ' '
VisitDate = '%/%/2008' then VisitDate = '%/%/2008'
VisitDate = '%/%/2009' then VisitDate = '%/%/2009'

VisitDate = '01/%/2010' then VisitDate = '01/%/2010'
VisitDate = '01//2013' then VisitDate = '01//2013'
VisitDate = '01/1/2010' then VisitDate = '01/01/2010'
VisitDate = '02-03-2011' then VisitDate = '02/03/2011'
VisitDate = '02-11-2011' then VisitDate = '02/11/2011'

if VisitDate = '03/%/2008' then VisitDate = '03/%/2008'
if VisitDate = '03/%/2010' then VisitDate = '03/%/2010'
if VisitDate = '03/13/12' then VisitDate = '03/13/2012'
if VisitDate = '03/5/2013' then VisitDate = '03/05/2013'

Tom
Super User Tom
Super User

If the goal is just to clean it up and keep the same style then this might help.

data have;
  input VisitDate $10. ;
cards;
% 
%/%/%
%/%/2008
%/%/2009
01/%/2010 
01//2013  
01/1/2010 
02-03-2011
02-11-2011
03/%/2008
03/%/2010
03/13/12 
03/5/2013
;

%let missval=%qsysfunc(getoption(missing));
options missing='%';
data want ;
  set have ;
  oldval=visitdate;
  if not missing(input(visitdate,??mmddyy10.)) 
    then visitdate=put(input(visitdate,mmddyy10.),mmddyys10.)
  ;
  else do;
    visitdate=translate(visitdate,'/','-');
    m=scan(visitdate,1,'/','m');
    d=scan(visitdate,2,'/','m');
    y=scan(visitdate,3,'/','m');
    if length(Y)=2 then y='20'||y;
    visitdate=catx('/',put(input(m,??2.),z2.),put(input(d,??2.),z2.),put(input(y,??4.),4.));
  end;
  drop m d y;
run;
options missing="&missval";
proc print;
run;

Result:

Obs    VisitDate     oldval

  1    %/%/%         %
  2    %/%/%         %/%/%
  3    %/%/2008      %/%/2008
  4    %/%/2009      %/%/2009
  5    01/%/2010     01/%/2010
  6    01/%/2013     01//2013
  7    01/01/2010    01/1/2010
  8    02/03/2011    02-03-2011
  9    02/11/2011    02-11-2011
 10    03/%/2008     03/%/2008
 11    03/%/2010     03/%/2010
 12    03/13/2012    03/13/12
 13    03/05/2013    03/5/2013
Kurt_Bremser
Super User
data have;
input chardate $10.;
cards;
03/27/2014
3/27/2014
/03/27/14
%/%/2014
;
run;

data
  want
  to_return
;
set have;
format sas_date mmddyy10.;
if chardate ne ' '
then do;
  sas_date = input(chardate,mmddyy10.);
  if sas_date = .
  then output to_return;
  else output want;
end;
else output want; /* "real" missing value */
run;

Dataset to_return is created with the intention to return it to sender with a friendly suggestion to supply data, not garbage.

If certain formats can be handled with a consistent rule, you can add processing in the "then" branch of the first if, but with too much leniency you run the risk of creating false values without noticing it.

n6
Quartz | Level 8 n6
Quartz | Level 8

Sorry for posting and kinda fleeing but I got really busy and now I'm heading out of town for a few days.  Thanks for the replies and if I need more info I'll let you know when I get back.

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @n6 

 

In many cases "return to sender" is not an option, e.g. if the source is an application where data is generated by many users for a long period, so we have to deal with garbage and make the best of it. The definition of "best" depends on the further use of data, so it is not a technical matter to decide if a partial date is best represented as a missing value or as an interpretation where a missing day component is set to 1 or 15 or whatever.

 

If you are working on your own project you can make your own decisions like "I only want years in my final analysis". But if you are just preparing data, you shold always consult the requestor and make her define "best" instead of making your own interpretation and present the result as facts to the requestor.

 

It is tempting to read dirty dates with an any-informat as shown in @PeterClemmensen 's example. But be careful, because the result is an interpretation, where missing values are filled out and day/month may be shifted, as the following example will show:

 

data work.have;
  input VisitDate $20. ;
cards;
% 
%/%/%
%/%/2008
%/%/2009
01/%/2010 
01//2013  
01/1/2010 
02-03-2011
02-11-2011/
03/%/2008
03/%/2010
03/13/12 
13/03/12 
03/5/2013
/% 
/%/%/%
%/%/2008/
%/%/2009
/01/%/2010 
/01//2013  
;

data work.w ; set work.have;
	format date_any date_mdy mmddyy10.;
	date_any = input(VisitDate, ??anydtdte12.);
	date_mdy = input(VisitDate, ??mmddyy10.);
run;

 

informat.gif

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 1624 views
  • 3 likes
  • 6 in conversation