BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Rick_SAS
SAS Super FREQ

What is the earliest date you expect to encounter?  If this is financial data, I doubt you need to worry about the 17th century

 

mahler_ji
Obsidian | Level 7

That is correct.  1975 is about the earliest that we will ever encounter, but because of the logic of the problem, essentially everything after 1913 is what you are going to end up getting, I think. 

 

Thanks for your help!

 

Solving problems like this is fun!

FreelanceReinh
Jade | Level 19

As @Rick_SAS and others have pointed out very clearly, any kind of "logic" based on the date string in the filename alone will fail as soon as you have two identical date strings with different meanings. However, if you restrict the set of potential dates as far as possible (exclude Sundays, holidays, ... whatever) you may be able to exclude at least some ambiguous cases. The following code could be helpful to get a complete overview of the ambiguous cases. It starts with all possible date strings in the range 01JAN1975 - 31DEC2074. Feel free to adapt this date range as you like and insert selection or exclusion criteria as appropriate.

/* Create all possible date strings in a range */

data alldates;
do d='01JAN1975'd to '31DEC2074'd; /* adapt the date range as you like */
  dc=put(d, yymmddn8.); f='1'; output; /* YYYYMMDD */
  if day(d)=1 then do;
    dc=put(d, yymmn6.); f='2'; output; /* YYYYMM   */
  end;
  dc=put(d, yymmddn6.); f='3'; output; /* YYMMDD   */
end;
format d date9.;
run;

/* Select the ambiguous cases, determine max. number of interpretations */

proc sql;
create table ambig0 as
select *, count(*) as c
from alldates
group by dc
having c>1
order by dc, d;

select max(c) into :mc
from ambig0;
quit;

/* Enhance and restructure dataset of ambiguous cases */

data ambig;
do until(last.dc);
  set ambig0;
  by dc;
  array date[&mc];
  length fcomb $%eval(2*&mc-1);
  fcomb=catx(',',fcomb,f);
  i=sum(i,1);
  date[i]=d;
end;
drop d f i;
format date: date9.;
run;

As you can see, with the above date range there are 144 ambiguous cases, all involving two possible interpretations, the second and third format and the year 2020. Now you can concentrate on these cases:

  • How likely is it to receive financial data from the years 2001 - 2012 only in 2020?
  • Wouldn't a dataset labeled, say, "June 2012" contain differently structured/aggregated data than one labeled "06 Dec 2020" anyway (both "labels" showing as '201206', of course)?
  • If the file contains date values, does the filename matter at all?
  • ...
mahler_ji
Obsidian | Level 7

@FreelanceReinh, thank you veyr much.

 

I was just finishing a solution very similar to this.  I think that this is one of the best ways to go about it!

 

Thank you,

John

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 18 replies
  • 3173 views
  • 9 likes
  • 7 in conversation