As @Rick_SAS and others have pointed out very clearly, any kind of "logic" based on the date string in the filename alone will fail as soon as you have two identical date strings with different meanings. However, if you restrict the set of potential dates as far as possible (exclude Sundays, holidays, and so on), you may be able to rule out at least some of the ambiguous cases. The following code could be helpful to get a complete overview of the ambiguous cases. It starts with all possible date strings in the range 01JAN1975 - 31DEC2074, written in three layouts: YYYYMMDD, YYYYMM (one string per month) and YYMMDD. Feel free to adapt this date range as you like and insert selection or exclusion criteria as appropriate.
/* Create all possible date strings in a range */
data alldates;
  do d='01JAN1975'd to '31DEC2074'd; /* adapt the date range as you like */
    dc=put(d, yymmddn8.); f='1'; output; /* YYYYMMDD */
    if day(d)=1 then do;
      dc=put(d, yymmn6.); f='2'; output; /* YYYYMM */
    end;
    dc=put(d, yymmddn6.); f='3'; output; /* YYMMDD */
  end;
  format d date9.;
run;
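As an illustration of an exclusion criterion, here is a minimal sketch of the same step that skips Sundays for the two daily layouts. The rule itself (no daily files dated on a Sunday) and the dataset name `alldates_nosun` are assumptions for the example, not something known about the actual files. The monthly YYYYMM strings are kept regardless of the weekday of the 1st, since they stand for the whole month.

```sas
/* Sketch: like ALLDATES, but without Sundays for the daily layouts */
data alldates_nosun;
  length dc $8 f $1;  /* declare first: DC may be assigned a 6-char string first */
  do d='01JAN1975'd to '31DEC2074'd;
    /* Monthly label: one string per month, independent of the weekday */
    if day(d)=1 then do;
      dc=put(d, yymmn6.); f='2'; output; /* YYYYMM */
    end;
    /* Daily labels: WEEKDAY returns 1 for Sunday, so skip those dates */
    if weekday(d) ne 1 then do;
      dc=put(d, yymmddn8.); f='1'; output; /* YYYYMMDD */
      dc=put(d, yymmddn6.); f='3'; output; /* YYMMDD */
    end;
  end;
  format d date9.;
run;
```

To use it, read from `alldates_nosun` instead of `alldates` in the PROC SQL step.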
/* Select the ambiguous cases, determine max. number of interpretations */
proc sql;
  create table ambig0 as
    select *, count(*) as c
    from alldates
    group by dc
    having c>1
    order by dc, d;
  select max(c) into :mc
    from ambig0;
quit;
/* Enhance and restructure dataset of ambiguous cases */
data ambig;
  do until(last.dc);
    set ambig0;
    by dc;
    array date[&mc];
    length fcomb $%eval(2*&mc-1);
    fcomb=catx(',',fcomb,f);
    i=sum(i,1);
    date[i]=d;
  end;
  drop d f i;
  format date: date9.;
run;
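A quick way to summarize the result is to count the ambiguous strings by combination of format codes. This is only a sketch using the variables `fcomb` and `c` created above; with the date range as given, it should show a single combination, `2,3`, with two interpretations each.

```sas
/* Sketch: how many ambiguous strings per combination of layouts? */
proc freq data=ambig;
  tables fcomb*c / list missing;
run;
```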
As you can see, with the above date range there are 144 ambiguous cases. Each has exactly two possible interpretations: the second format (YYYYMM, a month in 2001 - 2012) and the third format (YYMMDD, a date in 2020 falling on the 1st through 12th of a month). Now you can concentrate on these cases:
How likely is it to receive financial data from the years 2001 - 2012 only in 2020?
Wouldn't a dataset labeled, say, "June 2012" contain differently structured/aggregated data than one labeled "06 Dec 2020" anyway (both "labels" showing as '201206', of course)?
If the file contains date values, does the filename matter at all?
...
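The '201206' example in the questions above can be checked directly by reading the same string with two different informats. This is a minimal sketch; setting YEARCUTOFF=1975 is an assumption made so that two-digit years map to 1975 - 2074, matching the date range used earlier (note that it changes the option for the rest of the session).

```sas
options yearcutoff=1975; /* two-digit years are read as 1975 - 2074 */

data _null_;
  s='201206';
  as_month=input(s, yymmn6.);  /* read as YYYYMM: first day of that month */
  as_day=input(s, yymmdd6.);   /* read as YYMMDD */
  put s= as_month= date9. as_day= date9.;
run;
```

The log should show 01JUN2012 for `as_month` and 06DEC2020 for `as_day`, i.e. two valid dates from one string.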