DATA Step, Macro, Functions and more

Programatically Identifying Numeric Variables that represent Date/Time/Datetimes

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 82
Accepted Solution

Programatically Identifying Numeric Variables that represent Date/Time/Datetimes

I'm trying to find a way to identify the subset of numeric variables that represent dates, times or datetimes from an arbitrary dataset that contains many variables of both numeric or character type.  The goal is to have a macro or template that I could run against any given dataset.  I believe that the formats for these variables should correctly represent what the variables are, but I'm trying to see if I can avoid having to create some code that parses the text of the SAS format for any of the applicable format names.

 

I was originally thinking it should be possible to do something like the following to get the SAS formats for a dataset and then do various things based on the columns that come back as date/time/datetime.  

 

PROC SQL;
    CREATE TABLE lib_X.Dataset_A AS
        SELECT tbl_CL.LIBNAME
              ,tbl_CL.MEMNAME   AS TABLE_NAME
              ,tbl_CL.NAME      AS COLUMN_NAME
              ,tbl_CL.FORMAT    AS SAS_Format
              ,tbl_CL.TYPE      AS SAS_Type
              ,tbl_CL.varnum
              ,tbl_CL.LENGTH
        FROM DICTIONARY.COLUMNS AS tbl_CL
        WHERE tbl_CL.MEMNAME = %UPCASE("&SAS_DatasetNm")
          AND tbl_CL.LIBNAME = %UPCASE("&SAS_LibNm");
QUIT;
DATA lib_X.Dataset_A; SET lib_X.Dataset_A; IF FCN_FORMAT_TYPE(SAS_Format) = "DATETIME" THEN ...; RUN;

Where the "FCN_FORMAT_TYPE" function is some function that examines the SAS format and returns which of the 4 categories of SAS Format (character, numeric, date/time, or ISO 8601) the variable is, but there doesn't seem to be an existing function that does this that I can find.   Perhaps there's another approach that would work?

 

Currently running on SAS 9.4 M2 on a Linux server, the code will primarily be used via batch file, but possibly via Enterprise Guide as well (EG 7.1 or 6.1)

 

 


Accepted Solutions
Solution
‎07-25-2016 12:30 PM
Frequent Contributor
Posts: 82

Re: Programatically Identifying Numeric Variables that represent Date/Time/Datetimes

[ Edited ]

So I went ahead and created a function using PROC FCMP that does what I wanted, well 2 functions actually, one that returns the four categories (plus missing or unknown), and another that actually breaks up the date, time and datetime formats if that level of specificity is desired (which I thought may come in handy at some point).  They're probably not bullet-proof, but they worked for my purposes and will hopefully be useful to others.  The biggest caveat I'd add is that handling user-defined formats will be touch and go since it's simply parsing the SAS format which is passed as a character variable, so if the name overlaps with the regular expressions I'm using, all bets are off.

 

PROC FCMP OUTLIB=LIB_X.FCMP_FUNCS.Format_Category_ID;
   FUNCTION Format_Category(VAR_Format $, VAR_Type $) $ 16
            GROUP = 'Format Category ID'
            LABEL = 'Pass SAS variable type and format as CHAR and it will return what sort of variable it is. 
                     Breaks down date/time/dt formats into independent categories.';
            LENGTH FORMAT_CATEGORY $16;
            SELECT;
                WHEN        (MISSING(VAR_Format))                                                   FORMAT_CATEGORY = 'MISSING_FORMAT';
                WHEN        (VAR_Type = 'CHAR')                                                     FORMAT_CATEGORY = 'CHARACTER_FORMAT';
                WHEN        (PRXMATCH("/S370FZDT/", VAR_Format))                                    FORMAT_CATEGORY = 'NUMERIC_FORMAT';
                WHEN        (PRXMATCH("/8601/", VAR_Format))                                        FORMAT_CATEGORY = 'ISO_8601_FORMAT';
                WHEN        (PRXMATCH("/(DT|DATEAMPM|DATETIME)/", VAR_Format))                      FORMAT_CATEGORY = 'DATETIME_FORMAT';
                WHEN        (PRXMATCH("/(HHMM|HOUR|MMSS|TIME|TOD)/", VAR_Format))                   FORMAT_CATEGORY = 'TIME_FORMAT';
                WHEN        (PRXMATCH("/(DAT|DAY|YY|DOW|JUL|MON|QTR|WEEK|YEAR)/", VAR_Format))      FORMAT_CATEGORY = 'DATE_FORMAT';
                WHEN        (VAR_Type = 'NUM')                                                      FORMAT_CATEGORY = 'NUMERIC_FORMAT';
                OTHERWISE                                                                           FORMAT_CATEGORY = 'UNKNOWN';
            END;
            RETURN(FORMAT_CATEGORY);
   ENDSUB;
   FUNCTION Format_Category2(VAR_Format $, VAR_Type $) $ 25
            GROUP = 'Format Category ID'
            LABEL = 'Pass SAS variable type and format as CHAR and it will return what sort of variable it is. 
                     Groups all date/time/dt formats into one category.';
            LENGTH FORMAT_CATEGORY $25;
            SELECT;
                WHEN        (MISSING(VAR_Format))                                                                                                   FORMAT_CATEGORY = 'MISSING_FORMAT';
                WHEN        (VAR_Type = 'CHAR')                                                                                                     FORMAT_CATEGORY = 'CHARACTER_FORMAT';
                WHEN        (PRXMATCH("/S370FZDT/", VAR_Format))                                                                                    FORMAT_CATEGORY = 'NUMERIC_FORMAT';
                WHEN        (PRXMATCH("/(DT|DATEAMPM|DATETIME|8601|HHMM|HOUR|MMSS|TIME|TOD|DAT|DAY|YY|DOW|JUL|MON|QTR|WEEK|YEAR)/", VAR_Format))    FORMAT_CATEGORY = 'DATE_TIME_DATETIME_FORMAT';
                WHEN        (VAR_Type = 'NUM')                                                                                                      FORMAT_CATEGORY = 'NUMERIC_FORMAT';
                OTHERWISE                                                                                                                           FORMAT_CATEGORY = 'UNKNOWN';
            END;
            RETURN(FORMAT_CATEGORY);
   ENDSUB;

 

View solution in original post


All Replies
Super User
Posts: 17,784

Re: Programatically Identifying Numeric Variables that represent Date/Time/Datetimes

SAS only has two variable types, numeric and character so this isn't a straightforward process.

 

Here's an attempt from about a year and a half ago regarding this problem:

 

https://communities.sas.com/t5/Base-SAS-Programming/Determine-if-a-variable-contains-a-SAS-date-valu...

 

Rather than starting from scratch maybe the solution above can help you get started. 

Super User
Posts: 10,483

Re: Programatically Identifying Numeric Variables that represent Date/Time/Datetimes

In addition to looking up format and informat information from the dictionary tables there are the functions VINFORMAT/VFormat or VINFORMATX/ VFORMATX (depending on how you provide the variable) that will return the informat and format directly for a variable.

 

 

Frequent Contributor
Posts: 82

Re: Programatically Identifying Numeric Variables that represent Date/Time/Datetimes

Good point, that'll definitely be worth looking into. There are actually two versions of this code, one that is complete and does basically the same thing but for Oracle tables (using SAS with explicit SQL passthrough) and I'm trying to translate it to look at SAS datasets rather than tables, but since I don't need to interface with Oracle for this, using the dictionary tables may not be best approach for this version.

PROC Star
Posts: 1,230

Re: Programatically Identifying Numeric Variables that represent Date/Time/Datetimes

Yes, particularly in server enviroments with lots and lots of preassigned libraries, dictionary.columns can get huge and slow.  Often I end up using PROC CONTENTS out= to get variable attributes instead.

 

I too have sometimes wanted the ability to guess whether a numeric variable represents a date or a date-time, based on the format attached, but have never put in the effort.  As per the other thread, seems like it shouldn't be that hard to pull the list of SAS-provided formats from dictionary.formats, and build a lookup table or permanent format that maps them to categories. And then would just need to update that every time you move to a new version.

 

And once you've done it, please share. : )

Solution
‎07-25-2016 12:30 PM
Frequent Contributor
Posts: 82

Re: Programatically Identifying Numeric Variables that represent Date/Time/Datetimes

[ Edited ]

So I went ahead and created a function using PROC FCMP that does what I wanted, well 2 functions actually, one that returns the four categories (plus missing or unknown), and another that actually breaks up the date, time and datetime formats if that level of specificity is desired (which I thought may come in handy at some point).  They're probably not bullet-proof, but they worked for my purposes and will hopefully be useful to others.  The biggest caveat I'd add is that handling user-defined formats will be touch and go since it's simply parsing the SAS format which is passed as a character variable, so if the name overlaps with the regular expressions I'm using, all bets are off.

 

PROC FCMP OUTLIB=LIB_X.FCMP_FUNCS.Format_Category_ID;
   FUNCTION Format_Category(VAR_Format $, VAR_Type $) $ 16
            GROUP = 'Format Category ID'
            LABEL = 'Pass SAS variable type and format as CHAR and it will return what sort of variable it is. 
                     Breaks down date/time/dt formats into independent categories.';
            LENGTH FORMAT_CATEGORY $16;
            SELECT;
                WHEN        (MISSING(VAR_Format))                                                   FORMAT_CATEGORY = 'MISSING_FORMAT';
                WHEN        (VAR_Type = 'CHAR')                                                     FORMAT_CATEGORY = 'CHARACTER_FORMAT';
                WHEN        (PRXMATCH("/S370FZDT/", VAR_Format))                                    FORMAT_CATEGORY = 'NUMERIC_FORMAT';
                WHEN        (PRXMATCH("/8601/", VAR_Format))                                        FORMAT_CATEGORY = 'ISO_8601_FORMAT';
                WHEN        (PRXMATCH("/(DT|DATEAMPM|DATETIME)/", VAR_Format))                      FORMAT_CATEGORY = 'DATETIME_FORMAT';
                WHEN        (PRXMATCH("/(HHMM|HOUR|MMSS|TIME|TOD)/", VAR_Format))                   FORMAT_CATEGORY = 'TIME_FORMAT';
                WHEN        (PRXMATCH("/(DAT|DAY|YY|DOW|JUL|MON|QTR|WEEK|YEAR)/", VAR_Format))      FORMAT_CATEGORY = 'DATE_FORMAT';
                WHEN        (VAR_Type = 'NUM')                                                      FORMAT_CATEGORY = 'NUMERIC_FORMAT';
                OTHERWISE                                                                           FORMAT_CATEGORY = 'UNKNOWN';
            END;
            RETURN(FORMAT_CATEGORY);
   ENDSUB;
   FUNCTION Format_Category2(VAR_Format $, VAR_Type $) $ 25
            GROUP = 'Format Category ID'
            LABEL = 'Pass SAS variable type and format as CHAR and it will return what sort of variable it is. 
                     Groups all date/time/dt formats into one category.';
            LENGTH FORMAT_CATEGORY $25;
            SELECT;
                WHEN        (MISSING(VAR_Format))                                                                                                   FORMAT_CATEGORY = 'MISSING_FORMAT';
                WHEN        (VAR_Type = 'CHAR')                                                                                                     FORMAT_CATEGORY = 'CHARACTER_FORMAT';
                WHEN        (PRXMATCH("/S370FZDT/", VAR_Format))                                                                                    FORMAT_CATEGORY = 'NUMERIC_FORMAT';
                WHEN        (PRXMATCH("/(DT|DATEAMPM|DATETIME|8601|HHMM|HOUR|MMSS|TIME|TOD|DAT|DAY|YY|DOW|JUL|MON|QTR|WEEK|YEAR)/", VAR_Format))    FORMAT_CATEGORY = 'DATE_TIME_DATETIME_FORMAT';
                WHEN        (VAR_Type = 'NUM')                                                                                                      FORMAT_CATEGORY = 'NUMERIC_FORMAT';
                OTHERWISE                                                                                                                           FORMAT_CATEGORY = 'UNKNOWN';
            END;
            RETURN(FORMAT_CATEGORY);
   ENDSUB;

 

Super User
Posts: 10,483

Re: Programatically Identifying Numeric Variables that represent Date/Time/Datetimes

If you use the variable information functions remember that you only need to look at one record to have the characteristics of all the variables so use dataset option obs=1.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 319 views
  • 8 likes
  • 4 in conversation