BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Sven111
Pyrite | Level 9

I'm trying to find a way to identify the subset of numeric variables that represent dates, times or datetimes from an arbitrary dataset that contains many variables of both numeric or character type.  The goal is to have a macro or template that I could run against any given dataset.  I believe that the formats for these variables should correctly represent what the variables are, but I'm trying to see if I can avoid having to create some code that parses the text of the SAS format for any of the applicable format names.

 

I was originally thinking it should be possible to do something like the following to get the SAS formats for a dataset and then do various things based on the columns that come back as date/time/datetime.  

 

PROC SQL;
    CREATE TABLE lib_X.Dataset_A AS
        SELECT tbl_CL.LIBNAME
              ,tbl_CL.MEMNAME   AS TABLE_NAME
              ,tbl_CL.NAME      AS COLUMN_NAME
              ,tbl_CL.FORMAT    AS SAS_Format
              ,tbl_CL.TYPE      AS SAS_Type
              ,tbl_CL.varnum
              ,tbl_CL.LENGTH
        FROM DICTIONARY.COLUMNS AS tbl_CL
        WHERE tbl_CL.MEMNAME = %UPCASE("&SAS_DatasetNm")
          AND tbl_CL.LIBNAME = %UPCASE("&SAS_LibNm");
QUIT;
DATA lib_X.Dataset_A; SET lib_X.Dataset_A; IF FCN_FORMAT_TYPE(SAS_Format) = "DATETIME" THEN ...; RUN;

Where the "FCN_FORMAT_TYPE" function is some function that examines the SAS format and returns which of the 4 categories of SAS Format (character, numeric, date/time, or ISO 8601) the variable is, but there doesn't seem to be an existing function that does this that I can find.   Perhaps there's another approach that would work?

 

Currently running on SAS 9.4 M2 on a Linux server, the code will primarily be used via batch file, but possibly via Enterprise Guide as well (EG 7.1 or 6.1)

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Sven111
Pyrite | Level 9

So I went ahead and created a function using PROC FCMP that does what I wanted, well 2 functions actually, one that returns the four categories (plus missing or unknown), and another that actually breaks up the date, time and datetime formats if that level of specificity is desired (which I thought may come in handy at some point).  They're probably not bullet-proof, but they worked for my purposes and will hopefully be useful to others.  The biggest caveat I'd add is that handling user-defined formats will be touch and go since it's simply parsing the SAS format which is passed as a character variable, so if the name overlaps with the regular expressions I'm using, all bets are off.

 

PROC FCMP OUTLIB=LIB_X.FCMP_FUNCS.Format_Category_ID;
   FUNCTION Format_Category(VAR_Format $, VAR_Type $) $ 16
            GROUP = 'Format Category ID'
            LABEL = 'Pass SAS variable type and format as CHAR and it will return what sort of variable it is. 
                     Breaks down date/time/dt formats into independent categories.';
            LENGTH FORMAT_CATEGORY $16;
            SELECT;
                WHEN        (MISSING(VAR_Format))                                                   FORMAT_CATEGORY = 'MISSING_FORMAT';
                WHEN        (VAR_Type = 'CHAR')                                                     FORMAT_CATEGORY = 'CHARACTER_FORMAT';
                WHEN        (PRXMATCH("/S370FZDT/", VAR_Format))                                    FORMAT_CATEGORY = 'NUMERIC_FORMAT';
                WHEN        (PRXMATCH("/8601/", VAR_Format))                                        FORMAT_CATEGORY = 'ISO_8601_FORMAT';
                WHEN        (PRXMATCH("/(DT|DATEAMPM|DATETIME)/", VAR_Format))                      FORMAT_CATEGORY = 'DATETIME_FORMAT';
                WHEN        (PRXMATCH("/(HHMM|HOUR|MMSS|TIME|TOD)/", VAR_Format))                   FORMAT_CATEGORY = 'TIME_FORMAT';
                WHEN        (PRXMATCH("/(DAT|DAY|YY|DOW|JUL|MON|QTR|WEEK|YEAR)/", VAR_Format))      FORMAT_CATEGORY = 'DATE_FORMAT';
                WHEN        (VAR_Type = 'NUM')                                                      FORMAT_CATEGORY = 'NUMERIC_FORMAT';
                OTHERWISE                                                                           FORMAT_CATEGORY = 'UNKNOWN';
            END;
            RETURN(FORMAT_CATEGORY);
   ENDSUB;
   FUNCTION Format_Category2(VAR_Format $, VAR_Type $) $ 25
            GROUP = 'Format Category ID'
            LABEL = 'Pass SAS variable type and format as CHAR and it will return what sort of variable it is. 
                     Groups all date/time/dt formats into one category.';
            LENGTH FORMAT_CATEGORY $25;
            SELECT;
                WHEN        (MISSING(VAR_Format))                                                                                                   FORMAT_CATEGORY = 'MISSING_FORMAT';
                WHEN        (VAR_Type = 'CHAR')                                                                                                     FORMAT_CATEGORY = 'CHARACTER_FORMAT';
                WHEN        (PRXMATCH("/S370FZDT/", VAR_Format))                                                                                    FORMAT_CATEGORY = 'NUMERIC_FORMAT';
                WHEN        (PRXMATCH("/(DT|DATEAMPM|DATETIME|8601|HHMM|HOUR|MMSS|TIME|TOD|DAT|DAY|YY|DOW|JUL|MON|QTR|WEEK|YEAR)/", VAR_Format))    FORMAT_CATEGORY = 'DATE_TIME_DATETIME_FORMAT';
                WHEN        (VAR_Type = 'NUM')                                                                                                      FORMAT_CATEGORY = 'NUMERIC_FORMAT';
                OTHERWISE                                                                                                                           FORMAT_CATEGORY = 'UNKNOWN';
            END;
            RETURN(FORMAT_CATEGORY);
   ENDSUB;

 

View solution in original post

6 REPLIES 6
Reeza
Super User

SAS only has two variable types, numeric and character so this isn't a straightforward process.

 

Here's an attempt from about a year and a half ago regarding this problem:

 

https://communities.sas.com/t5/Base-SAS-Programming/Determine-if-a-variable-contains-a-SAS-date-valu...

 

Rather than starting from scratch maybe the solution above can help you get started. 

ballardw
Super User

In addition to looking up format and informat information from the dictionary tables there are the functions VINFORMAT/VFormat or VINFORMATX/ VFORMATX (depending on how you provide the variable) that will return the informat and format directly for a variable.

 

 

Sven111
Pyrite | Level 9

Good point, that'll definitely be worth looking into. There are actually two versions of this code, one that is complete and does basically the same thing but for Oracle tables (using SAS with explicit SQL passthrough) and I'm trying to translate it to look at SAS datasets rather than tables, but since I don't need to interface with Oracle for this, using the dictionary tables may not be best approach for this version.

Quentin
Super User

Yes, particularly in server enviroments with lots and lots of preassigned libraries, dictionary.columns can get huge and slow.  Often I end up using PROC CONTENTS out= to get variable attributes instead.

 

I too have sometimes wanted the ability to guess whether a numeric variable represents a date or a date-time, based on the format attached, but have never put in the effort.  As per the other thread, seems like it shouldn't be that hard to pull the list of SAS-provided formats from dictionary.formats, and build a lookup table or permanent format that maps them to categories. And then would just need to update that every time you move to a new version.

 

And once you've done it, please share. : )

BASUG is hosting free webinars Next up: Jane Eslinger presenting PROC REPORT and the ODS EXCEL destination on Mar 27 at noon ET. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
Sven111
Pyrite | Level 9

So I went ahead and created a function using PROC FCMP that does what I wanted, well 2 functions actually, one that returns the four categories (plus missing or unknown), and another that actually breaks up the date, time and datetime formats if that level of specificity is desired (which I thought may come in handy at some point).  They're probably not bullet-proof, but they worked for my purposes and will hopefully be useful to others.  The biggest caveat I'd add is that handling user-defined formats will be touch and go since it's simply parsing the SAS format which is passed as a character variable, so if the name overlaps with the regular expressions I'm using, all bets are off.

 

PROC FCMP OUTLIB=LIB_X.FCMP_FUNCS.Format_Category_ID;
   FUNCTION Format_Category(VAR_Format $, VAR_Type $) $ 16
            GROUP = 'Format Category ID'
            LABEL = 'Pass SAS variable type and format as CHAR and it will return what sort of variable it is. 
                     Breaks down date/time/dt formats into independent categories.';
            LENGTH FORMAT_CATEGORY $16;
            SELECT;
                WHEN        (MISSING(VAR_Format))                                                   FORMAT_CATEGORY = 'MISSING_FORMAT';
                WHEN        (VAR_Type = 'CHAR')                                                     FORMAT_CATEGORY = 'CHARACTER_FORMAT';
                WHEN        (PRXMATCH("/S370FZDT/", VAR_Format))                                    FORMAT_CATEGORY = 'NUMERIC_FORMAT';
                WHEN        (PRXMATCH("/8601/", VAR_Format))                                        FORMAT_CATEGORY = 'ISO_8601_FORMAT';
                WHEN        (PRXMATCH("/(DT|DATEAMPM|DATETIME)/", VAR_Format))                      FORMAT_CATEGORY = 'DATETIME_FORMAT';
                WHEN        (PRXMATCH("/(HHMM|HOUR|MMSS|TIME|TOD)/", VAR_Format))                   FORMAT_CATEGORY = 'TIME_FORMAT';
                WHEN        (PRXMATCH("/(DAT|DAY|YY|DOW|JUL|MON|QTR|WEEK|YEAR)/", VAR_Format))      FORMAT_CATEGORY = 'DATE_FORMAT';
                WHEN        (VAR_Type = 'NUM')                                                      FORMAT_CATEGORY = 'NUMERIC_FORMAT';
                OTHERWISE                                                                           FORMAT_CATEGORY = 'UNKNOWN';
            END;
            RETURN(FORMAT_CATEGORY);
   ENDSUB;
   FUNCTION Format_Category2(VAR_Format $, VAR_Type $) $ 25
            GROUP = 'Format Category ID'
            LABEL = 'Pass SAS variable type and format as CHAR and it will return what sort of variable it is. 
                     Groups all date/time/dt formats into one category.';
            LENGTH FORMAT_CATEGORY $25;
            SELECT;
                WHEN        (MISSING(VAR_Format))                                                                                                   FORMAT_CATEGORY = 'MISSING_FORMAT';
                WHEN        (VAR_Type = 'CHAR')                                                                                                     FORMAT_CATEGORY = 'CHARACTER_FORMAT';
                WHEN        (PRXMATCH("/S370FZDT/", VAR_Format))                                                                                    FORMAT_CATEGORY = 'NUMERIC_FORMAT';
                WHEN        (PRXMATCH("/(DT|DATEAMPM|DATETIME|8601|HHMM|HOUR|MMSS|TIME|TOD|DAT|DAY|YY|DOW|JUL|MON|QTR|WEEK|YEAR)/", VAR_Format))    FORMAT_CATEGORY = 'DATE_TIME_DATETIME_FORMAT';
                WHEN        (VAR_Type = 'NUM')                                                                                                      FORMAT_CATEGORY = 'NUMERIC_FORMAT';
                OTHERWISE                                                                                                                           FORMAT_CATEGORY = 'UNKNOWN';
            END;
            RETURN(FORMAT_CATEGORY);
   ENDSUB;

 

ballardw
Super User

If you use the variable information functions remember that you only need to look at one record to have the characteristics of all the variables so use dataset option obs=1.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 936 views
  • 8 likes
  • 4 in conversation