I have a column that is Text and I need to break up the dates in seperate Month, Day, Year columns. My only problem is that the data looks like this
5/ -Month column = 5 and Day and Year will be 99
11/ -Month column = 11 and Day and Year will be 99
1992 -Year column = 1992 and Month and Day will be 99
Nov-64 -Month column = 11 and Year column = 1964 and Day will be 99.
99 is just a default value
How can I break up each of those into the correct columns?
Your post is somewhat confusing, mostly with attempts to describe the data-content to be converted. May I suggest you simply list the data-column lines with no interpretation, to start?
What you will want to do is use a SAS DATA step, read up your data line as text into a SAS character variable, attempt to interrogate the data-format with code:
IF THEN ;
The SAS INPUT function will be your friend -- used to convert the character-format data to a SAS DATE variable. Then, if you need to have separate SAS variables (columns) with the YEAR, MONTH and DAY represented as integer values, you will want to use the YEAR, MONTH and DAY function.
Suggest you start by reading SAS documentation on this topic - some references I found at the SAS support http://support.sas.com/ website (using the SEARCH facility) are listed below:
Thank you for the reply and sorry about the confusion.
This is a sample of data from 1 column in my data table.
What I did was exactly what you suggested. I created a data step with the length of each rows data. Then I created another data step that had if conditions that checked if length is 1 do this and if it's 2 do this and so on. I was trying to do it in clearner way but I don't think I have any choice but to address each varition.
You might find the MDY function beneficial when working with valid integers (converted from your text string using INPUT and the appropriate INFORMAT). And if you want to derive a DATETIME variable (for extra-credit), use a combination of DHMS and inside it MDY to get SAS to yield a numeric (internal) value representing "seconds since 1/1/1960".
Keep in mind that you can concatenate literal strings with your "variable" input data record fields or sub-fields (using either the SCAN or INDEX / FIND functions) when parsing your data. For example if you have a data-condition where only a year value is known, you may have an assignment statement (note that there is one year-value string with an imbedded alpha O character):
LENGTH DATESTRING $50; /* ARBITRARY LENGTH */
INPUT DATESTRING $;
FORMAT MYDATE YYMMDD10. ; /* YOU DECIDE OUTPUT FMT */
* LOGIC HERE TO TEST FOR YEAR PORTION ONLY ;
IF LENGTH(DATESTRING) IN (2,4) AND ANYALPHA(DATESTRING) = 0 THEN
MYDATE = INPUT("1/1/" !! DATESTRING,ANYDTDTE.);
/* ELSE IF THEN ; */ ;
IF MYDATE NE . THEN PUTLOG ">INFO> TRANSLATED OKAY: " MYDATE=;
ELSE PUTLOG ">ERROR> TRANSLATED BAD - INPUT DATA FOLLOWS: " /
That example works good but i'm not sure it will work in my case.
I have lengths from 1-10 with all different layouts for example
There are so many variations of the data I would still have to do if statements for each case correct? Sometimes I have 1 or 2 digit months and days and year is either 2,3 or 4 characters.
Given the example data shown, my recommendation is to parse the string (as a single SAS character variable) using the SAS SCAN function, into MONTH, DAY and YEAR (as individual SAS numeric variables).
Here's a tip: if you declare your SAS numeric variables ahead of the SCAN assignment statement, you will get SAS to automatically convert them from character to numeric. Then use the MDY function to assign your SAS numeric DATE variable.
There is another approach where you could use the INPUT function with the various INFORMATs, but I would recommend against using the "too flexible for my palate" ANYDTDTE INFORMAT. And be sure that you test your values for reasonableness (either with PROC PRINT or PUTLOG to the SASLOG) to ensure that the outputs are desired relative to each date input string.