06-07-2018 03:23 PM
I have DATETIME16. variable and I had datepart and year from it just fine. However, some years were output in incorrect numbers, 3000, 2054 and 2099 from same data step. Please help rectify this problem. Thanks in advance!
data mydata1; set mydata ser_date = datepart(dt); format ser_date date9.; ser_year=year(ser_date); run; proc freq data=mydata1; tables ser_year; run; data test; set mydata1; if ser_year in (2054,2099,3000) then output; run;
06-08-2018 07:20 PM
Last time and why we keep asking about the file that was read.
Here is an example of some possibly odd datetime values read with anydtdtm format.
When you can explain how the result is derived for each one of these you might be ready to debug your original data that was read with that informat:
data junk; infile datalines truncover; input x anydtdtm40.; format x datetime20.; datalines; 20.03.20.04.00.00 20.03.20.54.00.00 20.03.20.1054.00.00 20.03.20.154.00.00 20.03.2018.154.00.00 ; run;
And so everyone doesn't have to actually run the code:
x 20MAR2020:04:00:00 22MAR2020:06:00:00 02MAY2020:22:00:00 26MAR2020:10:00:00 26MAR2018:10:00:00
Moral of the story: missing or extra characters in your source file can change the results in some pretty odd fashions.
So find the records in the raw data and show the text of the original values read with the informat and we can probably suggest either a fix or a change to the informat.
If the original file was using proc import and that assigned the in anydtdtm informat and the data is supposed to have a consistent layout then this is an indication that SAS may have found something that did not match your expectations and grabbed that informat instead of a basic datetime informat because a number of values had odd characters or different lengths to be read.
06-07-2018 03:31 PM
Since you used ANYDTDTM as the informat so I suspect the dates were read incorrectly.
What is the underlying value for the incorrect dates, without any format. Its the number of seconds from Jan1 1960, so you can verify the ranges with some math to see if the underlying data is correct. I suspect it's not.
06-07-2018 03:54 PM
06-07-2018 05:06 PM
06-07-2018 10:57 PM
You have a data integrity problem. You cannot assume that the date is 2000 safely, so you need to determine what the original data should be. An example of why it may be very wrong - there's often a trick used to avoid having to deal with missing data - where you set the year to an arbitrarily large number to allow you to avoid having to explicitly deal with the case where it's missing.
If the dates with 3000 or missing years are identical - which from the example shown they appear to be - I suspect something like that may be the issue.
It's also very unlikely a final stored SAS7BDAT file would have an informat of ANYDTDTM, I've usually only see that when PROC IMPORT was used to import data.
06-07-2018 05:13 PM
@Reeza, data is given as is. I have no way of knowing underlying value. Looking at the screenshot 01JAN00:00:00:00, I'm tempted to take it as 01, Jan 2000. Would you agree?
Absolutely not. Each datetime width has different rules for display. When you use datetime16. it means the year will be displayed as 2 digits, and the last 2 at that.
An example with two different centuries that display with the same "year" and the impact of the specific format chosen.
data example; x = '10Aug2005:00:00:00'dt; y = '10Aug2105:00:00:00'dt; format x y datetime16.; run; proc print; run; proc print; format x y datetime20.; run;
06-07-2018 10:36 PM - edited 06-07-2018 10:37 PM
It looks like you have bad data.
It may be because the anydtdte. informat misinterpreted some input values.
Run this to create an unformatted variable.
data test; set mydata1; X=DT; if ser_year in (2054,2099,3000) then output; run;
06-08-2018 07:57 AM
06-08-2018 08:12 AM
1. How can you have DT missing when you use it so:
ser_date = datepart(dt);
2. It helps because we can see the actual unformatted value.
06-08-2018 08:34 AM - edited 06-08-2018 08:36 AM
@ChrisNZcheck this out Chris. Any idea? let me know please
ser_date = datepart(dt); format ser_date date9.; ser_year=year(ser_date); X=ser_date; if ser_year in (2054,2099,3000) then output;
06-08-2018 11:27 AM
Show something from the original text file or data source. That is where the "problem" almost certainly comes from.
Preferably identify the records with this odd behavior and find those to display.
06-08-2018 11:35 AM
Check the records with the wrong years. If you only have 4 unique dates, there's something else going on, which appears to be the case.
06-08-2018 11:52 AM
For all the records with the years that seem weird, run a proc freq and see how many unique dates you have. If it's only 3 or 4 unique dates causing issues with the years, I suspect it's not the correct dates, but someone doing something weird with the data like I originally mentioned.
Need further help from the community? Please ask a new question.