BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Cruise
Ammonite | Level 13

I have DATETIME16. variable and I had datepart and year from it just fine. However, some years were output in incorrect numbers, 3000, 2054 and 2099 from same data step. Please help rectify this problem. Thanks in advance!

 

2005 1147557
2006 1582890
2007 1723150
2008 1889702
2009 2118443
2010 2436944
2011 2389669
2012 2233374
2013 2466187
2014 2558208
2015 2321677
2016 1918775
2017 1023156
2054 1
2099 1
3000

2677

data mydata1; set mydata
ser_date = datepart(dt);
format ser_date date9.;
ser_year=year(ser_date);
run; 

proc freq data=mydata1;
tables ser_year;
run;

data test; set mydata1; 
if ser_year in (2054,2099,3000) then output;
run; 

Datepart and year() functions worked fineDatepart and year() functions worked fine

 

The part where year is output 3000 instead 2000The part where year is output 3000 instead 2000

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Last time and why we keep asking about the file that was read.

Here is an example of some possibly odd datetime values read with anydtdtm format.

When you can explain how the result is derived for each one of these you might be ready to debug your original data that was read with that informat:

data junk;
  infile datalines truncover;
  input x anydtdtm40.;
  format x datetime20.;
datalines;
20.03.20.04.00.00
20.03.20.54.00.00
20.03.20.1054.00.00
20.03.20.154.00.00
20.03.2018.154.00.00
;
run;

And so everyone doesn't have to actually run the code:

 

                   x

  20MAR2020:04:00:00
  22MAR2020:06:00:00
  02MAY2020:22:00:00
  26MAR2020:10:00:00
  26MAR2018:10:00:00

Moral of the story: missing or extra characters in your source file can change the results in some pretty odd fashions.

 

So find the records in the raw data and show the text of the original values read with the informat and we can probably suggest either a fix or a change to the informat.

 

If the original file was using proc import and that assigned the in anydtdtm informat  and the data is supposed to have a consistent layout then this is an indication that SAS may have found something that did not match your expectations and grabbed that informat instead of a basic datetime informat because a number of values had odd characters or different lengths to be read.

View solution in original post

18 REPLIES 18
Reeza
Super User

Since you used ANYDTDTM as the informat so I suspect the dates were read incorrectly. 

 

What is the underlying value for the incorrect dates, without any format. Its the number of seconds from Jan1 1960, so you can verify the ranges with some math to see if the underlying data is correct. I suspect it's not. 

Cruise
Ammonite | Level 13
@Reeza, data is given as is. I have no way of knowing underlying value. Looking at the screenshot 01JAN00:00:00:00, I'm tempted to take it as 01, Jan 2000. Would you agree?
Reeza
Super User

@Cruise wrote:
@Reeza, data is given as is. I have no way of knowing underlying value. Looking at the screenshot 01JAN00:00:00:00, I'm tempted to take it as 01, Jan 2000. Would you agree?


What is the original data source? 

A text or Excel file? 

Reeza
Super User

You have a data integrity problem. You cannot assume that the date is 2000 safely, so you need to determine what the original data should be. An example of why it may be very wrong - there's often a trick used to avoid having to deal with missing data - where you set the year to an arbitrarily large number to allow you to avoid having to explicitly deal with the case where it's missing. 

 

If the dates with 3000 or missing years are identical - which from the example shown they appear to be - I suspect something like that may be the issue. 

 

It's also very unlikely a final stored SAS7BDAT file would have an informat of ANYDTDTM, I've usually only see that when PROC IMPORT was used to import data. 

 


@Cruise wrote:
SAS7dbat

 

ballardw
Super User

@Cruise wrote:
@Reeza, data is given as is. I have no way of knowing underlying value. Looking at the screenshot 01JAN00:00:00:00, I'm tempted to take it as 01, Jan 2000. Would you agree?

Absolutely not. Each datetime width has different rules for display. When you use datetime16. it means the year will be displayed as 2 digits, and the last 2 at that.

An example with two different centuries that display with the same "year" and the impact of the specific format chosen.

 

data example;
   x = '10Aug2005:00:00:00'dt;
   y = '10Aug2105:00:00:00'dt;
   format x y datetime16.;
run;
proc print;
run;

proc print;
  format x y datetime20.;
run;
ChrisNZ
Tourmaline | Level 20

It looks like you have bad data.

It may be because the anydtdte. informat misinterpreted some input values.

Run this to create an unformatted variable.

data test; 
  set mydata1; 
  X=DT;
  if ser_year in (2054,2099,3000) then output;
run;

 

Cruise
Ammonite | Level 13
Hi Chris, how does it help? Idata "test" has X and DT missing all the way and ser_year as numeric taking stated values.
ChrisNZ
Tourmaline | Level 20

1. How can you have DT missing when you use it so:

ser_date = datepart(dt);

2. It helps because we can see the actual unformatted value.

Cruise
Ammonite | Level 13

@ChrisNZcheck this out Chris. Any idea? let me know please

 

ser_date = datepart(dt);
format ser_date date9.;
ser_year=year(ser_date);
X=ser_date;
if ser_year in (2054,2099,3000) then output;

 

X=ser_date;   if ser_year in (2054,2099,3000) then output;X=ser_date; if ser_year in (2054,2099,3000) then output;

ballardw
Super User

Show something from the original text file or data source. That is where the "problem" almost certainly comes from.

Preferably identify the records with this odd behavior and find those to display.

 

Reeza
Super User

Check the records with the wrong years. If you only have 4 unique dates, there's something else going on, which appears to be the case.

Reeza
Super User

For all the records with the years that seem weird, run a proc freq and see how many unique dates you have. If it's only 3 or 4 unique dates causing issues with the years, I suspect it's not the correct dates, but someone doing something weird with the data like I originally mentioned. 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 18 replies
  • 2668 views
  • 7 likes
  • 5 in conversation