BookmarkSubscribeRSS Feed
elwayfan446
Barite | Level 11

Hello.

 

I have an interesting one.  I am importing a file with a date field that has dates in the future.  Some of them are importing correctly (2038) and some of them are not (1952).  These are being import from a CSV file.  I can't tell why this is happening or how to fix it.  Can anyone tell me what is going on?

 

CSV Date

02/01/53
03/01/53
02/01/38
03/01/53
03/01/38
03/01/53
02/01/53
02/01/53
02/01/38
8 REPLIES 8
PaigeMiller
Diamond | Level 26

Set the YEARCUTOFF option to something that might work on your data.

 

Or better yet, use four digit years, and insist that whoever provides you with the data also uses four digit years.

--
Paige Miller
SASKiwi
PROC Star

You can check what your YEARCUTOFF setting is by running this program step:

proc options option = yearcutoff;
run;
ballardw
Super User

Welcome to yet another reason to never use two digit years. Really. Talk to the source of the data to see if they can provide years as 4-digit values (and not 0053, that is treated the same as 53)

 

Your can run this code to see what your current setting of the YEARCUTOFF option is:

proc options option=yearcutoff;
run;

In my case the value shown in the Log is 1926. That means that any two-digit year less than 26 is interpreted as in century 2000 and after are in century 1900. So a value of 27 is year 1927. If your 2038 is correct then your Yearcutoff is at least 1939.

To get 53 to import you would have to set the Yearcutoff to at at least 1953. If you have later years adjust.

 

If your data includes other dates that are also in 2-digit form, such as Date of Birth or a more current activity, such as a purchase, you may well have to write code to address this issue because the cutoff is session specific and very likely that your other dates may have problems.

elwayfan446
Barite | Level 11

Thanks guys.  This was the problem.  Mine is set at 1940 so that makes sense.  Can I change that with a script or is that a server side change (please say it's a script).  Getting the actual data file changed by the client will also take an act of Congress.

ballardw
Super User

@elwayfan446 wrote:

Thanks guys.  This was the problem.  Mine is set at 1940 so that makes sense.  Can I change that with a script or is that a server side change (please say it's a script).  Getting the actual data file changed by the client will also take an act of Congress.


Charge them for the programming time. Do that enough times and they may find it cheaper than your rates. I did that with one client that provided data files where column orders, column names and content structure changed weekly. When they asked about why we were charging them thousands of dollars a month for programming on what should have been static files we explained and provided examples of the files we actually received. Shortly the file contents quit change.

 

The OPTIONS statement should work:

 

options yearcutoff=1955;

or what ever year makes sense. I would look closely at any other other date variables if any to verify they behave correctly.

elwayfan446
Barite | Level 11

The options statement worked, thank you.

 

My follow up question to this.  Is there an easy way to updated dates that were already in my dataset incorrectly because of this to the correct date?  Also, I will be checking all of my dates going forward.

ballardw
Super User

@elwayfan446 wrote:

The options statement worked, thank you.

 

My follow up question to this.  Is there an easy way to updated dates that were already in my dataset incorrectly because of this to the correct date?  Also, I will be checking all of my dates going forward.


IF you know that any years in the variable that appear as 19xx are supposed to be 20xx a data step like this should work.

Note: I am showing making a new data set with an additional variable so you can compare results. I, however, do not know what your specific year range might be though am guessing.

data test;
   set have;
   if 1940 le year(date) le 1960 then newdate= intnx('year',date,100,'S');
else newdate=date;
format newdate date9.; run;

The if is checking for "incorrect" years of 1940 to 1960. Then the actual year of the date is in that range it creates a new variable with the same day of year, the 'S' parameter, using the INTNX function to increment the date value by adding 100 years. Otherwise the newdate is the same as the current date variable.

Kurt_Bremser
Super User

There is an issue with using YEARCUTOFF: when you have several dates in one file spanning over different ranges, it will fail. Consider a file containing life insurance information, past and present. You will find birthdates ranging from 192x to 200x, contract start dates from, say, the 1960's to the present, and contract end dates from, say, the 1970's to the 2050's.

 

Bottom line: anybody delivering 2-digit years, especially after Y2K, needs to have their heads examined for a missing brain.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 645 views
  • 2 likes
  • 5 in conversation