- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have several challenges with dates.
1) I am reading data from Excel.
All but one columns reads automatically as normal dd/mm/yyyy i.e. MMDDYY10. according to proc contents.
However, one column originally had mixture of dates in format 12/31/2020 and TBD. As a result date comes as Char 5. And all dates are automatically converted to some kind on numbers like 44196 (instead of 12/31/2020).
I fist replaced TBD with space using proq sql and translate(BadDate, ' ', 'TBD') as Date.
Now I have column full of 5 digits instead of regular dates (like 44196) or empty cells. I need to convert these 5 digit numbers to normal dates.
I tried NewDate=input(Date,MMDDYY10.) and get column full of "." instead of dates.
Any suggestions? What am I doing wrong? Goal is to have in column normal dates or empty cells.
2)Is there a way to compare dates in SAS. For example, if I need to compare dates in two columns and keep latest can I use normal comparison or should I convert all dates to SAS dates first?
3) Similarly, if I need to remove all records with dates that expired 3 months ago, is there a simple solution like read current date, calculate 90 dates determine cutoff date.
Thanks.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
NewDate=input(Date,best12.);
NewDate=NewDate -365*60 -16; /*I have not idea why, but that gave me correct original dates; */
format NewDate mmddyy10.;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You could try this:
NewDate=input(Date,best12.);
format NewDate mmddyy10.;
Your character variable is no longer in date format but probably the number of days since 1 Jan 1960. Converting that to a number and applying a date format should fix your problem.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It was read by Excel into SAS as4419644561443474434744439
After using suggested code:
NewDate=input(Date,best12.);
format NewDate mmddyy10.;
My output is:01/01/208101/01/208206/01/208109/01/2081
I am off by 1 day and 60 years :). Should I subtract something prior to conversion?Thanks.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It looks that I need to subtract 365*60+1. Checking...
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You need to adjust for difference in where SAS and Excel start counting (also because Excel treats 1900 as a leap year).
Note their is no "best" informat, just use regular informat.
NewDate=input(Date,32.) + '30DEC1899'd;
format NewDate mmddyy10.;
PS If you are sharing this data with other you might want to use a format that does not use either MDY or DMY ordering of date parts as either choice will confuse half of you audience. I like to use YYMMDD10. or DATE9.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks!!! That worked too.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
NewDate=input(Date,best12.);
NewDate=NewDate -365*60 -16; /*I have not idea why, but that gave me correct original dates; */
format NewDate mmddyy10.;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It works because that is the value SAS uses to store 2 days before start of 1900. SAS counts from 1960 and Excel counts from 1900. They use different choices for whether to count from 0 or 1 and Excel treats 1900 has a leap year.
596 data test; 597 x = -365*60 -16; 598 y = '30DEC1899'd ; 599 put (x y) (= comma12. / ); 600 run; x=-21,916 y=-21,916
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Excel starts (nominally) at 1900-01-01, but it starts with day 1 (1960-01-01 is day zero in SAS), and mistakes 1900 for a leap year.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The best method is this:
- save the data to a textual format, like CSV
- create a custom informat that reads 'TBD' as missing, and other with DDMMYY10.
- use this informat in a DATA step that reads the text file
With your current data, you need to to this:
if chardate = 'TBD'
then numdate = .;
else numdate = input(chardate,5.) + '30dec1899'd;
format numdate yymmdd10.;
The addition of 30dec1899 corrects the different zero day of Excel vs. SAS.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks a lot to everyone who replied!
If some one has idea how to answer items 2 and 3 from original post, it would be greatly appreciated.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
1) I am reading data from Excel.
All but one columns reads automatically as normal dd/mm/yyyy i.e. MMDDYY10. according to proc contents.
However, one column originally had mixture of dates in format 12/31/2020 and TBD. As a result date comes as Char 5. And all dates are automatically converted to some kind on numbers like 44196 (instead of 12/31/2020).
I fist replaced TBD with space using proq sql and translate(BadDate, ' ', 'TBD') as Date.
Now I have column full of 5 digits instead of regular dates (like 44196) or empty cells. I need to convert these 5 digit numbers to normal dates.
You do not say how you are reading the values for Excel. This sort of smells like using Proc Import. Which often results in sub-optimal results because Excel imposes absolutely NO restrictions on what goes into any single cell and so results can be pretty poor.
If you save the file as a CSV and use a data step to read the resulting text file you have much more control over the result and you can use custom informats to read problematic data such as you show.
Example:
proc format; invalue tbddate (upcase) 'TBD' = . other = [mmddyy10.] ; data example; input x :tbddate.; format x mmddyy10.; datalines; TBD tbd tBD 01/01/2021 10/12/1995 ;
This approach has several advantages. First, your code to remove 'TBD' only works if it always all upper case. The informat as defined with the (upcase) option converts all text to uppercase before comparsion so would get any of 3-letter versions correctly assigned to missing. Second if you wanted to assign a specific date value for some reason other than missing you could do that as: 'TBD' = '31DEC2099'd for example.
Another is if at a latter date you have multiple date columns with this headache you can use the same format to read all of them.
2) SAS date values are numeric. So if you need the "latest" that would be the largest of a group of values and the MAX function would work: LatestDate = max(date1, date2, newdate);
3) Similarly, if I need to remove all records with dates that expired 3 months ago, is there a simple solution like read current date, calculate 90 dates determine cutoff date.
The function TODAY() returns the current date. You could use the INTCK function to get the number of intervals specified. The question becomes how you define "3 months ago".
The basic approach would be
IF intck('month',date,Today()) ge 3 then delete.
You specifically asked for month so that is the interval I show. You can ask for 'year', 'week', 'day', time intervals and if the variables are datetime values ask for 'dtyear' 'dtmonth' 'dtweek' 'dtday' to use the part of the algorithm to return the intervals correctly. There are also offsets available.
The details can come with exact values you need for a specific purpose. Sometimes "3 months" might depend on the day of the month. So the INTCK function has an additional parameter that works with where the boundaries may fall. Or you can shift either of dates using the INTNX function as well.
The basic Intck as shown would return 1 for the interval between (today as) 1May2021 and 30Apr2021. It would also return 1 for 1Apr2021.