Hello everyone,
I am working with a dataset and carried out difference in recorded dates using the intck function as below:
dif = intck ('day’, startdate, enddate, 'DISCRETE');
The sample result is shown below:
dif | frequency | percent |
-6 | 18 | 0.03 |
-5 | 15 | 0.03 |
-4 | 20 | 0.04 |
-3 | 26 | 0.05 |
-2 | 28 | 0.05 |
-1 | 87 | 0.16 |
0 | 2078 | 3.85 |
1 | 4204 | 7.8 |
2 | 3690 | 6.84 |
3 | 3097 | 5.74 |
4 | 2707 | 5.02 |
5 | 2207 | 4.09 |
How do I interpret the negative results? I was not expecting negative values since startdate was placed before enddate in the syntax. Can it be an issue with how the dates were formatted? If so, how do I fix the issue regarding negative values since I intend to find average value for dif variable? An example of how startdate is formatted is 01JAN1960 while an example of how enddate is formatted is 04/06/2019. I will appreciate any suggestions/recommendations on how to make the code work. Thank you.
Dif is only negative for a small number of rows in your data. If appears you have a small data quality problem where around 50 rows have an ENDDATE that is earlier than STARTDATE. View those rows to understand what might be causing this. It's possible that they are simply errors but without understanding how you got your data it is hard to advise further. You could just remove those rows if they are genuinely bad data.
Formats shouldn't matter as long as both are date types. Both have a date format?
1960 versus 2019 makes me wonder if one has a datetime format versus a date?
enddate is formatted as MMDDYY10. and also has informat DATETIME22.3 noted
Please look at the values of enddate with your own eyes. Do they look like dates around 1960? Do they look like the dates you are expecting?
A variable with a datetime infirmat and a date format doesn't look right to me. How was your data imported into SAS?
This is not a format problem. SAS always uses the un-formatted values to do this math, and so whatever formats are used are irrelevant. The interpretation is that some of your enddate values are before the corresponding startdate values.
The FORMAT attached to a variable are just the instructions for how to convert the value to text strings for printing.
What you need to be concerned about is the types of VALUES that the variables contain. If the values look right with displayed with the DATE or MMDDYY (or DDMMYY) format then they contain DATE values, which are number of days since 1960.
In which case you can just use subtraction to get the number of days between two dates.
If you are seeing negative differences then the ENDDATE is BEFORE the STARTDATE.
Perhaps the different display format are confusing the humans that are trying to check if the date values were entered properly? That is especially true if you are displaying values like 04/06/2019. Is that April Sixth or the Fourth of June?
dif | Frequency | Percent |
-6 | 2 | 0 |
-5 | 2 | 0 |
-4 | 2 | 0 |
-3 | 4 | 0.01 |
-2 | 12 | 0.02 |
-1 | 33 | 0.06 |
0 | 22203 | 41.17 |
1 | 13164 | 24.41 |
2 | 3744 | 6.94 |
3 | 2905 | 5.39 |
4 | 1947 | 3.61 |
Dif is only negative for a small number of rows in your data. If appears you have a small data quality problem where around 50 rows have an ENDDATE that is earlier than STARTDATE. View those rows to understand what might be causing this. It's possible that they are simply errors but without understanding how you got your data it is hard to advise further. You could just remove those rows if they are genuinely bad data.
I merged your posts dealing with the same issue.
@AyoSho SAS has a concept of SAS Date values and SAS DateTime values.
SAS Date values are the count of days since 1/1/1960
SAS DateTime values are the count of seconds since 1/1/1960
Both of these values are stored in a SAS numerical data type. SAS Formats just make these number human readable but they don't have any effect on calculations/logic which uses the internal values (the counts).
A SAS INFORMAT is used to convert an external value - a text string - to the appropriate value for storing in a SAS variable.
What feels wrong: Informat DATETIME is used to convert a date string with a time component to a SAS DateTime value (=count of seconds) but format MMDDYY10. is used to make a SAS Date value (count of days) human readable. If you apply MMDDYY10. on a SAS DateTime value then you should see a line of ************ when printing.
And for any calculations/comparisons like with intck(): If you mix variable with Date and DateTime values then you certainly won't get correct results. You can convert a SAS variable that stores a SAS DateTime value to a SAS Date value using the DATEPART() function.
Hope these explanations will help you to investigate and resolve the challenge you're facing.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.