I am struggling to understand why a function with the same start and end dates is returning different values.
YEARS_ON_JOB_c and YEARS2 have the same definition, the first being formatted as 10.2 and the latter having no format:
YRDIF(start_date, DATE_OF_DATA, 'act/act')
Current version: 9.04.01M7P080520
Hello @GBL__,
I was able to reproduce your output (including all decimals shown) with my SAS 9.4M5 using the code below.
proc format;
  /* Display every date from 01NOV2021 (22585) through 31OCT2022 (22949) as '10/31/2022' */
  value dfmt
    22585-22949='10/31/2022';
run;
data test;
  _=_n_+14058; /* helper column: observation number plus 14058 */
  input (start_date DATE_OF_DATA)(:yymmdd.);
  /* Two identical calls, so both variables hold identical values */
  YEARS_ON_JOB_c = yrdif(start_date, DATE_OF_DATA, 'act/act');
  YEARS2 = yrdif(start_date, DATE_OF_DATA, 'act/act');
  /* dfmt. hides the real DATE_OF_DATA; only YEARS_ON_JOB_c is rounded for display */
  format start_date mmddyy10. DATE_OF_DATA dfmt. YEARS_ON_JOB_c 10.2;
cards;
2020-11-01 2022-07-07
2020-11-01 2022-07-13
2020-11-01 2022-07-19
2020-10-31 2022-05-31
2020-10-31 2022-05-17
2020-10-31 2022-10-10
2020-10-31 2022-10-31
2020-10-31 2022-09-14
2020-10-31 2022-08-24
2020-10-30 2022-10-17
2020-10-30 2022-10-21
2020-10-30 2022-09-29
2020-10-30 2022-10-03
2020-10-30 2022-05-04
;
proc print data=test noobs;
  format years2 best12.;
run;
I think the YRDIF results were most likely obtained with the dates shown in the data lines above, or, less likely, with other suitable dates. Either case is much more likely than that they were obtained with a constant DATE_OF_DATA='31OCT2022'd and start_date='01NOV2020'd, etc.
As a first step I would print those observations using
format DATE_OF_DATA yymmdd10.;
in the PROC PRINT step to avoid the often confusing effect of unexpected formats (such as dfmt. above).
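A quick complementary check, sketched here under the assumption that the dataset is named test as in my reproduction above: PROC FREQ groups by formatted values, so overriding the format in the step lists the distinct dates that are actually stored.

```sas
/* Sketch: list the distinct stored values of DATE_OF_DATA.            */
/* The FORMAT statement applies to this step only; the dataset TEST   */
/* (name taken from the reproduction above) is left unchanged.        */
proc freq data=test;
  tables DATE_OF_DATA / nocum;
  format DATE_OF_DATA yymmdd10.;
run;
```

If more than one date appears in the table, the custom format was hiding real differences in DATE_OF_DATA.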
If this consistently showed DATE_OF_DATA as 2022-10-31, I would scrutinize the log containing the YRDIF function calls to check whether the arguments really were start_date and DATE_OF_DATA, and whether there is any chance that either of these variables had different values (than they have now) when the year differences were computed.
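One way to test the "values changed after the computation" hypothesis is to recompute YRDIF from the current values and compare. This is only a sketch: the dataset name test comes from my reproduction, while check, recomputed, mismatch and the tolerance 1e-12 are hypothetical choices.

```sas
/* Sketch: recompute the year difference from the values stored now   */
/* and flag observations whose stored result differs.                 */
/* CHECK, RECOMPUTED, MISMATCH and the 1e-12 tolerance are assumptions.*/
data check;
  set test;
  recomputed = yrdif(start_date, DATE_OF_DATA, 'act/act');
  mismatch   = (abs(YEARS_ON_JOB_c - recomputed) > 1e-12);
run;

proc print data=check noobs;
  where mismatch = 1;
  format start_date DATE_OF_DATA yymmdd10. YEARS_ON_JOB_c recomputed best16.;
run;
```

Any observation printed here would indicate that start_date or DATE_OF_DATA no longer holds the value that was used when YEARS_ON_JOB_c was created.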
The final examination, if needed, would be to run more and more simplified versions of the code on the input dataset(s), restricted to a few suitable observations. If the incorrect results persist, please show us that simplified code and those few observations (in the form of a data step).
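As an illustration of what such a simplified test could look like, the step below calls YRDIF twice with the same constant dates and compares the results. The variable names a, b and same are hypothetical; the dates are the ones visible in the screenshot.

```sas
/* Sketch: two identical YRDIF calls on constant dates.               */
/* A, B and SAME are hypothetical names used only for this test.      */
data _null_;
  start_date   = '01NOV2020'd;
  DATE_OF_DATA = '31OCT2022'd;
  a = yrdif(start_date, DATE_OF_DATA, 'act/act');
  b = yrdif(start_date, DATE_OF_DATA, 'act/act');
  same = (a = b);               /* 1 if the two results are equal */
  put a= best16. b= best16. same=;
run;
```

Since both calls receive identical arguments, the log should report the same value for a and b.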
We can't test SAS code against pictures, so posting pictures of data is useless and only a waste of time.
Post data as SAS code, in data steps with datalines which recreate your data.
And, whenever code does not work as expected, post the complete log of the step.
Use the "little running man" for SAS code, and the </> button for logs and text data.
Thanks, Kurt, for your most helpful response.
Let me rephrase my question: why would the YRDIF function ever return two different values for the same date span?
Thanks for wasting both of our time as well with your amazingly informative response.
You still didn't post the actual values. As the last two columns in your picture show, you really cannot depend on how the values LOOK to know what the values actually are.
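A minimal sketch of how to see the stored values rather than their formatted appearance, assuming a dataset named test like the one in the reproduction posted in this thread:

```sas
/* Sketch: FORMAT _ALL_ with no format name removes every format for  */
/* this step, so dates print as raw day counts and the YRDIF results  */
/* print with default numeric output. TEST is an assumed dataset name.*/
proc print data=test noobs;
  format _all_;
run;
```

Two dates that look identical under a custom format will show different numbers here if the stored values differ.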