- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi All,
I have a simple question I think, yet I just can't seem to find the answer when I try to google it (I'm sure it's because I just can't find the correct search terms).
I'm currently trying to calculate the difference in years within a group. Example of my current data:
Group Date
A 1990
A 1991
A 1992
B 2001
B 2002
I've noticed that, if I calculate the difference in years, I get the difference in days (I think). For example, if the difference is 2002 - 2001, then I get 365 as a result. What I would like is to get the difference in years, so 2002 - 2001 = 1.
This is the code I currently have:
proc sql;
create table work.test_date_diff as
select (max(year) - min(year)) as diffyear format year4.
from work.test_year
group by group;
quit;
As such, I was wondering just what it is that I'm missing.
I hope someone can tell me what I'm doing wrong or link me to an example!
Many thanks in advance!
Jian
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Are your date variable value in years?
If yes, are you after this?
Plus I am not sure why do you want to FORMAT the difference in years with a YEAR4 format that is used to display year component of a SAS date
data have;
input group $ date;
cards;
A 1990
A 1991
A 1992
B 2001
B 2002
;
data want;
set have;
by group;
dif_years=dif(date);
if first.group then call missing(dif_years);
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Are your date variable value in years?
If yes, are you after this?
Plus I am not sure why do you want to FORMAT the difference in years with a YEAR4 format that is used to display year component of a SAS date
data have;
input group $ date;
cards;
A 1990
A 1991
A 1992
B 2001
B 2002
;
data want;
set have;
by group;
dif_years=dif(date);
if first.group then call missing(dif_years);
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello again @JianShen I have a feeling the simple correction in your code would be is to remove the format year4.
proc sql;
create table work.test_date_diff as
select (max(year) - min(year)) as diffyear
from work.test_year
group by group;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @JianShen
Assuming the variable YEAR is a SAS date, you can use the INTCK function:
proc sql;
create table work.test_date_diff as
select intck('year',min(year),max(year)) as diffyear
from work.test_year
group by group;
quit;
If it is just a numeric value (not a SAS date), the solution with dif(date) should work.
- Tags:
- intck
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If you have date values (as it appears), use the intck() function:
intck('year',min(year),max(year)) as diffyear
Full code with example data:
data have;
input year :yymmdd10.;
format year year4.;
datalines;
2019-12-01
2018-10-13
2005-05-03
;
proc sql;
create table want as
select intck('year',min(year),max(year)) as diffyear
from have;
quit;
If that does not do it, we need to know what your data looks like. Post a data step with datalines that reflects your dataset.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If you see that difference between two values as 365 then perhaps the values are dates instead of years? Note that attaching a format to variable does NOT change the value stored in that variable. So if you had date values and attached the YEAR format to the variable then only the year number will display, but that does not change what is stored.
Are your date values always for the same day within the year? Like January first? If not then what do you want the difference in years to be when the day of the year is not the same between the two dates being compared? The INTCK() function will count the number of boundaries crossed.
But perhaps you just want to take the difference between the year number, independent of when in the year the date was. Make sure to include you grouping variable in the output dataset, otherwise the data will be difficult to interpret. Why not keep that actual first and last day as variables also.
proc sql;
create table work.test_date_diff as
select group
, min(year) as first_date format=date9.
, max(year) as last_date format=date9.
, (year(max(year)) - year(min(year))) as year_range format=4.
from work.test_year
group by group
;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I've noticed that, if I calculate the difference in years, I get the difference in days (I think). For example, if the difference is 2002 - 2001, then I get 365 as a result. What I would like is to get the difference in years, so 2002 - 2001 = 1.
You could divide by 365 and take the integer value. Or use the INTCK function or the YRDIF function to account for leap years.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
As others have noted, you have a variable named YEAR, but it does not contain year information. It contains specific date information. So you could divide the difference by 365 (or possibly by 365.25) to convert number of days to number of years. But if you want an integer as the output for number of years, you will need to make decisions about some cases, such as how many years do these cases represent:
Min: December 30, 2018
Max: January 2, 2019
Min: December 30, 2018
Max: December 29, 2019
Min: December 30, 2018
Max: March 1, 2019
Min: December 30, 2018
Max: November 1, 2019
Min: December 30, 2018
Max: December 31, 2019
You will need the rules spelled out, or else any and all program suggestions are just guesses.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi All,
Thank you for all of the quick and helpful replies!
It is indeed as @novinosrin and @ed_sas_member noticed and mentioned. My "dates" aren't really dates, but are numeric values and the dif(date) option is working wonders! I made it too complicated for myself by trying to make it into SAS dates and using that (I feel like hitting myself now...).
Many thanks to all of you for helping and clarifying this! The other options also helped with a different year question that I was also looking at!