🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Pyrite | Level 9

## RE: Calculate Min & Max by groups

Hi....I am trying to calculate the Minimum Start and Maximum End dates for each ID and Program. Because each course can be taken in different years, the group for the minimum and maximum is the same Year and if the Years are consecutive. Once there is a missing Year for the Program, then the minimum and maximum is calculates as a new group. What is the best way to set this up. Thanks.

```data Have;
length ID \$3 Year \$ 4 Program \$ 15 Course \$ 5 Start \$ 10 End \$ 10;
infile datalines delimiter='' missover dsd;
input ID \$ Year : \$char4. Program : \$char15. Course : \$char5. Start : \$char10. End : \$char10.;
datalines;
100 2016 AutoMechanics A 2016-11-03 2017-04-20
100 2018 AutoMechanics B 2018-09-07 2019-04-20
100 2018 AutoMechanics C 2018-10-15 2019-06-25
100 2019 AutoMechanics D 2019-10-17 2020-01-20
100 2020 AutoMechanics E 2020-08-27 2020-09-15
;
run;

Want:

ID  Year Program       Course     Start      End       Min_Start    Max_End
100 2016 AutoMechanics   A     2016-11-03 2017-04-20  2016-11-03  2017-04-20
100 2018 AutoMechanics   B     2018-09-07 2019-04-20  2018-09-07  2020-09-15
100 2018 AutoMechanics   C     2018-10-15 2019-06-25  2018-09-07  2020-09-15
100 2019 AutoMechanics   D     2019-10-17 2020-01-20  2018-09-07  2020-09-15
100 2020 AutoMechanics   E     2020-08-27 2020-09-15  2018-09-07  2020-09-15```
1 ACCEPTED SOLUTION

Accepted Solutions
Super User

## RE: Calculate Min & Max by groups

You say "missing year" but do not show any "missing" values for year. Do you mean a year out of sequence?

This would likely be a lot easier if the year and dates were actually numeric and dates an not character values as then you could use something like Proc Summary with the dates after inserting a grouping variable.

This duplicates your desired output.

```data Have;
length ID \$3 Year \$ 4 Program \$ 15 Course \$ 5 Start \$ 10 End \$ 10;
infile datalines delimiter='' missover dsd;
input ID \$ Year : \$char4. Program : \$char15. Course : \$char5. Start : \$char10. End : \$char10.;
datalines;
100 2016 AutoMechanics A 2016-11-03 2017-04-20
100 2018 AutoMechanics B 2018-09-07 2019-04-20
100 2018 AutoMechanics C 2018-10-15 2019-06-25
100 2019 AutoMechanics D 2019-10-17 2020-01-20
100 2020 AutoMechanics E 2020-08-27 2020-09-15
;
run;

data need;
set have;
by id year;
Retain ordergroup;
yeardif = dif(year);
if first.id then ordergroup=1;
if yeardif>1 then ordergroup+1;
drop yeardif;
run;

proc sql;
create table want as
select a.*,b.min_start,b.max_end

from need as a left join
(select id,ordergroup,min(start) as min_start, max(end) as max_end
from need
group by id, ordergroup
) as b
on a.id=b.id
and a.ordergroup=b.ordergroup
;
quit;
```

Examples with only one set of "groups" such as Id and Program are poor examples as I do not know if ID is unique for Program or not.

Also, bad things come to people who insist on storing dates as character values. None of the SAS functions or formats that work with dates will work with characters meaning that much work may be needed to get actual dates before most types of valid comparisons, intervals or grouping is done.

2 REPLIES 2
Super User

## RE: Calculate Min & Max by groups

You say "missing year" but do not show any "missing" values for year. Do you mean a year out of sequence?

This would likely be a lot easier if the year and dates were actually numeric and dates an not character values as then you could use something like Proc Summary with the dates after inserting a grouping variable.

This duplicates your desired output.

```data Have;
length ID \$3 Year \$ 4 Program \$ 15 Course \$ 5 Start \$ 10 End \$ 10;
infile datalines delimiter='' missover dsd;
input ID \$ Year : \$char4. Program : \$char15. Course : \$char5. Start : \$char10. End : \$char10.;
datalines;
100 2016 AutoMechanics A 2016-11-03 2017-04-20
100 2018 AutoMechanics B 2018-09-07 2019-04-20
100 2018 AutoMechanics C 2018-10-15 2019-06-25
100 2019 AutoMechanics D 2019-10-17 2020-01-20
100 2020 AutoMechanics E 2020-08-27 2020-09-15
;
run;

data need;
set have;
by id year;
Retain ordergroup;
yeardif = dif(year);
if first.id then ordergroup=1;
if yeardif>1 then ordergroup+1;
drop yeardif;
run;

proc sql;
create table want as
select a.*,b.min_start,b.max_end

from need as a left join
(select id,ordergroup,min(start) as min_start, max(end) as max_end
from need
group by id, ordergroup
) as b
on a.id=b.id
and a.ordergroup=b.ordergroup
;
quit;
```

Examples with only one set of "groups" such as Id and Program are poor examples as I do not know if ID is unique for Program or not.

Also, bad things come to people who insist on storing dates as character values. None of the SAS functions or formats that work with dates will work with characters meaning that much work may be needed to get actual dates before most types of valid comparisons, intervals or grouping is done.

Pyrite | Level 9

## RE: Calculate Min & Max by groups

Hi BallardW…….Thanks for your help and your solution. It works great. Just to let you know that in the actual dataset the dates are stored in the \$ymmdd10. format. When I was creating the dummy dataset, I tried using the :yymmdd10. format on the Input statement and was getting an error message so when I changed the format to \$char10. the error message was gone. But once again thank you so much for your help....Greatly Appreciated!!..

Discussion stats
• 2 replies
• 317 views
• 0 likes
• 2 in conversation