Hi....I am trying to calculate the Minimum Start and Maximum End dates for each ID and Program. Because each course can be taken in different years, the group for the minimum and maximum is the same Year and if the Years are consecutive. Once there is a missing Year for the Program, then the minimum and maximum is calculates as a new group. What is the best way to set this up. Thanks.
data Have; length ID $3 Year $ 4 Program $ 15 Course $ 5 Start $ 10 End $ 10; infile datalines delimiter='' missover dsd; input ID $ Year : $char4. Program : $char15. Course : $char5. Start : $char10. End : $char10.; datalines; 100 2016 AutoMechanics A 2016-11-03 2017-04-20 100 2018 AutoMechanics B 2018-09-07 2019-04-20 100 2018 AutoMechanics C 2018-10-15 2019-06-25 100 2019 AutoMechanics D 2019-10-17 2020-01-20 100 2020 AutoMechanics E 2020-08-27 2020-09-15 ; run; Want: ID Year Program Course Start End Min_Start Max_End 100 2016 AutoMechanics A 2016-11-03 2017-04-20 2016-11-03 2017-04-20 100 2018 AutoMechanics B 2018-09-07 2019-04-20 2018-09-07 2020-09-15 100 2018 AutoMechanics C 2018-10-15 2019-06-25 2018-09-07 2020-09-15 100 2019 AutoMechanics D 2019-10-17 2020-01-20 2018-09-07 2020-09-15 100 2020 AutoMechanics E 2020-08-27 2020-09-15 2018-09-07 2020-09-15
You say "missing year" but do not show any "missing" values for year. Do you mean a year out of sequence?
This would likely be a lot easier if the year and dates were actually numeric and dates an not character values as then you could use something like Proc Summary with the dates after inserting a grouping variable.
This duplicates your desired output.
data Have; length ID $3 Year $ 4 Program $ 15 Course $ 5 Start $ 10 End $ 10; infile datalines delimiter='' missover dsd; input ID $ Year : $char4. Program : $char15. Course : $char5. Start : $char10. End : $char10.; datalines; 100 2016 AutoMechanics A 2016-11-03 2017-04-20 100 2018 AutoMechanics B 2018-09-07 2019-04-20 100 2018 AutoMechanics C 2018-10-15 2019-06-25 100 2019 AutoMechanics D 2019-10-17 2020-01-20 100 2020 AutoMechanics E 2020-08-27 2020-09-15 ; run; data need; set have; by id year; Retain ordergroup; yeardif = dif(year); if first.id then ordergroup=1; if yeardif>1 then ordergroup+1; drop yeardif; run; proc sql; create table want as select a.*,b.min_start,b.max_end from need as a left join (select id,ordergroup,min(start) as min_start, max(end) as max_end from need group by id, ordergroup ) as b on a.id=b.id and a.ordergroup=b.ordergroup ; quit;
Examples with only one set of "groups" such as Id and Program are poor examples as I do not know if ID is unique for Program or not.
Also, bad things come to people who insist on storing dates as character values. None of the SAS functions or formats that work with dates will work with characters meaning that much work may be needed to get actual dates before most types of valid comparisons, intervals or grouping is done.
You say "missing year" but do not show any "missing" values for year. Do you mean a year out of sequence?
This would likely be a lot easier if the year and dates were actually numeric and dates an not character values as then you could use something like Proc Summary with the dates after inserting a grouping variable.
This duplicates your desired output.
data Have; length ID $3 Year $ 4 Program $ 15 Course $ 5 Start $ 10 End $ 10; infile datalines delimiter='' missover dsd; input ID $ Year : $char4. Program : $char15. Course : $char5. Start : $char10. End : $char10.; datalines; 100 2016 AutoMechanics A 2016-11-03 2017-04-20 100 2018 AutoMechanics B 2018-09-07 2019-04-20 100 2018 AutoMechanics C 2018-10-15 2019-06-25 100 2019 AutoMechanics D 2019-10-17 2020-01-20 100 2020 AutoMechanics E 2020-08-27 2020-09-15 ; run; data need; set have; by id year; Retain ordergroup; yeardif = dif(year); if first.id then ordergroup=1; if yeardif>1 then ordergroup+1; drop yeardif; run; proc sql; create table want as select a.*,b.min_start,b.max_end from need as a left join (select id,ordergroup,min(start) as min_start, max(end) as max_end from need group by id, ordergroup ) as b on a.id=b.id and a.ordergroup=b.ordergroup ; quit;
Examples with only one set of "groups" such as Id and Program are poor examples as I do not know if ID is unique for Program or not.
Also, bad things come to people who insist on storing dates as character values. None of the SAS functions or formats that work with dates will work with characters meaning that much work may be needed to get actual dates before most types of valid comparisons, intervals or grouping is done.
Hi BallardW…….Thanks for your help and your solution. It works great. Just to let you know that in the actual dataset the dates are stored in the $ymmdd10. format. When I was creating the dummy dataset, I tried using the :yymmdd10. format on the Input statement and was getting an error message so when I changed the format to $char10. the error message was gone. But once again thank you so much for your help....Greatly Appreciated!!..
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.