Hi all,
I have a data set like below. I would like to delete all duplicate course values for an ID.
For e.g. For ID 2, course E,U and W is duplicated. I want to keep only one row for each course. So for ID 2, I would like to have 4 rows showing courses E,J,U and W in each row. It does not matter which duplicate course is deleted.
However If the course is missing, I would like to keep those blank courses. for eg. ID 6 and 16 have missing courses.
Dataset I have:
ID | Status | StatusBeginDate | StatusEndDate | Course | CourseBeginDate | CourseEndDate |
1 | WW | 5/3/2017 | 9/30/2017 | C | 4/19/2017 | 10/3/2017 |
1 | WW | 5/3/2017 | 9/30/2017 | E | 8/16/2017 | 10/3/2017 |
1 | WW | 5/3/2017 | 9/30/2017 | P | 4/19/2017 | 10/3/2017 |
1 | WW | 5/3/2017 | 9/30/2017 | S | 4/19/2017 | 11/7/2017 |
2 | JJ | 11/22/2017 | 2/16/2018 | E | 12/11/2017 | 12/18/2017 |
2 | JJ | 11/22/2017 | 2/16/2018 | E | 12/19/2017 | 2/27/2018 |
2 | JJ | 11/22/2017 | 2/16/2018 | J | 12/11/2017 | 12/15/2017 |
2 | JJ | 11/22/2017 | 2/16/2018 | U | 1/14/2014 | |
2 | JJ | 11/22/2017 | 2/16/2018 | U | 7/17/2017 | |
2 | JJ | 11/22/2017 | 2/16/2018 | W | 11/22/2017 | 12/8/2017 |
2 | JJ | 11/22/2017 | 2/16/2018 | W | 12/11/2017 | 12/15/2017 |
2 | JJ | 11/22/2017 | 2/16/2018 | W | 12/18/2017 | 2/27/2018 |
3 | C1 | 10/27/2017 | 10/30/2017 | B | 10/17/2017 | 10/29/2017 |
3 | C1 | 10/27/2017 | 10/30/2017 | X | 10/30/2017 | 10/30/2017 |
3 | CC | 9/1/2017 | 10/26/2017 | B | 10/17/2017 | 10/29/2017 |
3 | CC | 9/1/2017 | 10/26/2017 | X | 8/14/2017 | 10/16/2017 |
4 | JJ | 12/25/2017 | 3/1/2018 | W | 12/26/2017 | 2/25/2018 |
4 | JJ | 12/25/2017 | 3/1/2018 | X | 12/20/2017 | 1/29/2018 |
4 | C3 | 12/16/2017 | 12/24/2017 | W | 10/23/2017 | 12/20/2017 |
4 | C3 | 12/16/2017 | 12/24/2017 | X | 12/20/2017 | 1/29/2018 |
4 | JJ | 10/29/2017 | 12/15/2017 | W | 10/23/2017 | 12/20/2017 |
5 | JJ | 12/8/2017 | E | 12/28/2017 | 1/25/2018 | |
5 | JJ | 12/8/2017 | W | 12/8/2017 | 12/27/2017 | |
5 | JJ | 12/8/2017 | W | 12/28/2017 | 1/24/2018 | |
5 | CC | 12/1/2017 | 12/7/2017 | A | 10/25/2017 | 12/7/2017 |
6 | JJ | 1/31/2017 | 1/31/2017 | |||
6 | CC | 12/6/2016 | 1/30/2017 | W | 1/17/2017 | 1/17/2017 |
7 | WW | 4/5/2017 | 7/31/2017 | L | 5/19/2017 | 7/31/2017 |
7 | WW | 4/5/2017 | 7/31/2017 | M | 5/19/2017 | 7/31/2017 |
7 | WW | 4/5/2017 | 7/31/2017 | U | 5/28/2014 | |
7 | WW | 4/5/2017 | 7/31/2017 | U | 11/26/2014 | |
8 | FF | 9/18/2017 | 4/30/2018 | R | 9/18/2017 | 6/7/2018 |
8 | FF | 9/18/2017 | 4/30/2018 | Z | 9/25/2015 | |
8 | FF | 9/18/2017 | 4/30/2018 | U | 3/12/2014 | |
8 | FF | 9/18/2017 | 4/30/2018 | U | 9/23/2015 | |
8 | FF | 9/18/2017 | 4/30/2018 | D | 9/18/2017 | 6/7/2018 |
9 | FF | 3/20/2017 | 8/31/2017 | R | 1/26/2017 | 8/3/2017 |
9 | FF | 3/20/2017 | 8/31/2017 | D | 1/26/2017 | 8/3/2017 |
9 | FF | 3/20/2017 | 8/31/2017 | R | 8/4/2017 | 9/19/2017 |
9 | FF | 3/20/2017 | 8/31/2017 | D | 8/4/2017 | 9/19/2017 |
10 | JJ | 10/16/2017 | 10/31/2017 | T | 10/19/2017 | 12/7/2017 |
10 | JJ | 10/16/2017 | 10/31/2017 | E | 10/17/2017 | 12/7/2017 |
10 | JJ | 10/16/2017 | 10/31/2017 | R | 7/7/2017 | 10/16/2017 |
10 | JJ | 10/16/2017 | 10/31/2017 | W | 10/18/2017 | 12/7/2017 |
10 | FF | 7/7/2017 | 10/15/2017 | R | 7/7/2017 | 10/16/2017 |
10 | FF | 7/7/2017 | 10/15/2017 | D | 7/7/2017 | 10/11/2017 |
11 | JJ | 6/22/2017 | E | 10/30/2017 | 12/13/2017 | |
11 | JJ | 6/22/2017 | W | 10/30/2017 | 12/12/2017 | |
11 | JJ | 6/22/2017 | W | 12/13/2017 | 2/6/2018 | |
12 | FF | 12/29/2017 | 5/31/2018 | R | 6/29/2017 | 12/29/2017 |
12 | FF | 12/29/2017 | 5/31/2018 | U | 7/26/2016 | |
12 | FF | 12/29/2017 | 5/31/2018 | Q | 7/26/2016 | |
12 | FF | 12/29/2017 | 5/31/2018 | W | 7/6/2017 | 12/29/2017 |
12 | FF | 12/29/2017 | 5/31/2018 | X | 8/4/2016 | 2/6/2018 |
12 | C1 | 9/11/2017 | 12/28/2017 | R | 6/29/2017 | 12/29/2017 |
12 | C1 | 9/11/2017 | 12/28/2017 | U | 7/26/2016 | |
12 | C1 | 9/11/2017 | 12/28/2017 | Q | 7/26/2016 | |
12 | C1 | 9/11/2017 | 12/28/2017 | W | 7/6/2017 | 12/29/2017 |
12 | C1 | 9/11/2017 | 12/28/2017 | X | 8/4/2016 | 2/6/2018 |
13 | WW | 6/28/2016 | M | 6/28/2016 | ||
13 | WW | 6/28/2016 | S | 6/28/2016 | ||
13 | WW | 6/28/2016 | U | 7/6/2015 | ||
13 | WW | 6/28/2016 | U | 11/23/2015 | ||
13 | WW | 6/28/2016 | U | 6/20/2016 | ||
14 | FF | 12/13/2017 | 3/23/2018 | R | 12/4/2017 | 3/23/2018 |
14 | FF | 12/13/2017 | 3/23/2018 | D | 12/4/2017 | 3/23/2018 |
14 | CC | 11/3/2017 | 12/12/2017 | R | 12/4/2017 | 3/23/2018 |
14 | CC | 11/3/2017 | 12/12/2017 | A | 11/3/2017 | 12/4/2017 |
14 | CC | 11/3/2017 | 12/12/2017 | D | 12/4/2017 | 3/23/2018 |
15 | JJ | 3/28/2017 | 10/31/2017 | E | 8/1/2017 | 11/1/2017 |
16 | WW | 12/22/2017 | 4/30/2018 | N | 12/22/2017 | 1/22/2018 |
16 | WW | 12/22/2017 | 4/30/2018 | O | 12/22/2017 | 1/22/2018 |
16 | CC | 11/3/2017 | 12/21/2017 | |||
16 | JJ | 10/15/2017 | 11/2/2017 |
Dataset I Want:
ID | Status | StatusBeginDate | StatusEndDate | Course | CourseBeginDate | CourseEndDate |
1 | WW | 5/3/2017 | 9/30/2017 | C | 4/19/2017 | 10/3/2017 |
1 | WW | 5/3/2017 | 9/30/2017 | E | 8/16/2017 | 10/3/2017 |
1 | WW | 5/3/2017 | 9/30/2017 | P | 4/19/2017 | 10/3/2017 |
1 | WW | 5/3/2017 | 9/30/2017 | S | 4/19/2017 | 11/7/2017 |
2 | JJ | 11/22/2017 | 2/16/2018 | E | 12/19/2017 | 2/27/2018 |
2 | JJ | 11/22/2017 | 2/16/2018 | J | 12/11/2017 | 12/15/2017 |
2 | JJ | 11/22/2017 | 2/16/2018 | U | 7/17/2017 | |
2 | JJ | 11/22/2017 | 2/16/2018 | W | 12/18/2017 | 2/27/2018 |
3 | C1 | 10/27/2017 | 10/30/2017 | B | 10/17/2017 | 10/29/2017 |
3 | C1 | 10/27/2017 | 10/30/2017 | X | 10/30/2017 | 10/30/2017 |
4 | JJ | 12/25/2017 | 3/1/2018 | W | 12/26/2017 | 2/25/2018 |
4 | JJ | 12/25/2017 | 3/1/2018 | X | 12/20/2017 | 1/29/2018 |
5 | CC | 12/1/2017 | 12/7/2017 | A | 10/25/2017 | 12/7/2017 |
5 | JJ | 12/8/2017 | E | 12/28/2017 | 1/25/2018 | |
5 | JJ | 12/8/2017 | W | 12/28/2017 | 1/24/2018 | |
6 | JJ | 1/31/2017 | 1/31/2017 | |||
6 | CC | 12/6/2016 | 1/30/2017 | W | 1/17/2017 | 1/17/2017 |
7 | WW | 4/5/2017 | 7/31/2017 | L | 5/19/2017 | 7/31/2017 |
7 | WW | 4/5/2017 | 7/31/2017 | M | 5/19/2017 | 7/31/2017 |
7 | WW | 4/5/2017 | 7/31/2017 | U | 11/26/2014 | |
8 | FF | 9/18/2017 | 4/30/2018 | D | 9/18/2017 | 6/7/2018 |
8 | FF | 9/18/2017 | 4/30/2018 | R | 9/18/2017 | 6/7/2018 |
8 | FF | 9/18/2017 | 4/30/2018 | U | 9/23/2015 | |
8 | FF | 9/18/2017 | 4/30/2018 | Z | 9/25/2015 | |
9 | FF | 3/20/2017 | 8/31/2017 | D | 8/4/2017 | 9/19/2017 |
9 | FF | 3/20/2017 | 8/31/2017 | R | 8/4/2017 | 9/19/2017 |
10 | FF | 7/7/2017 | 10/15/2017 | D | 7/7/2017 | 10/11/2017 |
10 | JJ | 10/16/2017 | 10/31/2017 | E | 10/17/2017 | 12/7/2017 |
10 | JJ | 10/16/2017 | 10/31/2017 | R | 7/7/2017 | 10/16/2017 |
10 | JJ | 10/16/2017 | 10/31/2017 | T | 10/19/2017 | 12/7/2017 |
10 | JJ | 10/16/2017 | 10/31/2017 | W | 10/18/2017 | 12/7/2017 |
11 | JJ | 6/22/2017 | E | 10/30/2017 | 12/13/2017 | |
11 | JJ | 6/22/2017 | W | 12/13/2017 | 2/6/2018 | |
12 | FF | 12/29/2017 | 5/31/2018 | Q | 7/26/2016 | |
12 | FF | 12/29/2017 | 5/31/2018 | R | 6/29/2017 | 12/29/2017 |
12 | FF | 12/29/2017 | 5/31/2018 | U | 7/26/2016 | |
12 | FF | 12/29/2017 | 5/31/2018 | W | 7/6/2017 | 12/29/2017 |
12 | FF | 12/29/2017 | 5/31/2018 | X | 8/4/2016 | 2/6/2018 |
13 | WW | 6/28/2016 | M | 6/28/2016 | ||
13 | WW | 6/28/2016 | S | 6/28/2016 | ||
13 | WW | 6/28/2016 | U | 6/20/2016 | ||
14 | CC | 11/3/2017 | 12/12/2017 | A | 11/3/2017 | 12/4/2017 |
14 | FF | 12/13/2017 | 3/23/2018 | D | 12/4/2017 | 3/23/2018 |
14 | FF | 12/13/2017 | 3/23/2018 | R | 12/4/2017 | 3/23/2018 |
15 | JJ | 3/28/2017 | 10/31/2017 | E | 8/1/2017 | 11/1/2017 |
16 | WW | 12/22/2017 | 4/30/2018 | N | 12/22/2017 | 1/22/2018 |
16 | WW | 12/22/2017 | 4/30/2018 | O | 12/22/2017 | 1/22/2018 |
16 | CC | 11/3/2017 | 12/21/2017 | |||
16 | JJ | 10/15/2017 | 11/2/2017 |
Using a double sort works.
proc sort data=have;
by id course descending courseEndDate; *sorts by latest course first;
run;
proc sort data=have out=want nodupkey; *takes the first record for each ID/course;
by id course;
run;
If you want all missing this won't work though, you would need to replace the second step by the step below instead.
data want;
set have;
by id course;
if first.course or missing(course) then output;
run;
Thank you so much Reeza. This worked.
If
then this use of a hash object will allow you to pass through the data only once:
data want;
set have;
if _n_=1 then do;
declare hash h ();
h.definekey('id','course');
h.definedata('id','course');
h.definedone();
end;
if missing(course) or h.add()=0;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.