Hi all,
I have a data set like below. I would like to delete all duplicate course values for an ID.
For e.g. For ID 2, course E,U and W is duplicated. I want to keep only one row for each course. So for ID 2, I would like to have 4 rows showing courses E,J,U and W in each row. It does not matter which duplicate course is deleted.
However If the course is missing, I would like to keep those blank courses. for eg. ID 6 and 16 have missing courses.
Dataset I have:
ID | Status | StatusBeginDate | StatusEndDate | Course | CourseBeginDate | CourseEndDate |
1 | WW | 5/3/2017 | 9/30/2017 | C | 4/19/2017 | 10/3/2017 |
1 | WW | 5/3/2017 | 9/30/2017 | E | 8/16/2017 | 10/3/2017 |
1 | WW | 5/3/2017 | 9/30/2017 | P | 4/19/2017 | 10/3/2017 |
1 | WW | 5/3/2017 | 9/30/2017 | S | 4/19/2017 | 11/7/2017 |
2 | JJ | 11/22/2017 | 2/16/2018 | E | 12/11/2017 | 12/18/2017 |
2 | JJ | 11/22/2017 | 2/16/2018 | E | 12/19/2017 | 2/27/2018 |
2 | JJ | 11/22/2017 | 2/16/2018 | J | 12/11/2017 | 12/15/2017 |
2 | JJ | 11/22/2017 | 2/16/2018 | U | 1/14/2014 | |
2 | JJ | 11/22/2017 | 2/16/2018 | U | 7/17/2017 | |
2 | JJ | 11/22/2017 | 2/16/2018 | W | 11/22/2017 | 12/8/2017 |
2 | JJ | 11/22/2017 | 2/16/2018 | W | 12/11/2017 | 12/15/2017 |
2 | JJ | 11/22/2017 | 2/16/2018 | W | 12/18/2017 | 2/27/2018 |
3 | C1 | 10/27/2017 | 10/30/2017 | B | 10/17/2017 | 10/29/2017 |
3 | C1 | 10/27/2017 | 10/30/2017 | X | 10/30/2017 | 10/30/2017 |
3 | CC | 9/1/2017 | 10/26/2017 | B | 10/17/2017 | 10/29/2017 |
3 | CC | 9/1/2017 | 10/26/2017 | X | 8/14/2017 | 10/16/2017 |
4 | JJ | 12/25/2017 | 3/1/2018 | W | 12/26/2017 | 2/25/2018 |
4 | JJ | 12/25/2017 | 3/1/2018 | X | 12/20/2017 | 1/29/2018 |
4 | C3 | 12/16/2017 | 12/24/2017 | W | 10/23/2017 | 12/20/2017 |
4 | C3 | 12/16/2017 | 12/24/2017 | X | 12/20/2017 | 1/29/2018 |
4 | JJ | 10/29/2017 | 12/15/2017 | W | 10/23/2017 | 12/20/2017 |
5 | JJ | 12/8/2017 | E | 12/28/2017 | 1/25/2018 | |
5 | JJ | 12/8/2017 | W | 12/8/2017 | 12/27/2017 | |
5 | JJ | 12/8/2017 | W | 12/28/2017 | 1/24/2018 | |
5 | CC | 12/1/2017 | 12/7/2017 | A | 10/25/2017 | 12/7/2017 |
6 | JJ | 1/31/2017 | 1/31/2017 | |||
6 | CC | 12/6/2016 | 1/30/2017 | W | 1/17/2017 | 1/17/2017 |
7 | WW | 4/5/2017 | 7/31/2017 | L | 5/19/2017 | 7/31/2017 |
7 | WW | 4/5/2017 | 7/31/2017 | M | 5/19/2017 | 7/31/2017 |
7 | WW | 4/5/2017 | 7/31/2017 | U | 5/28/2014 | |
7 | WW | 4/5/2017 | 7/31/2017 | U | 11/26/2014 | |
8 | FF | 9/18/2017 | 4/30/2018 | R | 9/18/2017 | 6/7/2018 |
8 | FF | 9/18/2017 | 4/30/2018 | Z | 9/25/2015 | |
8 | FF | 9/18/2017 | 4/30/2018 | U | 3/12/2014 | |
8 | FF | 9/18/2017 | 4/30/2018 | U | 9/23/2015 | |
8 | FF | 9/18/2017 | 4/30/2018 | D | 9/18/2017 | 6/7/2018 |
9 | FF | 3/20/2017 | 8/31/2017 | R | 1/26/2017 | 8/3/2017 |
9 | FF | 3/20/2017 | 8/31/2017 | D | 1/26/2017 | 8/3/2017 |
9 | FF | 3/20/2017 | 8/31/2017 | R | 8/4/2017 | 9/19/2017 |
9 | FF | 3/20/2017 | 8/31/2017 | D | 8/4/2017 | 9/19/2017 |
10 | JJ | 10/16/2017 | 10/31/2017 | T | 10/19/2017 | 12/7/2017 |
10 | JJ | 10/16/2017 | 10/31/2017 | E | 10/17/2017 | 12/7/2017 |
10 | JJ | 10/16/2017 | 10/31/2017 | R | 7/7/2017 | 10/16/2017 |
10 | JJ | 10/16/2017 | 10/31/2017 | W | 10/18/2017 | 12/7/2017 |
10 | FF | 7/7/2017 | 10/15/2017 | R | 7/7/2017 | 10/16/2017 |
10 | FF | 7/7/2017 | 10/15/2017 | D | 7/7/2017 | 10/11/2017 |
11 | JJ | 6/22/2017 | E | 10/30/2017 | 12/13/2017 | |
11 | JJ | 6/22/2017 | W | 10/30/2017 | 12/12/2017 | |
11 | JJ | 6/22/2017 | W | 12/13/2017 | 2/6/2018 | |
12 | FF | 12/29/2017 | 5/31/2018 | R | 6/29/2017 | 12/29/2017 |
12 | FF | 12/29/2017 | 5/31/2018 | U | 7/26/2016 | |
12 | FF | 12/29/2017 | 5/31/2018 | Q | 7/26/2016 | |
12 | FF | 12/29/2017 | 5/31/2018 | W | 7/6/2017 | 12/29/2017 |
12 | FF | 12/29/2017 | 5/31/2018 | X | 8/4/2016 | 2/6/2018 |
12 | C1 | 9/11/2017 | 12/28/2017 | R | 6/29/2017 | 12/29/2017 |
12 | C1 | 9/11/2017 | 12/28/2017 | U | 7/26/2016 | |
12 | C1 | 9/11/2017 | 12/28/2017 | Q | 7/26/2016 | |
12 | C1 | 9/11/2017 | 12/28/2017 | W | 7/6/2017 | 12/29/2017 |
12 | C1 | 9/11/2017 | 12/28/2017 | X | 8/4/2016 | 2/6/2018 |
13 | WW | 6/28/2016 | M | 6/28/2016 | ||
13 | WW | 6/28/2016 | S | 6/28/2016 | ||
13 | WW | 6/28/2016 | U | 7/6/2015 | ||
13 | WW | 6/28/2016 | U | 11/23/2015 | ||
13 | WW | 6/28/2016 | U | 6/20/2016 | ||
14 | FF | 12/13/2017 | 3/23/2018 | R | 12/4/2017 | 3/23/2018 |
14 | FF | 12/13/2017 | 3/23/2018 | D | 12/4/2017 | 3/23/2018 |
14 | CC | 11/3/2017 | 12/12/2017 | R | 12/4/2017 | 3/23/2018 |
14 | CC | 11/3/2017 | 12/12/2017 | A | 11/3/2017 | 12/4/2017 |
14 | CC | 11/3/2017 | 12/12/2017 | D | 12/4/2017 | 3/23/2018 |
15 | JJ | 3/28/2017 | 10/31/2017 | E | 8/1/2017 | 11/1/2017 |
16 | WW | 12/22/2017 | 4/30/2018 | N | 12/22/2017 | 1/22/2018 |
16 | WW | 12/22/2017 | 4/30/2018 | O | 12/22/2017 | 1/22/2018 |
16 | CC | 11/3/2017 | 12/21/2017 | |||
16 | JJ | 10/15/2017 | 11/2/2017 |
Dataset I Want:
ID | Status | StatusBeginDate | StatusEndDate | Course | CourseBeginDate | CourseEndDate |
1 | WW | 5/3/2017 | 9/30/2017 | C | 4/19/2017 | 10/3/2017 |
1 | WW | 5/3/2017 | 9/30/2017 | E | 8/16/2017 | 10/3/2017 |
1 | WW | 5/3/2017 | 9/30/2017 | P | 4/19/2017 | 10/3/2017 |
1 | WW | 5/3/2017 | 9/30/2017 | S | 4/19/2017 | 11/7/2017 |
2 | JJ | 11/22/2017 | 2/16/2018 | E | 12/19/2017 | 2/27/2018 |
2 | JJ | 11/22/2017 | 2/16/2018 | J | 12/11/2017 | 12/15/2017 |
2 | JJ | 11/22/2017 | 2/16/2018 | U | 7/17/2017 | |
2 | JJ | 11/22/2017 | 2/16/2018 | W | 12/18/2017 | 2/27/2018 |
3 | C1 | 10/27/2017 | 10/30/2017 | B | 10/17/2017 | 10/29/2017 |
3 | C1 | 10/27/2017 | 10/30/2017 | X | 10/30/2017 | 10/30/2017 |
4 | JJ | 12/25/2017 | 3/1/2018 | W | 12/26/2017 | 2/25/2018 |
4 | JJ | 12/25/2017 | 3/1/2018 | X | 12/20/2017 | 1/29/2018 |
5 | CC | 12/1/2017 | 12/7/2017 | A | 10/25/2017 | 12/7/2017 |
5 | JJ | 12/8/2017 | E | 12/28/2017 | 1/25/2018 | |
5 | JJ | 12/8/2017 | W | 12/28/2017 | 1/24/2018 | |
6 | JJ | 1/31/2017 | 1/31/2017 | |||
6 | CC | 12/6/2016 | 1/30/2017 | W | 1/17/2017 | 1/17/2017 |
7 | WW | 4/5/2017 | 7/31/2017 | L | 5/19/2017 | 7/31/2017 |
7 | WW | 4/5/2017 | 7/31/2017 | M | 5/19/2017 | 7/31/2017 |
7 | WW | 4/5/2017 | 7/31/2017 | U | 11/26/2014 | |
8 | FF | 9/18/2017 | 4/30/2018 | D | 9/18/2017 | 6/7/2018 |
8 | FF | 9/18/2017 | 4/30/2018 | R | 9/18/2017 | 6/7/2018 |
8 | FF | 9/18/2017 | 4/30/2018 | U | 9/23/2015 | |
8 | FF | 9/18/2017 | 4/30/2018 | Z | 9/25/2015 | |
9 | FF | 3/20/2017 | 8/31/2017 | D | 8/4/2017 | 9/19/2017 |
9 | FF | 3/20/2017 | 8/31/2017 | R | 8/4/2017 | 9/19/2017 |
10 | FF | 7/7/2017 | 10/15/2017 | D | 7/7/2017 | 10/11/2017 |
10 | JJ | 10/16/2017 | 10/31/2017 | E | 10/17/2017 | 12/7/2017 |
10 | JJ | 10/16/2017 | 10/31/2017 | R | 7/7/2017 | 10/16/2017 |
10 | JJ | 10/16/2017 | 10/31/2017 | T | 10/19/2017 | 12/7/2017 |
10 | JJ | 10/16/2017 | 10/31/2017 | W | 10/18/2017 | 12/7/2017 |
11 | JJ | 6/22/2017 | E | 10/30/2017 | 12/13/2017 | |
11 | JJ | 6/22/2017 | W | 12/13/2017 | 2/6/2018 | |
12 | FF | 12/29/2017 | 5/31/2018 | Q | 7/26/2016 | |
12 | FF | 12/29/2017 | 5/31/2018 | R | 6/29/2017 | 12/29/2017 |
12 | FF | 12/29/2017 | 5/31/2018 | U | 7/26/2016 | |
12 | FF | 12/29/2017 | 5/31/2018 | W | 7/6/2017 | 12/29/2017 |
12 | FF | 12/29/2017 | 5/31/2018 | X | 8/4/2016 | 2/6/2018 |
13 | WW | 6/28/2016 | M | 6/28/2016 | ||
13 | WW | 6/28/2016 | S | 6/28/2016 | ||
13 | WW | 6/28/2016 | U | 6/20/2016 | ||
14 | CC | 11/3/2017 | 12/12/2017 | A | 11/3/2017 | 12/4/2017 |
14 | FF | 12/13/2017 | 3/23/2018 | D | 12/4/2017 | 3/23/2018 |
14 | FF | 12/13/2017 | 3/23/2018 | R | 12/4/2017 | 3/23/2018 |
15 | JJ | 3/28/2017 | 10/31/2017 | E | 8/1/2017 | 11/1/2017 |
16 | WW | 12/22/2017 | 4/30/2018 | N | 12/22/2017 | 1/22/2018 |
16 | WW | 12/22/2017 | 4/30/2018 | O | 12/22/2017 | 1/22/2018 |
16 | CC | 11/3/2017 | 12/21/2017 | |||
16 | JJ | 10/15/2017 | 11/2/2017 |
Using a double sort works.
proc sort data=have;
by id course descending courseEndDate; *sorts by latest course first;
run;
proc sort data=have out=want nodupkey; *takes the first record for each ID/course;
by id course;
run;
If you want all missing this won't work though, you would need to replace the second step by the step below instead.
data want;
set have;
by id course;
if first.course or missing(course) then output;
run;
Thank you so much Reeza. This worked.
If
then this use of a hash object will allow you to pass through the data only once:
data want;
set have;
if _n_=1 then do;
declare hash h ();
h.definekey('id','course');
h.definedata('id','course');
h.definedone();
end;
if missing(course) or h.add()=0;
run;
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.