BookmarkSubscribeRSS Feed
d0816
Quartz | Level 8

Hi all,

I have a data set like below. I would like to delete all duplicate course values for an ID.

 

For e.g. For ID 2, course E,U and W is duplicated. I want to keep only one row for each course. So for ID 2, I would like to have 4 rows showing courses E,J,U and W in each row. It does not matter which duplicate course is deleted.

 

However If the course is missing, I would like to keep those blank courses.  for eg. ID 6 and 16 have missing courses.

 

Dataset I have:

IDStatusStatusBeginDateStatusEndDateCourseCourseBeginDateCourseEndDate
1WW5/3/20179/30/2017C4/19/201710/3/2017
1WW5/3/20179/30/2017E8/16/201710/3/2017
1WW5/3/20179/30/2017P4/19/201710/3/2017
1WW5/3/20179/30/2017S4/19/201711/7/2017
2JJ11/22/20172/16/2018E12/11/201712/18/2017
2JJ11/22/20172/16/2018E12/19/20172/27/2018
2JJ11/22/20172/16/2018J12/11/201712/15/2017
2JJ11/22/20172/16/2018U1/14/2014 
2JJ11/22/20172/16/2018U7/17/2017 
2JJ11/22/20172/16/2018W11/22/201712/8/2017
2JJ11/22/20172/16/2018W12/11/201712/15/2017
2JJ11/22/20172/16/2018W12/18/20172/27/2018
3C110/27/201710/30/2017B10/17/201710/29/2017
3C110/27/201710/30/2017X10/30/201710/30/2017
3CC9/1/201710/26/2017B10/17/201710/29/2017
3CC9/1/201710/26/2017X8/14/201710/16/2017
4JJ12/25/20173/1/2018W12/26/20172/25/2018
4JJ12/25/20173/1/2018X12/20/20171/29/2018
4C312/16/201712/24/2017W10/23/201712/20/2017
4C312/16/201712/24/2017X12/20/20171/29/2018
4JJ10/29/201712/15/2017W10/23/201712/20/2017
5JJ12/8/2017 E12/28/20171/25/2018
5JJ12/8/2017 W12/8/201712/27/2017
5JJ12/8/2017 W12/28/20171/24/2018
5CC12/1/201712/7/2017A10/25/201712/7/2017
6JJ1/31/20171/31/2017   
6CC12/6/20161/30/2017W1/17/20171/17/2017
7WW4/5/20177/31/2017L5/19/20177/31/2017
7WW4/5/20177/31/2017M5/19/20177/31/2017
7WW4/5/20177/31/2017U5/28/2014 
7WW4/5/20177/31/2017U11/26/2014 
8FF9/18/20174/30/2018R9/18/20176/7/2018
8FF9/18/20174/30/2018Z9/25/2015 
8FF9/18/20174/30/2018U3/12/2014 
8FF9/18/20174/30/2018U9/23/2015 
8FF9/18/20174/30/2018D9/18/20176/7/2018
9FF3/20/20178/31/2017R1/26/20178/3/2017
9FF3/20/20178/31/2017D1/26/20178/3/2017
9FF3/20/20178/31/2017R8/4/20179/19/2017
9FF3/20/20178/31/2017D8/4/20179/19/2017
10JJ10/16/201710/31/2017T10/19/201712/7/2017
10JJ10/16/201710/31/2017E10/17/201712/7/2017
10JJ10/16/201710/31/2017R7/7/201710/16/2017
10JJ10/16/201710/31/2017W10/18/201712/7/2017
10FF7/7/201710/15/2017R7/7/201710/16/2017
10FF7/7/201710/15/2017D7/7/201710/11/2017
11JJ6/22/2017 E10/30/201712/13/2017
11JJ6/22/2017 W10/30/201712/12/2017
11JJ6/22/2017 W12/13/20172/6/2018
12FF12/29/20175/31/2018R6/29/201712/29/2017
12FF12/29/20175/31/2018U7/26/2016 
12FF12/29/20175/31/2018Q7/26/2016 
12FF12/29/20175/31/2018W7/6/201712/29/2017
12FF12/29/20175/31/2018X8/4/20162/6/2018
12C19/11/201712/28/2017R6/29/201712/29/2017
12C19/11/201712/28/2017U7/26/2016 
12C19/11/201712/28/2017Q7/26/2016 
12C19/11/201712/28/2017W7/6/201712/29/2017
12C19/11/201712/28/2017X8/4/20162/6/2018
13WW6/28/2016 M6/28/2016 
13WW6/28/2016 S6/28/2016 
13WW6/28/2016 U7/6/2015 
13WW6/28/2016 U11/23/2015 
13WW6/28/2016 U6/20/2016 
14FF12/13/20173/23/2018R12/4/20173/23/2018
14FF12/13/20173/23/2018D12/4/20173/23/2018
14CC11/3/201712/12/2017R12/4/20173/23/2018
14CC11/3/201712/12/2017A11/3/201712/4/2017
14CC11/3/201712/12/2017D12/4/20173/23/2018
15JJ3/28/201710/31/2017E8/1/201711/1/2017
16WW12/22/20174/30/2018N12/22/20171/22/2018
16WW12/22/20174/30/2018O12/22/20171/22/2018
16CC11/3/201712/21/2017   
16JJ10/15/201711/2/2017   

 

 

 

Dataset I Want:

IDStatusStatusBeginDateStatusEndDateCourseCourseBeginDateCourseEndDate
1WW5/3/20179/30/2017C4/19/201710/3/2017
1WW5/3/20179/30/2017E8/16/201710/3/2017
1WW5/3/20179/30/2017P4/19/201710/3/2017
1WW5/3/20179/30/2017S4/19/201711/7/2017
2JJ11/22/20172/16/2018E12/19/20172/27/2018
2JJ11/22/20172/16/2018J12/11/201712/15/2017
2JJ11/22/20172/16/2018U7/17/2017 
2JJ11/22/20172/16/2018W12/18/20172/27/2018
3C110/27/201710/30/2017B10/17/201710/29/2017
3C110/27/201710/30/2017X10/30/201710/30/2017
4JJ12/25/20173/1/2018W12/26/20172/25/2018
4JJ12/25/20173/1/2018X12/20/20171/29/2018
5CC12/1/201712/7/2017A10/25/201712/7/2017
5JJ12/8/2017 E12/28/20171/25/2018
5JJ12/8/2017 W12/28/20171/24/2018
6JJ1/31/20171/31/2017   
6CC12/6/20161/30/2017W1/17/20171/17/2017
7WW4/5/20177/31/2017L5/19/20177/31/2017
7WW4/5/20177/31/2017M5/19/20177/31/2017
7WW4/5/20177/31/2017U11/26/2014 
8FF9/18/20174/30/2018D9/18/20176/7/2018
8FF9/18/20174/30/2018R9/18/20176/7/2018
8FF9/18/20174/30/2018U9/23/2015 
8FF9/18/20174/30/2018Z9/25/2015 
9FF3/20/20178/31/2017D8/4/20179/19/2017
9FF3/20/20178/31/2017R8/4/20179/19/2017
10FF7/7/201710/15/2017D7/7/201710/11/2017
10JJ10/16/201710/31/2017E10/17/201712/7/2017
10JJ10/16/201710/31/2017R7/7/201710/16/2017
10JJ10/16/201710/31/2017T10/19/201712/7/2017
10JJ10/16/201710/31/2017W10/18/201712/7/2017
11JJ6/22/2017 E10/30/201712/13/2017
11JJ6/22/2017 W12/13/20172/6/2018
12FF12/29/20175/31/2018Q7/26/2016 
12FF12/29/20175/31/2018R6/29/201712/29/2017
12FF12/29/20175/31/2018U7/26/2016 
12FF12/29/20175/31/2018W7/6/201712/29/2017
12FF12/29/20175/31/2018X8/4/20162/6/2018
13WW6/28/2016 M6/28/2016 
13WW6/28/2016 S6/28/2016 
13WW6/28/2016 U6/20/2016 
14CC11/3/201712/12/2017A11/3/201712/4/2017
14FF12/13/20173/23/2018D12/4/20173/23/2018
14FF12/13/20173/23/2018R12/4/20173/23/2018
15JJ3/28/201710/31/2017E8/1/201711/1/2017
16WW12/22/20174/30/2018N12/22/20171/22/2018
16WW12/22/20174/30/2018O12/22/20171/22/2018
16CC11/3/201712/21/2017   
16JJ10/15/201711/2/2017   
3 REPLIES 3
Reeza
Super User

Using a double sort works.

 

proc sort data=have;
by id course descending courseEndDate; *sorts by latest course first;
run;

proc sort data=have out=want nodupkey; *takes the first record for each ID/course;
by id course;
run;

If you want all missing this won't work though, you would need to replace the second step by the step below instead.

 

data want;
set have;
by id course;

if first.course or missing(course) then output;

run;
d0816
Quartz | Level 8

Thank you so much Reeza. This worked.

mkeintz
PROC Star

If

  1. Sorting the data by course within id is expensive, and
  2. You will take the first instance course encountered for an id

then this use of a hash object will allow you to pass through the data only once:

 

data want;
  set have;
  if _n_=1 then do;
    declare hash h ();
      h.definekey('id','course');
      h.definedata('id','course');
      h.definedone();
  end;
  if missing(course) or h.add()=0;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 2090 views
  • 1 like
  • 3 in conversation