Deleting duplicate values of a variable for an ID in multiple rows

d0816 · Posted 08-22-2018 10:32 AM

Hi all,

I have a data set like below. I would like to delete all duplicate course values for an ID.

For e.g. For ID 2, course E,U and W is duplicated. I want to keep only one row for each course. So for ID 2, I would like to have 4 rows showing courses E,J,U and W in each row. It does not matter which duplicate course is deleted.

However If the course is missing, I would like to keep those blank courses. for eg. ID 6 and 16 have missing courses.

Dataset I have:

ID	Status	StatusBeginDate	StatusEndDate	Course	CourseBeginDate	CourseEndDate
1	WW	5/3/2017	9/30/2017	C	4/19/2017	10/3/2017
1	WW	5/3/2017	9/30/2017	E	8/16/2017	10/3/2017
1	WW	5/3/2017	9/30/2017	P	4/19/2017	10/3/2017
1	WW	5/3/2017	9/30/2017	S	4/19/2017	11/7/2017
2	JJ	11/22/2017	2/16/2018	E	12/11/2017	12/18/2017
2	JJ	11/22/2017	2/16/2018	E	12/19/2017	2/27/2018
2	JJ	11/22/2017	2/16/2018	J	12/11/2017	12/15/2017
2	JJ	11/22/2017	2/16/2018	U	1/14/2014
2	JJ	11/22/2017	2/16/2018	U	7/17/2017
2	JJ	11/22/2017	2/16/2018	W	11/22/2017	12/8/2017
2	JJ	11/22/2017	2/16/2018	W	12/11/2017	12/15/2017
2	JJ	11/22/2017	2/16/2018	W	12/18/2017	2/27/2018
3	C1	10/27/2017	10/30/2017	B	10/17/2017	10/29/2017
3	C1	10/27/2017	10/30/2017	X	10/30/2017	10/30/2017
3	CC	9/1/2017	10/26/2017	B	10/17/2017	10/29/2017
3	CC	9/1/2017	10/26/2017	X	8/14/2017	10/16/2017
4	JJ	12/25/2017	3/1/2018	W	12/26/2017	2/25/2018
4	JJ	12/25/2017	3/1/2018	X	12/20/2017	1/29/2018
4	C3	12/16/2017	12/24/2017	W	10/23/2017	12/20/2017
4	C3	12/16/2017	12/24/2017	X	12/20/2017	1/29/2018
4	JJ	10/29/2017	12/15/2017	W	10/23/2017	12/20/2017
5	JJ	12/8/2017		E	12/28/2017	1/25/2018
5	JJ	12/8/2017		W	12/8/2017	12/27/2017
5	JJ	12/8/2017		W	12/28/2017	1/24/2018
5	CC	12/1/2017	12/7/2017	A	10/25/2017	12/7/2017
6	JJ	1/31/2017	1/31/2017
6	CC	12/6/2016	1/30/2017	W	1/17/2017	1/17/2017
7	WW	4/5/2017	7/31/2017	L	5/19/2017	7/31/2017
7	WW	4/5/2017	7/31/2017	M	5/19/2017	7/31/2017
7	WW	4/5/2017	7/31/2017	U	5/28/2014
7	WW	4/5/2017	7/31/2017	U	11/26/2014
8	FF	9/18/2017	4/30/2018	R	9/18/2017	6/7/2018
8	FF	9/18/2017	4/30/2018	Z	9/25/2015
8	FF	9/18/2017	4/30/2018	U	3/12/2014
8	FF	9/18/2017	4/30/2018	U	9/23/2015
8	FF	9/18/2017	4/30/2018	D	9/18/2017	6/7/2018
9	FF	3/20/2017	8/31/2017	R	1/26/2017	8/3/2017
9	FF	3/20/2017	8/31/2017	D	1/26/2017	8/3/2017
9	FF	3/20/2017	8/31/2017	R	8/4/2017	9/19/2017
9	FF	3/20/2017	8/31/2017	D	8/4/2017	9/19/2017
10	JJ	10/16/2017	10/31/2017	T	10/19/2017	12/7/2017
10	JJ	10/16/2017	10/31/2017	E	10/17/2017	12/7/2017
10	JJ	10/16/2017	10/31/2017	R	7/7/2017	10/16/2017
10	JJ	10/16/2017	10/31/2017	W	10/18/2017	12/7/2017
10	FF	7/7/2017	10/15/2017	R	7/7/2017	10/16/2017
10	FF	7/7/2017	10/15/2017	D	7/7/2017	10/11/2017
11	JJ	6/22/2017		E	10/30/2017	12/13/2017
11	JJ	6/22/2017		W	10/30/2017	12/12/2017
11	JJ	6/22/2017		W	12/13/2017	2/6/2018
12	FF	12/29/2017	5/31/2018	R	6/29/2017	12/29/2017
12	FF	12/29/2017	5/31/2018	U	7/26/2016
12	FF	12/29/2017	5/31/2018	Q	7/26/2016
12	FF	12/29/2017	5/31/2018	W	7/6/2017	12/29/2017
12	FF	12/29/2017	5/31/2018	X	8/4/2016	2/6/2018
12	C1	9/11/2017	12/28/2017	R	6/29/2017	12/29/2017
12	C1	9/11/2017	12/28/2017	U	7/26/2016
12	C1	9/11/2017	12/28/2017	Q	7/26/2016
12	C1	9/11/2017	12/28/2017	W	7/6/2017	12/29/2017
12	C1	9/11/2017	12/28/2017	X	8/4/2016	2/6/2018
13	WW	6/28/2016		M	6/28/2016
13	WW	6/28/2016		S	6/28/2016
13	WW	6/28/2016		U	7/6/2015
13	WW	6/28/2016		U	11/23/2015
13	WW	6/28/2016		U	6/20/2016
14	FF	12/13/2017	3/23/2018	R	12/4/2017	3/23/2018
14	FF	12/13/2017	3/23/2018	D	12/4/2017	3/23/2018
14	CC	11/3/2017	12/12/2017	R	12/4/2017	3/23/2018
14	CC	11/3/2017	12/12/2017	A	11/3/2017	12/4/2017
14	CC	11/3/2017	12/12/2017	D	12/4/2017	3/23/2018
15	JJ	3/28/2017	10/31/2017	E	8/1/2017	11/1/2017
16	WW	12/22/2017	4/30/2018	N	12/22/2017	1/22/2018
16	WW	12/22/2017	4/30/2018	O	12/22/2017	1/22/2018
16	CC	11/3/2017	12/21/2017
16	JJ	10/15/2017	11/2/2017

Dataset I Want:

ID	Status	StatusBeginDate	StatusEndDate	Course	CourseBeginDate	CourseEndDate
1	WW	5/3/2017	9/30/2017	C	4/19/2017	10/3/2017
1	WW	5/3/2017	9/30/2017	E	8/16/2017	10/3/2017
1	WW	5/3/2017	9/30/2017	P	4/19/2017	10/3/2017
1	WW	5/3/2017	9/30/2017	S	4/19/2017	11/7/2017
2	JJ	11/22/2017	2/16/2018	E	12/19/2017	2/27/2018
2	JJ	11/22/2017	2/16/2018	J	12/11/2017	12/15/2017
2	JJ	11/22/2017	2/16/2018	U	7/17/2017
2	JJ	11/22/2017	2/16/2018	W	12/18/2017	2/27/2018
3	C1	10/27/2017	10/30/2017	B	10/17/2017	10/29/2017
3	C1	10/27/2017	10/30/2017	X	10/30/2017	10/30/2017
4	JJ	12/25/2017	3/1/2018	W	12/26/2017	2/25/2018
4	JJ	12/25/2017	3/1/2018	X	12/20/2017	1/29/2018
5	CC	12/1/2017	12/7/2017	A	10/25/2017	12/7/2017
5	JJ	12/8/2017		E	12/28/2017	1/25/2018
5	JJ	12/8/2017		W	12/28/2017	1/24/2018
6	JJ	1/31/2017	1/31/2017
6	CC	12/6/2016	1/30/2017	W	1/17/2017	1/17/2017
7	WW	4/5/2017	7/31/2017	L	5/19/2017	7/31/2017
7	WW	4/5/2017	7/31/2017	M	5/19/2017	7/31/2017
7	WW	4/5/2017	7/31/2017	U	11/26/2014
8	FF	9/18/2017	4/30/2018	D	9/18/2017	6/7/2018
8	FF	9/18/2017	4/30/2018	R	9/18/2017	6/7/2018
8	FF	9/18/2017	4/30/2018	U	9/23/2015
8	FF	9/18/2017	4/30/2018	Z	9/25/2015
9	FF	3/20/2017	8/31/2017	D	8/4/2017	9/19/2017
9	FF	3/20/2017	8/31/2017	R	8/4/2017	9/19/2017
10	FF	7/7/2017	10/15/2017	D	7/7/2017	10/11/2017
10	JJ	10/16/2017	10/31/2017	E	10/17/2017	12/7/2017
10	JJ	10/16/2017	10/31/2017	R	7/7/2017	10/16/2017
10	JJ	10/16/2017	10/31/2017	T	10/19/2017	12/7/2017
10	JJ	10/16/2017	10/31/2017	W	10/18/2017	12/7/2017
11	JJ	6/22/2017		E	10/30/2017	12/13/2017
11	JJ	6/22/2017		W	12/13/2017	2/6/2018
12	FF	12/29/2017	5/31/2018	Q	7/26/2016
12	FF	12/29/2017	5/31/2018	R	6/29/2017	12/29/2017
12	FF	12/29/2017	5/31/2018	U	7/26/2016
12	FF	12/29/2017	5/31/2018	W	7/6/2017	12/29/2017
12	FF	12/29/2017	5/31/2018	X	8/4/2016	2/6/2018
13	WW	6/28/2016		M	6/28/2016
13	WW	6/28/2016		S	6/28/2016
13	WW	6/28/2016		U	6/20/2016
14	CC	11/3/2017	12/12/2017	A	11/3/2017	12/4/2017
14	FF	12/13/2017	3/23/2018	D	12/4/2017	3/23/2018
14	FF	12/13/2017	3/23/2018	R	12/4/2017	3/23/2018
15	JJ	3/28/2017	10/31/2017	E	8/1/2017	11/1/2017
16	WW	12/22/2017	4/30/2018	N	12/22/2017	1/22/2018
16	WW	12/22/2017	4/30/2018	O	12/22/2017	1/22/2018
16	CC	11/3/2017	12/21/2017
16	JJ	10/15/2017	11/2/2017

Reeza · Posted 08-22-2018 10:54 AM

Using a double sort works.

proc sort data=have;
by id course descending courseEndDate; *sorts by latest course first;
run;

proc sort data=have out=want nodupkey; *takes the first record for each ID/course;
by id course;
run;

If you want all missing this won't work though, you would need to replace the second step by the step below instead.

data want;
set have;
by id course;

if first.course or missing(course) then output;

run;

d0816 · Posted 08-22-2018 12:14 PM

Thank you so much Reeza. This worked.

mkeintz · Posted 08-24-2018 10:55 PM

If

Sorting the data by course within id is expensive, and
You will take the first instance course encountered for an id

then this use of a hash object will allow you to pass through the data only once:

data want;
  set have;
  if _n_=1 then do;
    declare hash h ();
      h.definekey('id','course');
      h.definedata('id','course');
      h.definedone();
  end;
  if missing(course) or h.add()=0;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Deleting duplicate values of a variable for an ID in multiple rows

Re: Deleting duplicate values of a variable for an ID in multiple rows

Re: Deleting duplicate values of a variable for an ID in multiple rows

Re: Deleting duplicate values of a variable for an ID in multiple rows

Catch up on SAS Innovate 2026

Deleting duplicate values of a variable for an ID in multiple rows

Re: Deleting duplicate values of a variable for an ID in multiple rows

Re: Deleting duplicate values of a variable for an ID in multiple rows

Re: Deleting duplicate values of a variable for an ID in multiple rows

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away