removing duplicates

Contributor
Posts: 29

Hi,

I have a dataset with duplicates that I want to remove.

 

This is the dataset I have:

ID   A   B   C
1    1   0   0
1    0   1   0
1    1   1   0
2    0   1   1
2    1   0   1

 

I would like to remove the duplicates so that there is one row per ID, like this:

ID   A   B   C
1    1   1   0
2    1   1   1

 

I tried the code below:

proc sort data=have nodupkey out=want;
by id a b c;
run;

The above code did not remove the duplicates. Any other suggestions?

 

Thanks

 


All Replies
PROC Star
Posts: 499

Re: removing duplicates

Something like this:

data have;
input
ID A B C;
datalines;
1 1 0 0
1 0 1 0
1 1 1 0
2 0 1 1
2 1 0 1
;
run;


proc sql;
/* one row per ID, keeping the largest value of each variable */
create table want as
select ID, max(A) as A, max(B) as B, max(C) as C
from have
group by ID;
quit;

Solution
‎11-26-2017 01:42 PM
Contributor
Posts: 29

Re: removing duplicates

Thanks. It worked very nicely. But is it possible to do the same with a DATA step? I am not familiar with PROC SQL.

Super User
Posts: 9,580

Re: removing duplicates

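(Just showing one variable.)

After sorting by id (which PROC SQL does on its own):

data want;
set have (rename=(a=_a));
by id;
retain a;
if first.id then a = .; /* reset the running maximum for each new id */
a = max(_a,a); /* max() ignores missing values */
if last.id; /* keep only the final row per id, which holds the group maximum */
drop _a;
run;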
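
For all three variables, the same pattern can be extended with arrays. This is only a sketch under a few assumptions: HAVE holds numeric ID, A, B and C as in the question, and the names have_sorted, in_v, out_v and i are made up for illustration.

```sas
/* Sort first so that BY-group processing (first.id / last.id) works. */
proc sort data=have out=have_sorted;   /* have_sorted: hypothetical name */
  by id;
run;

data want;
  /* Rename the incoming columns so a, b, c can hold the running maxima. */
  set have_sorted (rename=(a=_a b=_b c=_c));
  by id;
  array in_v(*)  _a _b _c;   /* values read from the current row */
  array out_v(*) a b c;      /* running maxima, created by this array */
  retain a b c;              /* carry the maxima across rows */
  do i = 1 to dim(in_v);
    if first.id then out_v(i) = .;       /* reset at the start of each id */
    out_v(i) = max(in_v(i), out_v(i));   /* max() ignores missing values */
  end;
  if last.id;                /* write one row per id */
  drop _a _b _c i;
run;
```

With the sample data from the first reply, this should leave one row per ID holding the largest A, B and C seen for that ID, the same result the PROC SQL version produces.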
Trusted Advisor
Posts: 1,270

Re: removing duplicates

Hi,

 

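A PROC STDIZE step followed by a data step. HAVE must be sorted by ID for the BY statement. METHOD=MAXABS uses the maximum absolute value as the scale, which equals the maximum for non-negative values such as these 0/1 flags:

proc stdize data=have outstat=stat(where=(_type_='SCALE')) method=maxabs;
by id;
run;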

 

data want(drop=_type_ i);
set stat;
array v(*) a b c;
do i=1 to dim(v);
if v(i) = . then v(i)=0;
end;
run;
