6-ways of removing duplicate

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 81
Accepted Solution

6-ways of removing duplicate

Hi All,

Please help me to find out 5-6 ways of removing duplicates from a dataset.

Regards

Andy


Accepted Solutions
Solution
‎07-02-2013 02:57 PM
Frequent Contributor
Posts: 113

Re: 6-ways of removing duplicate

Hi Andy,

I may not be much familiar with all ways but trying my best.....

(1)  proc sort data=DSN noduprecs;

      by _all_ ; Run;

(2) Proc sort data=DSN out=sample nodupkey dupout=Duplicate;

     by var1;   Run;

(3) Proc Sql noprint;

     create table unique as select distinct (*) from DSN;

     quit;

(4) proc freq data=DSN noprint;

     tables variable/out=unique (Keep=variable count where=(count=1));

     Run;

(5) Proc sort data=DSN    out=temp;

     by variable ;   Run;

     Data Unique;

     set temp;

     by variable;

     if not first.variable and last.variable;

     run;

View solution in original post


All Replies
Super Contributor
Posts: 276

Re: 6-ways of removing duplicate

Hi..

Normally we use bellow method's to remove duplicate observations from the dataset.

1.Nodup(Proc Sort)

2.nodupkey(Proc Sort)

3.Disticnt(Proc Sql )

4.First. and last . variables(In Datastep)

Regards.

Sanjeev.K

Grand Advisor
Posts: 9,584

Re: 6-ways of removing duplicate

5. proc freq ;

6.proc means;

7.proc summary;

8.HashTable

9. proc tabulate

10.proc report

Super Contributor
Posts: 276

Re: 6-ways of removing duplicate

Hi ..

Yes i do agree with your Additional ways to remove duplicates.

Is there any way to remove duplicates in Proc Sql with NODUP KAY functionality ?? as Distinct Works similar to NODUP.

Regards.

Sanjeev.K

Grand Advisor
Posts: 9,584

Re: 6-ways of removing duplicate

What is that mean ?

select distinct id,name ?

or

select distinct cats(id,name)  ?

proc sql;

select distinct sex,age from sashelp.class;

quit;

Message was edited by: xia keshan

Super Contributor
Posts: 276

Re: 6-ways of removing duplicate

Hi..

I will try to elaborate my question with bellow example .

If Dataset looks like this.

ID      Fee

100    2000

100    3000

101    4000

101    3200

If we use

Proc Sql;

Select Distinct ID,Fee from XX;

quit;

(OR)

Proc sort data=XX nodup;

by ID;

run;

above code will display all the data.because fee is different with in the same ID.

but in the case of

Proc Sort Data=XX nodupkey;

by ID;

run;

above code will generate only 2 rows (100 and 101).

How can we achieve the same by using Proc SQL??

Regards.

Sanjeev.K

Grand Advisor
Posts: 9,584

Re: 6-ways of removing duplicate

select Distinct ID,Fee from XX group by id having fee=max(fee) ;

Super Contributor
Posts: 276

Re: 6-ways of removing duplicate

Hi Ksharp..

Its looks fine only when i have two variables in the dataset.but what in the case of dataset set having 10 or more variables and still i want to remove observation based on one variable in Proc SQL ???

Regards.

Sanjeev.K

Grand Advisor
Posts: 9,584

Re: 6-ways of removing duplicate

Just list it in SELECT clause.

select id,free,xx,xx,.......

Super Contributor
Posts: 276

Re: 6-ways of removing duplicate

You mean

Select Distinct Id,fee,xx,xx,xx,xx,,,,,?????

Will It remove Duplicates based on ID variable? Not sure..

Regards.

Sanjeev.K

Grand Advisor
Posts: 9,584

Re: 6-ways of removing duplicate

Yes. Why not try it in person.

Super Contributor
Posts: 276

Re: 6-ways of removing duplicate

Hi Ksharp..

Let me explain in more detail ...

*****************************METHOD 1************************************


Data Test(keep=Actlevel Fee Height weight) ;

set Sasuser.admit;

run;

Proc Sort Data=test Nodupkey;

by Actlevel;

run;

proc print;

run;

I want to get above  output by using Proc SQL. i used bellow method (as mention by you) but not working properly .

*****************************METHOD 2************************************


Proc Sql;

Select Distinct Actlevel,Fee,Height,weight From SASUSER.ADMIT;

quit;



Regards.

Sanjeev.K

Grand Advisor
Posts: 9,584

Re: 6-ways of removing duplicate

Proc Sql;

Select Distinct Actlevel,Fee,Height,weight From SASUSER.ADMIT

group by Actlevel

having fee=max(fee);

quit;

Ksharp

Super Contributor
Posts: 276

Re: 6-ways of removing duplicate

Tried with your code.

Still i can able to see duplicates in Actlevel Variable.

Query producing 9 observations instead of 3..

Because Every Actlevel Having more then one MAX  fee..:-)

Regards.

Sanjeev.K

Grand Advisor
Posts: 9,584

Re: 6-ways of removing duplicate

Proc Sql;

Select Distinct Actlevel,Fee,Height,weight From SASUSER.ADMIT

group by Actlevel

having fee=max(fee) and max(height);

quit;

Ksharp

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 37 replies
  • 31554 views
  • 18 likes
  • 14 in conversation