BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
AnandSahu
Calcite | Level 5

Hi All,

Please help me to find out 5-6 ways of removing duplicates from a dataset.

Regards

Andy

1 ACCEPTED SOLUTION

Accepted Solutions
umashankersaini
Quartz | Level 8

Hi Andy,

I may not be much familiar with all ways but trying my best.....

(1)  proc sort data=DSN noduprecs;

      by _all_ ; Run;

(2) Proc sort data=DSN out=sample nodupkey dupout=Duplicate;

     by var1;   Run;

(3) Proc Sql noprint;

     create table unique as select distinct (*) from DSN;

     quit;

(4) proc freq data=DSN noprint;

     tables variable/out=unique (Keep=variable count where=(count=1));

     Run;

(5) Proc sort data=DSN    out=temp;

     by variable ;   Run;

     Data Unique;

     set temp;

     by variable;

     if not first.variable and last.variable;

     run;

View solution in original post

37 REPLIES 37
kuridisanjeev
Quartz | Level 8

Hi..

Normally we use bellow method's to remove duplicate observations from the dataset.

1.Nodup(Proc Sort)

2.nodupkey(Proc Sort)

3.Disticnt(Proc Sql )

4.First. and last . variables(In Datastep)

Regards.

Sanjeev.K

Ksharp
Super User

5. proc freq ;

6.proc means;

7.proc summary;

8.HashTable

9. proc tabulate

10.proc report

kuridisanjeev
Quartz | Level 8

Hi ..

Yes i do agree with your Additional ways to remove duplicates.

Is there any way to remove duplicates in Proc Sql with NODUP KAY functionality ?? as Distinct Works similar to NODUP.

Regards.

Sanjeev.K

Ksharp
Super User

What is that mean ?

select distinct id,name ?

or

select distinct cats(id,name)  ?

proc sql;

select distinct sex,age from sashelp.class;

quit;

Message was edited by: xia keshan

kuridisanjeev
Quartz | Level 8

Hi..

I will try to elaborate my question with bellow example .

If Dataset looks like this.

ID      Fee

100    2000

100    3000

101    4000

101    3200

If we use

Proc Sql;

Select Distinct ID,Fee from XX;

quit;

(OR)

Proc sort data=XX nodup;

by ID;

run;

above code will display all the data.because fee is different with in the same ID.

but in the case of

Proc Sort Data=XX nodupkey;

by ID;

run;

above code will generate only 2 rows (100 and 101).

How can we achieve the same by using Proc SQL??

Regards.

Sanjeev.K

Ksharp
Super User

select Distinct ID,Fee from XX group by id having fee=max(fee) ;

kuridisanjeev
Quartz | Level 8

Hi Ksharp..

Its looks fine only when i have two variables in the dataset.but what in the case of dataset set having 10 or more variables and still i want to remove observation based on one variable in Proc SQL ???

Regards.

Sanjeev.K

Ksharp
Super User

Just list it in SELECT clause.

select id,free,xx,xx,.......

kuridisanjeev
Quartz | Level 8

You mean

Select Distinct Id,fee,xx,xx,xx,xx,,,,,?????

Will It remove Duplicates based on ID variable? Not sure..

Regards.

Sanjeev.K

Ksharp
Super User

Yes. Why not try it in person.

kuridisanjeev
Quartz | Level 8

Hi Ksharp..

Let me explain in more detail ...

*****************************METHOD 1************************************


Data Test(keep=Actlevel Fee Height weight) ;

set Sasuser.admit;

run;

Proc Sort Data=test Nodupkey;

by Actlevel;

run;

proc print;

run;

I want to get above  output by using Proc SQL. i used bellow method (as mention by you) but not working properly .

*****************************METHOD 2************************************


Proc Sql;

Select Distinct Actlevel,Fee,Height,weight From SASUSER.ADMIT;

quit;



Regards.

Sanjeev.K

Ksharp
Super User

Proc Sql;

Select Distinct Actlevel,Fee,Height,weight From SASUSER.ADMIT

group by Actlevel

having fee=max(fee);

quit;

Ksharp

kuridisanjeev
Quartz | Level 8

Tried with your code.

Still i can able to see duplicates in Actlevel Variable.

Query producing 9 observations instead of 3..

Because Every Actlevel Having more then one MAX  fee..:-)

Regards.

Sanjeev.K

Ksharp
Super User

Proc Sql;

Select Distinct Actlevel,Fee,Height,weight From SASUSER.ADMIT

group by Actlevel

having fee=max(fee) and max(height);

quit;

Ksharp

sas-innovate-2024.png

Today is the last day to save with the early bird rate! Register today for just $695 - $100 off the standard rate.

 

Plus, pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 37 replies
  • 146934 views
  • 23 likes
  • 14 in conversation