SAS Programming

AnandSahu · Posted 07-02-2013 05:10 AM

Hi All,

Please help me to find out 5-6 ways of removing duplicates from a dataset.

Regards

Andy

umashankersaini · Posted 07-02-2013 02:57 PM

Hi Andy,

I may not be much familiar with all ways but trying my best.....

(1) proc sort data=DSN noduprecs;

by _all_ ; Run;

(2) Proc sort data=DSN out=sample nodupkey dupout=Duplicate;

by var1; Run;

(3) Proc Sql noprint;

create table unique as select distinct (*) from DSN;

quit;

(4) proc freq data=DSN noprint;

tables variable/out=unique (Keep=variable count where=(count=1));

Run;

(5) Proc sort data=DSN out=temp;

by variable ; Run;

Data Unique;

set temp;

by variable;

if not first.variable and last.variable;

run;

View solution in original post

kuridisanjeev · Posted 07-02-2013 06:31 AM

Hi..

Normally we use bellow method's to remove duplicate observations from the dataset.

1.Nodup(Proc Sort)

2.nodupkey(Proc Sort)

3.Disticnt(Proc Sql )

4.First. and last . variables(In Datastep)

Regards.

Sanjeev.K

Ksharp · Posted 07-02-2013 06:36 AM

5. proc freq ;

6.proc means;

7.proc summary;

8.HashTable

9. proc tabulate

10.proc report

kuridisanjeev · Posted 07-02-2013 06:54 AM

Hi ..

Yes i do agree with your Additional ways to remove duplicates.

Is there any way to remove duplicates in Proc Sql with NODUP KAY functionality ?? as Distinct Works similar to NODUP.

Regards.

Sanjeev.K

Ksharp · Posted 07-02-2013 07:20 AM

What is that mean ?

select distinct id,name ?

or

select distinct cats(id,name) ?

proc sql;

select distinct sex,age from sashelp.class;

quit;

Message was edited by: xia keshan

kuridisanjeev · Posted 07-02-2013 07:43 AM

Hi..

I will try to elaborate my question with bellow example .

If Dataset looks like this.

ID Fee

100 2000

100 3000

101 4000

101 3200

If we use

Proc Sql;

Select Distinct ID,Fee from XX;

quit;

(OR)

Proc sort data=XX nodup;

by ID;

run;

above code will display all the data.because fee is different with in the same ID.

but in the case of

Proc Sort Data=XX nodupkey;

by ID;

run;

above code will generate only 2 rows (100 and 101).

How can we achieve the same by using Proc SQL??

Regards.

Sanjeev.K

Ksharp · Posted 07-02-2013 07:56 AM

select Distinct ID,Fee from XX group by id having fee=max(fee) ;

kuridisanjeev · Posted 07-03-2013 05:53 AM

Hi Ksharp..

Its looks fine only when i have two variables in the dataset.but what in the case of dataset set having 10 or more variables and still i want to remove observation based on one variable in Proc SQL ???

Regards.

Sanjeev.K

Ksharp · Posted 07-03-2013 06:26 AM

Just list it in SELECT clause.

select id,free,xx,xx,.......

kuridisanjeev · Posted 07-03-2013 06:31 AM

You mean

Select Distinct Id,fee,xx,xx,xx,xx,,,,,?????

Will It remove Duplicates based on ID variable? Not sure..

Regards.

Sanjeev.K

Ksharp · Posted 07-03-2013 06:48 AM

Yes. Why not try it in person.

kuridisanjeev · Posted 07-03-2013 07:03 AM

Hi Ksharp..

Let me explain in more detail ...

*****************************METHOD 1************************************

Data Test(keep=Actlevel Fee Height weight) ;

set Sasuser.admit;

run;

Proc Sort Data=test Nodupkey;

by Actlevel;

run;

proc print;

run;

I want to get above output by using Proc SQL. i used bellow method (as mention by you) but not working properly .

*****************************METHOD 2************************************

Proc Sql;

Select Distinct Actlevel,Fee,Height,weight From SASUSER.ADMIT;

quit;

Regards.

Sanjeev.K

Ksharp · Posted 07-03-2013 07:10 AM

Proc Sql;

Select Distinct Actlevel,Fee,Height,weight From SASUSER.ADMIT

group by Actlevel

having fee=max(fee);

quit;

Ksharp

kuridisanjeev · Posted 07-03-2013 07:25 AM

Tried with your code.

Still i can able to see duplicates in Actlevel Variable.

Query producing 9 observations instead of 3..

Because Every Actlevel Having more then one MAX fee..:-)

Regards.

Sanjeev.K

Ksharp · Posted 07-03-2013 09:47 AM

Proc Sql;

Select Distinct Actlevel,Fee,Height,weight From SASUSER.ADMIT

group by Actlevel

having fee=max(fee) and max(height);

quit;

Ksharp

SAS Programming

6-ways of removing duplicate

Re: 6-ways of removing duplicate

Re: 6-ways of removing duplicate

Re: 6-ways of removing duplicate

Re: 6-ways of removing duplicate

Re: 6-ways of removing duplicate

Re: 6-ways of removing duplicate

Re: 6-ways of removing duplicate

Re: 6-ways of removing duplicate

Re: 6-ways of removing duplicate

Re: 6-ways of removing duplicate

Re: 6-ways of removing duplicate

Re: 6-ways of removing duplicate

Re: 6-ways of removing duplicate

Re: 6-ways of removing duplicate

Re: 6-ways of removing duplicate

Follow Us

What is...

SAS Programming

Our biggest data and AI event of the year.

SAS Training: Just a Click Away

Follow Us

What is...