Hi All,
Please help me to find out 5-6 ways of removing duplicates from a dataset.
Regards
Andy
Hi Andy,
I may not be much familiar with all ways but trying my best.....
(1) proc sort data=DSN noduprecs;
by _all_ ; Run;
(2) Proc sort data=DSN out=sample nodupkey dupout=Duplicate;
by var1; Run;
(3) Proc Sql noprint;
create table unique as select distinct (*) from DSN;
quit;
(4) proc freq data=DSN noprint;
tables variable/out=unique (Keep=variable count where=(count=1));
Run;
(5) Proc sort data=DSN out=temp;
by variable ; Run;
Data Unique;
set temp;
by variable;
if not first.variable and last.variable;
run;
Hi..
Normally we use bellow method's to remove duplicate observations from the dataset.
1.Nodup(Proc Sort)
2.nodupkey(Proc Sort)
3.Disticnt(Proc Sql )
4.First. and last . variables(In Datastep)
Regards.
Sanjeev.K
5. proc freq ;
6.proc means;
7.proc summary;
8.HashTable
9. proc tabulate
10.proc report
What is that mean ?
select distinct id,name ?
or
select distinct cats(id,name) ?
proc sql;
select distinct sex,age from sashelp.class;
quit;
Message was edited by: xia keshan
Hi..
I will try to elaborate my question with bellow example .
If Dataset looks like this.
ID Fee
100 2000
100 3000
101 4000
101 3200
If we use
Proc Sql;
Select Distinct ID,Fee from XX;
quit;
(OR)
Proc sort data=XX nodup;
by ID;
run;
above code will display all the data.because fee is different with in the same ID.
but in the case of
Proc Sort Data=XX nodupkey;
by ID;
run;
above code will generate only 2 rows (100 and 101).
How can we achieve the same by using Proc SQL??
Regards.
Sanjeev.K
select Distinct ID,Fee from XX group by id having fee=max(fee) ;
Hi Ksharp..
Its looks fine only when i have two variables in the dataset.but what in the case of dataset set having 10 or more variables and still i want to remove observation based on one variable in Proc SQL ???
Regards.
Sanjeev.K
Just list it in SELECT clause.
select id,free,xx,xx,.......
You mean
Select Distinct Id,fee,xx,xx,xx,xx,,,,,?????
Will It remove Duplicates based on ID variable? Not sure..
Regards.
Sanjeev.K
Yes. Why not try it in person.
Hi Ksharp..
Let me explain in more detail ...
*****************************METHOD 1************************************
Data Test(keep=Actlevel Fee Height weight) ;
set Sasuser.admit;
run;
Proc Sort Data=test Nodupkey;
by Actlevel;
run;
proc print;
run;
I want to get above output by using Proc SQL. i used bellow method (as mention by you) but not working properly .
*****************************METHOD 2************************************
Proc Sql;
Select Distinct Actlevel,Fee,Height,weight From SASUSER.ADMIT;
quit;
Regards.
Sanjeev.K
Proc Sql;
Select Distinct Actlevel,Fee,Height,weight From SASUSER.ADMIT
group by Actlevel
having fee=max(fee);
quit;
Ksharp
Tried with your code.
Still i can able to see duplicates in Actlevel Variable.
Query producing 9 observations instead of 3..
Because Every Actlevel Having more then one MAX fee..:-)
Regards.
Sanjeev.K
Proc Sql;
Select Distinct Actlevel,Fee,Height,weight From SASUSER.ADMIT
group by Actlevel
having fee=max(fee) and max(height);
quit;
Ksharp
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.