Accepted Solution

6-ways of removing duplicate

Reply
Super Contributor
Posts: 276

Re: 6-ways of removing duplicate

Hi Ksharp.

Ahhh...Still can't able to get desired output.tried with your updated code ,yet producing 9 observations instead of 3.

Here am attaching ADMIT dataset for your ref.

Regards.

Sanjeev.K

Attachment
Super User
Posts: 10,520

Re: 6-ways of removing duplicate

Posted in reply to kuridisanjeev

OK.

Proc Sql;

Select Distinct Actlevel,Fee,Height,weight From x.ADMIT

group by Actlevel

having  height=max(height);

quit;

Respected Advisor
Posts: 3,162

Re: 6-ways of removing duplicate

Posted in reply to kuridisanjeev

I don't think you do can it in one SQL pass without using this undocumented monotonic(), as this is the only unique index that SQL can use to product the nodup results, all of the other efforts of generating certain unique feature based on your data will NOT be as robust:

Proc Sql;

create table want (drop=no) as

Select Distinct Actlevel,Fee,Height,weight , monotonic() as no From SASUSER.ADMIT

group by Actlevel

having no=max(no);

quit;

Haikuo

Ksharp: that is being said, what if you have multiple max height? And isn't the bedtime for you? lol.

Super User
Posts: 10,520

Re: 6-ways of removing duplicate

HaiKuo.

Sure. It is time to go to sleep.

OP,

or try this one :

Proc Sql;

Select Distinct Actlevel,Fee,Height,weight From x.ADMIT

group by Actlevel

having  cats(fee,height,weight)=max(cats(fee,height,weight));

quit;

Super Contributor
Posts: 276

Re: 6-ways of removing duplicate

Hi,

You made my day with simple solution.thanks a lot.....!!Seems it better to use NODUPKEY in Proc Sort instead of trying this.

Hi

.

Sorry for testing your patients :-) and finally you came with the your mark answer....!!!!

Your Solution also working perfectly...

And Sorry for both of you ,because i can not able to make your answer as Correct Answer as i am not the person who created this Discussion..:-) :smileysilly:

Thanks &Regards.

Sanjeev.K

Learner
Posts: 1

Re: 6-ways of removing duplicate

Hi Everyone!

 

I had one qsn regarding first.and last.  Can we separate duplicates of 2 variables with first. and last. .We can remove duplicates for 1 varialble but not sure if we can separate duplicates for 2 variables. For eg:

In this program:

data a;
input name$ age;
datalines;
am 2
am 3
am 3
am 3
an 3
an 7
ka 7
ka 7
ka 8
sa 2
;
run;

 

Can we separate all the unique records for eg sa 2, an 7,ka 7, ka 8,am 3,am 2 an 3.

 

Please help

Thanks! 

Regular Learner
Posts: 1

Re: 6-ways of removing duplicate

Posted in reply to kuridisanjeev

Hello 

I have a question for you if you can help if it will nice.
Here my code in data steps how I can write this code with Proc SQL to get the same thing. I will appreciate it if you could answer. Thank you
I would want any summary records. I would like the duplicate removed by the LAST.clm_ln_src_id
I tried everything with nested proc SQL or Having, max(clm_ln_src_id). none of them worked.
Thank you
 
Warmest regard,
Jean-Luc Fanny
 
 
PROC SORT DATA= Horizon_ClaimsHealth;
BY LEGACY_ORIGINAL_CLAIM_ID clm_ln_src_id VOIDSORT SORTCARD ;
RUN;
DATA Horizon_ClaimsHealth_v2 ;
SET  Horizon_ClaimsHealth;
BY LEGACY_ORIGINAL_CLAIM_ID clm_ln_src_id VOIDSORT SORTCARD ;
IF LAST.clm_ln_src_id THEN OUTPUT;
RUN;
Solution
‎07-02-2013 02:57 PM
Frequent Contributor
Posts: 120

Re: 6-ways of removing duplicate

Posted in reply to AnandSahu

Hi Andy,

I may not be much familiar with all ways but trying my best.....

(1)  proc sort data=DSN noduprecs;

      by _all_ ; Run;

(2) Proc sort data=DSN out=sample nodupkey dupout=Duplicate;

     by var1;   Run;

(3) Proc Sql noprint;

     create table unique as select distinct (*) from DSN;

     quit;

(4) proc freq data=DSN noprint;

     tables variable/out=unique (Keep=variable count where=(count=1));

     Run;

(5) Proc sort data=DSN    out=temp;

     by variable ;   Run;

     Data Unique;

     set temp;

     by variable;

     if not first.variable and last.variable;

     run;

Frequent Contributor
Posts: 81

Re: 6-ways of removing duplicate

Posted in reply to umashankersaini

thanx uma...

Is there any way to remove duplicateS with the help of PROC APPEND and MERG as well?????

Contributor
Posts: 29

Re: 6-ways of removing duplicate

Posted in reply to AnandSahu

Anand, to use merge & append, you need to sort the observation & while sorting Nodup key take cares of dup records!!!


Super User
Posts: 10,520

Re: 6-ways of removing duplicate

Posted in reply to umashankersaini

(5) is wrong.

Esteemed Advisor
Posts: 5,336

Re: 6-ways of removing duplicate

Should be:

(5) Proc sort data=DSN    out=temp;

     by variable ;   Run;

     Data Unique;

     set temp;

     by variable;

     if first.variable;

     run;

PG

PG
Super User
Posts: 10,520

Re: 6-ways of removing duplicate

Absolutely

Frequent Contributor
Posts: 139

Re: 6-ways of removing duplicate

Hi Ksharp,

I would like to know different ways of identifying  duplicates or in other words subsetting just the duplicates into a new dataset, perhaps I should have posted a new question however I thought I would add here as I noticed you and PG Sir got involved in the thread, got me excited, also others can learn too.

Cheers from England,

Charlotte

P.S I have a feeling, key= option/unique in set statement can perhaps be used for remving duplicates on a self merge?

Super Contributor
Posts: 276

Re: 6-ways of removing duplicate

Posted in reply to CharlotteCain

HI CharlotteCain

You want to create a new dataset with all duplicates observations ???

If yes, here is the way..

      Proc sort data=DSN out=sample nodupkey dupout=Duplicate;

     by var1;

      Run;

A new dataset duplicate will be created and it contains all duplicate observations from DSN dataset.

You can achive the same by using Data step as well.

like

  Proc sort data=DSN  ;

     by var1;

     Run;

Data Unique Duplicates;

set DSN;

by Var1;

If First.Var1 then Output Unique;

else output Duplicates;

run;

Hope this helps.

Regards.

Sanjeev.K

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 37 replies
  • 54300 views
  • 20 likes
  • 14 in conversation