Solved: Re: 6-ways of removing duplicate - Page 2

kuridisanjeev · Posted 07-03-2013 09:55 AM

Hi Ksharp.

Ahhh...Still can't able to get desired output.tried with your updated code ,yet producing 9 observations instead of 3.

Here am attaching ADMIT dataset for your ref.

Regards.

Sanjeev.K

Ksharp · Posted 07-03-2013 10:31 AM

OK.

Proc Sql;

Select Distinct Actlevel,Fee,Height,weight From x.ADMIT

group by Actlevel

having height=max(height);

quit;

Haikuo · Posted 07-03-2013 10:32 AM

I don't think you do can it in one SQL pass without using this undocumented monotonic(), as this is the only unique index that SQL can use to product the nodup results, all of the other efforts of generating certain unique feature based on your data will NOT be as robust:

Proc Sql;

create table want (drop=no) as

Select Distinct Actlevel,Fee,Height,weight , monotonic() as no From SASUSER.ADMIT

group by Actlevel

having no=max(no);

quit;

Haikuo

Ksharp: that is being said, what if you have multiple max height? And isn't the bedtime for you? lol.

Ksharp · Posted 07-03-2013 10:49 AM

HaiKuo.

Sure. It is time to go to sleep.

OP,

or try this one :

Proc Sql;

Select Distinct Actlevel,Fee,Height,weight From x.ADMIT

group by Actlevel

having cats(fee,height,weight)=max(cats(fee,height,weight));

quit;

kuridisanjeev · Posted 07-03-2013 11:17 AM

Hi,

You made my day with simple solution.thanks a lot.....!!Seems it better to use NODUPKEY in Proc Sort instead of trying this.

Hi

.

Sorry for testing your patients 🙂 and finally you came with the your mark answer....!!!!

Your Solution also working perfectly...

And Sorry for both of you ,because i can not able to make your answer as Correct Answer as i am not the person who created this Discussion..:-) :smileysilly:

Thanks &Regards.

Sanjeev.K

Samar1 · Posted 04-17-2017 02:52 PM

Hi Everyone!

I had one qsn regarding first.and last. Can we separate duplicates of 2 variables with first. and last. .We can remove duplicates for 1 varialble but not sure if we can separate duplicates for 2 variables. For eg:

In this program:

data a;
input name$ age;
datalines;
am 2
am 3
am 3
am 3
an 3
an 7
ka 7
ka 7
ka 8
sa 2
;
run;

Can we separate all the unique records for eg sa 2, an 7,ka 7, ka 8,am 3,am 2 an 3.

Please help

Thanks!

jfanny · Posted 10-13-2016 12:40 PM

Hello

I have a question for you if you can help if it will nice.

Here my code in data steps how I can write this code with Proc SQL to get the same thing. I will appreciate it if you could answer. Thank you

I would want any summary records. I would like the duplicate removed by the LAST.clm_ln_src_id

I tried everything with nested proc SQL or Having, max(clm_ln_src_id). none of them worked.

Thank you

Warmest regard,

Jean-Luc Fanny

PROC SORT DATA= Horizon_ClaimsHealth;

BY LEGACY_ORIGINAL_CLAIM_ID clm_ln_src_id VOIDSORT SORTCARD ;

RUN;

DATA Horizon_ClaimsHealth_v2 ;

SET Horizon_ClaimsHealth;

BY LEGACY_ORIGINAL_CLAIM_ID clm_ln_src_id VOIDSORT SORTCARD ;

IF LAST.clm_ln_src_id THEN OUTPUT;

RUN;

umashankersaini · Posted 07-02-2013 02:57 PM

Hi Andy,

I may not be much familiar with all ways but trying my best.....

(1) proc sort data=DSN noduprecs;

by _all_ ; Run;

(2) Proc sort data=DSN out=sample nodupkey dupout=Duplicate;

by var1; Run;

(3) Proc Sql noprint;

create table unique as select distinct (*) from DSN;

quit;

(4) proc freq data=DSN noprint;

tables variable/out=unique (Keep=variable count where=(count=1));

Run;

(5) Proc sort data=DSN out=temp;

by variable ; Run;

Data Unique;

set temp;

by variable;

if not first.variable and last.variable;

run;

AnandSahu · Posted 07-02-2013 02:59 PM

thanx uma...

Is there any way to remove duplicateS with the help of PROC APPEND and MERG as well?????

sunilzood · Posted 07-02-2013 10:24 PM

Anand, to use merge & append, you need to sort the observation & while sorting Nodup key take cares of dup records!!!

Ksharp · Posted 07-02-2013 10:57 PM

(5) is wrong.

PGStats · Posted 07-02-2013 11:20 PM

Should be:

(5) Proc sort data=DSN out=temp;

by variable ; Run;

Data Unique;

set temp;

by variable;

if first.variable;

run;

PG

Ksharp · Posted 07-03-2013 12:35 AM

Absolutely

CharlotteCain · Posted 07-03-2013 05:23 AM

Hi Ksharp,

I would like to know different ways of identifying duplicates or in other words subsetting just the duplicates into a new dataset, perhaps I should have posted a new question however I thought I would add here as I noticed you and PG Sir got involved in the thread, got me excited, also others can learn too.

Cheers from England,

Charlotte

P.S I have a feeling, key= option/unique in set statement can perhaps be used for remving duplicates on a self merge?

kuridisanjeev · Posted 07-03-2013 05:40 AM

HI CharlotteCain

You want to create a new dataset with all duplicates observations ???

If yes, here is the way..

Proc sort data=DSN out=sample nodupkey dupout=Duplicate;

by var1;

Run;

A new dataset duplicate will be created and it contains all duplicate observations from DSN dataset.

You can achive the same by using Data step as well.

like

Proc sort data=DSN ;

by var1;

Run;

Data Unique Duplicates;

set DSN;

by Var1;

If First.Var1 then Output Unique;

else output Duplicates;

run;

Hope this helps.

Regards.

Sanjeev.K

Classroom Training Available!