Solved
Contributor
Posts: 52

Observation number of duplicate record

Hi,

Is there a way to find out the observation number of duplicate record . I tried with _n_ but somehow am unable to find the exact row number.

Need some help in that

data source;

input a b c d;

cards;

1 2 3 4

5 6 7 8

9 10 11 12

1 2 3 4

9 10 11 12

5 6 7 8

;

run;

I want target as

 a b c d Dup_row_num 1 2 3 4 4 5 6 7 8 6 9 10 11 12 5

I tried following , I know I need to combine these two queries but somehow is unable to find it

proc sql;

create table t1 as

select monotonic() as row_number,a,b,c,d

from source

;quit;

proc sql;

select max(row_number)  from

(

select count(*) as cnt, a,b,c,d, row_number from t1 group by a,b,c,d

)

;

quit;

Accepted Solutions
Solution
‎07-18-2014 12:05 AM
Occasional Contributor
Posts: 10

Re: Observation number of duplicate record

Posted in reply to yashpande

Sorry !

data T1;

set Source;

row_number=_n_;

run;

proc sort data=T1;

by a b c d row_number;

run;

data Result(rename=(row_number=dup_row_number));

set T1;

by a b c d row_number;

if first.d=0 & last.d=1;

run;

All Replies
Occasional Contributor
Posts: 10

Re: Observation number of duplicate record

Posted in reply to yashpande

*Method 1;

proc sort data=Source out=A1;

by a b c d;

run;

data Result(rename=(Cnt=Dup_row_num));

set A1;

by a b c d;

retain Cnt;

if first.d then Cnt=0;

Cnt+1;

if last.d & 0<Cnt then output;

run;

*Method 2;

proc sql;

create table Result(where=(1<Dup_row_num))

as select a, b, c, d, count(*) as Dup_row_num from Source group by a,b,c,d;

quit;

Contributor
Posts: 52

Re: Observation number of duplicate record

No scdent, I want the obswervation number of duplicate record. If you look at my source. First observation is repeated at row number 4 thats what I need to put in variable Dup_row_num

Solution
‎07-18-2014 12:05 AM
Occasional Contributor
Posts: 10

Re: Observation number of duplicate record

Posted in reply to yashpande

Sorry !

data T1;

set Source;

row_number=_n_;

run;

proc sort data=T1;

by a b c d row_number;

run;

data Result(rename=(row_number=dup_row_number));

set T1;

by a b c d row_number;

if first.d=0 & last.d=1;

run;

Super User
Posts: 8,115

Re: Observation number of duplicate record

Posted in reply to yashpande

You need to get the grouping variables in your selected list of variables.

data t1;

row_number+1;

input a b c d;

cards;

1 2 3 4

5 6 7 8

9 10 11 12

1 2 3 4

9 10 11 12

5 6 7 8

;

run;

proc sql;

select a,b,c,d

, min(row_number) as first_row

, max(row_number) as last_dup_row

from t1

group by a,b,c,d

having count(*) > 1

order by a,b,c,d

;

quit;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
• 4 replies
• 236 views
• 0 likes
• 3 in conversation