turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- Observation number of duplicate record

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-17-2014 11:08 PM

Hi,

Is there a way to find out the observation number of duplicate record . I tried with _n_ but somehow am unable to find the exact row number.

Need some help in that

data source;

input a b c d;

cards;

1 2 3 4

5 6 7 8

9 10 11 12

1 2 3 4

9 10 11 12

5 6 7 8

;

run;

I want target as

a | b | c | d | Dup_row_num |

1 | 2 | 3 | 4 | 4 |

5 | 6 | 7 | 8 | 6 |

9 | 10 | 11 | 12 | 5 |

I tried following , I know I need to combine these two queries but somehow is unable to find it

proc sql;

create table t1 as

select monotonic() as row_number,a,b,c,d

from source

;quit;

proc sql;

select max(row_number) from

(

select count(*) as cnt, a,b,c,d, row_number from t1 group by a,b,c,d

)

;

quit;

Thanks in advance

Accepted Solutions

Solution

07-18-2014
12:05 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-18-2014 12:05 AM

Sorry !

data T1;

set Source;

row_number=_n_;

run;

proc sort data=T1;

by a b c d row_number;

run;

data Result(rename=(row_number=dup_row_number));

set T1;

by a b c d row_number;

if first.d=0 & last.d=1;

run;

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-17-2014 11:32 PM

*Method 1;

proc sort data=Source out=A1;

by a b c d;

run;

data Result(rename=(Cnt=Dup_row_num));

set A1;

by a b c d;

retain Cnt;

if first.d then Cnt=0;

Cnt+1;

if last.d & 0<Cnt then output;

run;

*Method 2;

proc sql;

create table Result(where=(1<Dup_row_num))

as select a, b, c, d, count(*) as Dup_row_num from Source group by a,b,c,d;

quit;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-17-2014 11:46 PM

No scdent, I want the obswervation number of duplicate record. If you look at my source. First observation is repeated at row number 4 thats what I need to put in variable Dup_row_num

Solution

07-18-2014
12:05 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-18-2014 12:05 AM

Sorry !

data T1;

set Source;

row_number=_n_;

run;

proc sort data=T1;

by a b c d row_number;

run;

data Result(rename=(row_number=dup_row_number));

set T1;

by a b c d row_number;

if first.d=0 & last.d=1;

run;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-18-2014 12:25 AM

You need to get the grouping variables in your selected list of variables.

**data** t1;

row_number+**1**;

input a b c d;

cards;

1 2 3 4

5 6 7 8

9 10 11 12

1 2 3 4

9 10 11 12

5 6 7 8

;

**run**;

**proc** **sql**;

select a,b,c,d

, min(row_number) as first_row

, max(row_number) as last_dup_row

from t1

group by a,b,c,d

having count(*) > **1**

order by a,b,c,d

;

**quit**;