BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sam369
Obsidian | Level 7

Hi all ,

 

I am trying to flag the duplicate records over the group. same id fall under different groups. i need to flg only those records who fall under different groups

 

have:

data have;

  input (id grp pam) (: $8.) seq val ord 8.;

  cards;

 100  xyz pop  1 10 1.1

 100  xyz pop  2 11 1.2

 100  xyz pop  3 12 1.3

 100  xyz pop  4 15 1.4

 100  xyz pop  5 11 1.5

 100  xyz pop  6 13 1.6

 100  xyz pop  7 30 1.7

 100  xyz pop  8 40 1.8

 100  xyz pop  9 10 1.9

 100  xyz pop  10 30 1.10

 100  abc pop  9 10 1.9

 100  abc pop  10 30 1.10

 100  abc pop  9 10 1.11

 100  abc pop  9 10 1.11

 100  abc pop  12 30 1.12

 100  abc pop  13 40 1.13

 100  abc pop  14 20 1.14

 100  abc pop  15 10 1.15

 100  abc pop  16 40 1.16

 100  abc pop  17 10 1.17

 ;

 run;

 

want:

 

id     grp    pam    seq    val     ord     flg

 

100    xyz    pop      1     10    1.10

100    xyz    pop      2     11    1.20

100    xyz    pop      3     12    1.30

100    xyz    pop      4     15    1.40

100    xyz    pop      5     11    1.50

100    xyz    pop      6     13    1.60

100    xyz    pop      7     30    1.70

100    xyz    pop      8     40    1.80

100    xyz    pop      9     10    1.90       1

100    xyz    pop     10     30    1.10       1

100    abc    pop      9     10    1.90       1

100    abc    pop     10     30    1.10       1

100    abc    pop      9     10    1.11

100    abc    pop      9     10    1.11

100    abc    pop     12     30    1.12

100    abc    pop     13     40    1.13

100    abc    pop     14     20    1.14

100    abc    pop     15     10    1.15

100    abc    pop     16     40    1.16

100    abc    pop     17     10    1.17

 

my code:

 

proc sql;
  create table want as
  select a.*,b.flg
  from have as a left join(select id,pam,seq,val,ord,1 as flg from have group by id,pam,seq,val,ord having count(*)>1) as b
  on (a.id=b.id & a.pam=b.pam & a.seq=b.seq & a.val=b.val & a.ord=b.ord);
quit;

 

my code is flagging the duplicate record, if id fall in single group.

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

It becomes more challenging if you allow that there might be multiple records with exactly the same values that need to be flagged (because they match another record on all but the group).  Here's one approach:

 

proc sort data=have;

by id pam val seq ord grp;

run;

 

data want;

do until (last.grp);

   set have;

   by id pam val seq ord grp;

   if first.grp=1 and first.ord=0 then flag=1;

   if last.grp=1 and last.ord=0 then flag=1;

end;

do until (last.grp);

   set have;

   by id pam val seq ord grp;

   output;

end;

run; 

 

it's untested code, but it looks like it should do the job.

View solution in original post

5 REPLIES 5
Astounding
PROC Star

It becomes more challenging if you allow that there might be multiple records with exactly the same values that need to be flagged (because they match another record on all but the group).  Here's one approach:

 

proc sort data=have;

by id pam val seq ord grp;

run;

 

data want;

do until (last.grp);

   set have;

   by id pam val seq ord grp;

   if first.grp=1 and first.ord=0 then flag=1;

   if last.grp=1 and last.ord=0 then flag=1;

end;

do until (last.grp);

   set have;

   by id pam val seq ord grp;

   output;

end;

run; 

 

it's untested code, but it looks like it should do the job.

sam369
Obsidian | Level 7

Thank you Astounding!!! Worked perfect!!!.

 

Thanks to hentipdp,PGStat!!! your apporach also works !!!!

 

Thanks , yes my sample data for ord varaible is liite bit odd.. my real data is different !!! i just made some sample. from next time i will keep your points in my mind 

 

 

Thanks!!!!

Hentiedp
Fluorite | Level 6

Hi, 

 

You can perhaps give the following a go.

 

I split the code to make reading and debugging easier. i think the same could have been accomplished with the datastep.


proc sql;
/*Get the duplicate counts*/
create table t1 as
select id,pam,seq,val,ord,1 as flg
from have
group by id,pam,seq,val,ord
having count(*)>1 ;

/*From with in the duplicates get the mmaximum value, excluding the minimum values
this should allow for other values to also be excluded*/
create table t2 as
select a.id,a.pam,a.seq,a.val, max(a.ord) as maxord
from t1 a left outer join (select id,pam,seq,val, min(ord) as minord from t1) b on
(a.id=b.id & a.pam=b.pam & a.seq=b.seq & a.val=b.val & a.ord ne b.minord)
group by a.id,a.pam,a.seq,a.val
;

/*Using the new data flag the OBS that are duplicates*/
create table want2 as
select a.*,b.flg
from have as a left join(select id,pam,seq,val,maxord, 1 as flg
from t2) as b
on (a.id=b.id & a.pam=b.pam & a.seq=b.seq & a.val=b.val & a.ord=b.maxord );
quit;

 

Hope this can at least provide some guidance.

 

KR

H

PGStats
Opal | Level 21

You only need to count the number of distinct values of grp within each instance of the other variables:

 


proc sql;
create table want as
select *, count(distinct grp) > 1 as flg
from have
group by id, pam, seq, val, ord;
select * from want;
quit;
PG
FreelanceReinh
Jade | Level 19

@sam369: Just a side note: Numeric variable ORD is a bit odd. Please note that 1.1=1.10. Depending on how these values were generated, there's even a risk of numeric representation issues, i.e. numbers which look as if they were equal (and should be equal), but in fact have small differences so that GROUP BY would assign them to different groups. Maybe a character variable would be more appropriate, if it isn't redundant anyway.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 3419 views
  • 1 like
  • 5 in conversation