BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SarahW13
Obsidian | Level 7

I have a question about how to remove duplicate rows based on the value of a certain variable. I'm using the example below to explain what I am effectively trying to do.

 

In this dataset, each id has many duplicate values for the variable med. In some of those rows, there is also a value for the variable flag. In those cases, I want to preserve only the rows where there is a value for flag. However, there are other rows for the id and med value where there is NO value for the variable flag (eg, where id = 1 and Med = B); in those cases, I want to just keep all of those rows intact.

 

id  med  flag   seq

1    A       1       1

1    A       .        2

1    A       .        3

1    B       .        1

1    B       .        2

2    A       1       1

2    A       .        2

2    C       .        1

2    C       0       2

2    C       .        3

2    C       .        4

2    D       1       1

 

 

This is the dataset that I want to have as the end result:

id  med  flag   seq

1    A       1       1

1    B       .        1

1    B       .        2

2    A       1       1

2    C       0       2

2    D       1       1

 

Does anyone have any advice?

1 ACCEPTED SOLUTION

Accepted Solutions
amatsu
Obsidian | Level 7
data have;
input id med$ flag seq;
cards;
1 A 1 1
1 A . 2
1 A . 3
1 B . 1
1 B . 2
2 A 1 1
2 A . 2
2 C . 1
2 C 0 2
2 C . 3
2 C . 4
2 D 1 1
;

proc sort data=have out=flag (keep=id med) nodupkey;
  by id med;
  where flag^=.;
run;

data want1;
  merge have flag(in=_in);
  by id med;
  if (_in=1 and flag^=.) or _in=0 then output;
run;


or


data want2;
  set have;

  if _n_=1 then do;
     dcl hash hs( dataset:"have(where=(flag^=.))" );
     hs.definekey("id","med");
     hs.definedone();
  end;

  rc = hs.check();
  if rc =0 and flag^=. then output;
  if rc^=0 then output;
run;

View solution in original post

4 REPLIES 4
amatsu
Obsidian | Level 7
data have;
input id med$ flag seq;
cards;
1 A 1 1
1 A . 2
1 A . 3
1 B . 1
1 B . 2
2 A 1 1
2 A . 2
2 C . 1
2 C 0 2
2 C . 3
2 C . 4
2 D 1 1
;

proc sort data=have out=flag (keep=id med) nodupkey;
  by id med;
  where flag^=.;
run;

data want1;
  merge have flag(in=_in);
  by id med;
  if (_in=1 and flag^=.) or _in=0 then output;
run;


or


data want2;
  set have;

  if _n_=1 then do;
     dcl hash hs( dataset:"have(where=(flag^=.))" );
     hs.definekey("id","med");
     hs.definedone();
  end;

  rc = hs.check();
  if rc =0 and flag^=. then output;
  if rc^=0 then output;
run;

SarahW13
Obsidian | Level 7

Thank you so much, amatsu! This worked! 🙂

Ksharp
Super User
data have;
input id med$ flag seq;
cards;
1 A 1 1
1 A . 2
1 A . 3
1 B . 1
1 B . 2
2 A 1 1
2 A . 2
2 C . 1
2 C 0 2
2 C . 3
2 C . 4
2 D 1 1
;
data want;
 do until(last.med);
   set have;
   by id med notsorted;
   if not missing(flag) then yes=1;
 end;
 do until(last.med);
   set have;
   by id med notsorted;
   if yes then do;if not missing(flag) then output;end;
    else output;
 end;
 drop yes;
 run;
novinosrin
Tourmaline | Level 20
data have;
input id med$ flag seq;
cards;
1 A 1 1
1 A . 2
1 A . 3
1 B . 1
1 B . 2
2 A 1 1
2 A . 2
2 C . 1
2 C 0 2
2 C . 3
2 C . 4
2 D 1 1
;

proc sql;
create table want as
select *
from have
group by id,med
having flag=max(flag)
order by id,med,seq;
quit;

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 2296 views
  • 1 like
  • 4 in conversation