BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SarahW13
Obsidian | Level 7

I have a question about how to remove duplicate rows based on the value of a certain variable. I'm using the example below to explain what I am effectively trying to do.

 

In this dataset, each id has many duplicate values for the variable med. In some of those rows, there is also a value for the variable flag. In those cases, I want to preserve only the rows where there is a value for flag. However, there are other rows for the id and med value where there is NO value for the variable flag (eg, where id = 1 and Med = B); in those cases, I want to just keep all of those rows intact.

 

id  med  flag   seq

1    A       1       1

1    A       .        2

1    A       .        3

1    B       .        1

1    B       .        2

2    A       1       1

2    A       .        2

2    C       .        1

2    C       0       2

2    C       .        3

2    C       .        4

2    D       1       1

 

 

This is the dataset that I want to have as the end result:

id  med  flag   seq

1    A       1       1

1    B       .        1

1    B       .        2

2    A       1       1

2    C       0       2

2    D       1       1

 

Does anyone have any advice?

1 ACCEPTED SOLUTION

Accepted Solutions
amatsu
Obsidian | Level 7
data have;
input id med$ flag seq;
cards;
1 A 1 1
1 A . 2
1 A . 3
1 B . 1
1 B . 2
2 A 1 1
2 A . 2
2 C . 1
2 C 0 2
2 C . 3
2 C . 4
2 D 1 1
;

proc sort data=have out=flag (keep=id med) nodupkey;
  by id med;
  where flag^=.;
run;

data want1;
  merge have flag(in=_in);
  by id med;
  if (_in=1 and flag^=.) or _in=0 then output;
run;


or


data want2;
  set have;

  if _n_=1 then do;
     dcl hash hs( dataset:"have(where=(flag^=.))" );
     hs.definekey("id","med");
     hs.definedone();
  end;

  rc = hs.check();
  if rc =0 and flag^=. then output;
  if rc^=0 then output;
run;

View solution in original post

4 REPLIES 4
amatsu
Obsidian | Level 7
data have;
input id med$ flag seq;
cards;
1 A 1 1
1 A . 2
1 A . 3
1 B . 1
1 B . 2
2 A 1 1
2 A . 2
2 C . 1
2 C 0 2
2 C . 3
2 C . 4
2 D 1 1
;

proc sort data=have out=flag (keep=id med) nodupkey;
  by id med;
  where flag^=.;
run;

data want1;
  merge have flag(in=_in);
  by id med;
  if (_in=1 and flag^=.) or _in=0 then output;
run;


or


data want2;
  set have;

  if _n_=1 then do;
     dcl hash hs( dataset:"have(where=(flag^=.))" );
     hs.definekey("id","med");
     hs.definedone();
  end;

  rc = hs.check();
  if rc =0 and flag^=. then output;
  if rc^=0 then output;
run;

SarahW13
Obsidian | Level 7

Thank you so much, amatsu! This worked! 🙂

Ksharp
Super User
data have;
input id med$ flag seq;
cards;
1 A 1 1
1 A . 2
1 A . 3
1 B . 1
1 B . 2
2 A 1 1
2 A . 2
2 C . 1
2 C 0 2
2 C . 3
2 C . 4
2 D 1 1
;
data want;
 do until(last.med);
   set have;
   by id med notsorted;
   if not missing(flag) then yes=1;
 end;
 do until(last.med);
   set have;
   by id med notsorted;
   if yes then do;if not missing(flag) then output;end;
    else output;
 end;
 drop yes;
 run;
novinosrin
Tourmaline | Level 20
data have;
input id med$ flag seq;
cards;
1 A 1 1
1 A . 2
1 A . 3
1 B . 1
1 B . 2
2 A 1 1
2 A . 2
2 C . 1
2 C 0 2
2 C . 3
2 C . 4
2 D 1 1
;

proc sql;
create table want as
select *
from have
group by id,med
having flag=max(flag)
order by id,med,seq;
quit;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1521 views
  • 1 like
  • 4 in conversation