BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
y_fu
Fluorite | Level 6

I want to remove the samples that are repeated. For example, if I have data like:

Sample    v1  

A              1

B              1

B              2

C              1

 

I want the output like:

Sample    v1  

A              1

 

C              1

 

So not just remove the duplicate (I know how to do that).

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

So you still want the records but the values to be erased?

 

You can set them to missing using CALL MISSING() and the same technique you would use to remove duplicates. 

 

data want;
set have;
by sample;

if not (first.sample and last.sample) then call missing(sample, v1);

run;

View solution in original post

7 REPLIES 7
Reeza
Super User

So you still want the records but the values to be erased?

 

You can set them to missing using CALL MISSING() and the same technique you would use to remove duplicates. 

 

data want;
set have;
by sample;

if not (first.sample and last.sample) then call missing(sample, v1);

run;
y_fu
Fluorite | Level 6

Thank you for the help, I want to remove them, not changing to missing calls.

Based on your code, I can just change 

if not (first.sample and last.sample) then call missing(sample, v1);

to

if not (first.sample and last.sample) then delete;

 

Works perfect, thanks again.

novinosrin
Tourmaline | Level 20

Are you looking to "remove" or assign missing values to the group that has more than one record?

Jagadishkatam
Amethyst | Level 16

 

data have;
input Sample:$ v1;
cards;  
A   1
B   1
B   2
C   1
;

proc sql;
create table want as select sample,sum(v1) as sum, v1 from have group by sample having sum=1;
quit;
Thanks,
Jag
novinosrin
Tourmaline | Level 20

Hi @Jagadishkatam I do like the sql approach possibly even extending the sort to be performed intrinsically at in database level however I'm afraid logic of your code will produce erroneous results in case of the just an addition of one more group D as illustrated below. On a sorted dataset, @Reeza solution tweaked by OP @y_fu is simplest

 

data have;
input Sample:$ v1;
cards;  
A   1
B   1
B   2
C   1
D   0
D   1
;
dm log 'clear';
proc sql;
create table want_yours as 
select sample,sum(v1) as sum, v1 
from have 
group by sample 
having sum=1;
quit;
data want_reeza_and_OP;
set have;
by sample;
if not (first.sample and last.sample) then delete;
run;

proc sql;
create table want_mine as 
select * 
from have 
group by sample 
having count(sample)=1;
quit;


My 2 cents for what it's worth & Regards!

 

 

novinosrin
Tourmaline | Level 20

More notes, Sample being sorted The performance of  data step approach should essentially beat the SQL in my opinion for the reason "remerging is an overhead or in other words an extra pass as I believe first and last pointers are quicker as opposed to group, count , remerge and filter in sql. 

 

novinosrin
Tourmaline | Level 20

And if your dataset isn't sorted, Hash is handy

 

Fun:

 

data have;
input Sample:$ v1;
cards;  
A   1
B   2
C   1
D   0
B   1
D   1
;
data _null_;
if _n_=1 then do;
dcl hash h(ordered:'y');
 h.definekey  ("sample") ;
 h.definedata ("sample","v1",'_N_') ;
 h.definedone () ;
end;
set have end=lr;
if h.check() ne 0 then do; _N_=1; h.replace();end;
else do;_N_=_N_+1; h.replace();end;
if lr then h.output(dataset:'want(where=(_N_=1))');
run;


 

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1082 views
  • 7 likes
  • 4 in conversation