BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
putteringpluie
Obsidian | Level 7

I want to find if a group has a certain value and delete the group if it does.

I have: 

WBabc
WBasdf
WBcwe
APsdf
APxxxxxx
APter
TNasdf
TNad
TNsdr
TNwer
ODsser
ODwry
ODgyu
ODxxxxxx

 

I want to find if a group has 'xxxxxx' vaue and delete the entire group. I want: 

WBabc
WBasdf
WBcwe
TNasdf
TNad
TNsdr
TNwer

 

Please advice. 

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

Two passes are needed in both the datastep and with proc sql, first to identify whether any record in a by group meets the condition, then a second to actually select the records.

 

The solution I suggested uses what is called a DOW loop. It takes over the normal record by record processing normally accomplished by SAS in a datastep.

 

The first part:

data want;
  do until (last.group);
    set have;
    by group notsorted;
    if value='xxxxxx' then drop=1;
  end;

goes through each by group, identifies if any of its records contain 'xxxxxx' for the variable value and, if at least one does, it sets the value of drop to equal 1.

 

The second part:

  do until (last.group);
    set have;
    by group notsorted;
    if not drop then output;
  end;
run;

is run as soon as the last record in a by group is read and only works on that by group. The value of drop is available throughout the processing of the by group. When it get to the last record of a by group it then passes control back to the first part of the DOW loop to begin processing the next group. When it does so, the value of drop is initially set to missing for the next by group.

 

A lot more is going on and can be done using a DOW loop. You can read about it at: support.sas.com/resources/papers/proceedings12/052-2012.pdf and support.sas.com/resources/papers/proceedings12/156-2012.pdf

 

I find the DOW loop to be more versatile than trying to do the same thing in SQL and it runs more than twice as fast as its SQL alternative.

 

Art, CEO, AnalystFinder.com

 

View solution in original post

9 REPLIES 9
Quentin
Super User
What code have you tried? Are you looking for a DATA step solution or also open to PROC SQL solution?
The Boston Area SAS Users Group is hosting free webinars!
Next up: Joe Madden & Joseph Henry present Putting Power into the Hands of the Programmer with SAS Viya Workbench on Wednesday Nov 6.
Register now at https://www.basug.org/events.
putteringpluie
Obsidian | Level 7
I'm new to SAS as I'm coming from the world of stata. My logic in stata would be generate a tag of 1 if a group has the certain value. Use "egen" to create another variable which is the max value in a group. And delete if the group max value is 1. I don't know which function would generate the max value by group in SAS. Proc sql solution will also be helpful. 
novinosrin
Tourmaline | Level 20

data have;
input grp $ value $;
datalines;
WB abc
WB asdf
WB cwe
AP sdf
AP xxxxxx
AP ter
TN asdf
TN ad
TN sdr
TN wer
OD sser
OD wry
OD gyu
OD xxxxxx
;

proc sql;
create table want(drop=v) as
select *, max(value = 'xxxxxx') as v
from have
group by grp
having v = 0;
quit;

novinosrin
Tourmaline | Level 20

This is cleaner than previous:

 

proc sql;
create table want as
select *
from have
group by grp
having max(value = 'xxxxxx')=0;
quit;

Quentin
Super User

This can be done with DATA step, but the PROC SQL solution is probably more straight forward:

 

data have;
  input group $2. value $10.;
  cards;
WB abc 
WB asdf 
WB cwe 
AP sdf 
AP xxxxxx 
AP ter 
TN asdf 
TN ad 
TN sdr 
TN wer 
OD sser 
OD wry 
OD gyu 
OD xxxxxx 
;

proc sql;
  create table want as 
  select *
  from have 
  group by group
  having max(value='xxxxxx')=0
  ;
quit;

On the having clause, SAS will evaluate value='xxxxxx' as either 1 (true) or 0 (false).  You can use the aggregate MAX() function to find those records where the maximum value within the group is 0 (i.e. no records have a value of 'xxxxxx').

 

 

This works in PROC SQL (but would not work in most SQL implementations) because in SAS you are allowed to include non-aggregated variables on the select clause even though there is a  group by clause.  You will see a note in the log that SAS "remerged" the records.

The Boston Area SAS Users Group is hosting free webinars!
Next up: Joe Madden & Joseph Henry present Putting Power into the Hands of the Programmer with SAS Viya Workbench on Wednesday Nov 6.
Register now at https://www.basug.org/events.
art297
Opal | Level 21

Here is one way to do it using a datastep:

data have;
  infile cards dlm='09'x;
  input (group value) ($);
  cards;
WB	abc
WB	asdf
WB	cwe
AP	sdf
AP	xxxxxx
AP	ter
TN	asdf
TN	ad
TN	sdr
TN	wer
OD	sser
OD	wry
OD	gyu
OD	xxxxxx
;

data want;
  do until (last.group);
    set have;
    by group notsorted;
    if value='xxxxxx' then drop=1;
  end;
  do until (last.group);
    set have;
    by group notsorted;
    if not drop then output;
  end;
run;

Art, CEO, AnalystFinder.com

 

putteringpluie
Obsidian | Level 7

You solution is interesting. Just so I better understand, can you please explain, how this is working and why we need two passes at data instead of one. 

art297
Opal | Level 21

Two passes are needed in both the datastep and with proc sql, first to identify whether any record in a by group meets the condition, then a second to actually select the records.

 

The solution I suggested uses what is called a DOW loop. It takes over the normal record by record processing normally accomplished by SAS in a datastep.

 

The first part:

data want;
  do until (last.group);
    set have;
    by group notsorted;
    if value='xxxxxx' then drop=1;
  end;

goes through each by group, identifies if any of its records contain 'xxxxxx' for the variable value and, if at least one does, it sets the value of drop to equal 1.

 

The second part:

  do until (last.group);
    set have;
    by group notsorted;
    if not drop then output;
  end;
run;

is run as soon as the last record in a by group is read and only works on that by group. The value of drop is available throughout the processing of the by group. When it get to the last record of a by group it then passes control back to the first part of the DOW loop to begin processing the next group. When it does so, the value of drop is initially set to missing for the next by group.

 

A lot more is going on and can be done using a DOW loop. You can read about it at: support.sas.com/resources/papers/proceedings12/052-2012.pdf and support.sas.com/resources/papers/proceedings12/156-2012.pdf

 

I find the DOW loop to be more versatile than trying to do the same thing in SQL and it runs more than twice as fast as its SQL alternative.

 

Art, CEO, AnalystFinder.com

 

PeterClemmensen
Tourmaline | Level 20
data have;
  input group $2. value $10.;
  cards;
WB abc 
WB asdf 
WB cwe 
AP sdf 
AP xxxxxx 
AP ter 
TN asdf 
TN ad 
TN sdr 
TN wer 
OD sser 
OD wry 
OD gyu 
OD xxxxxx 
;

proc sql;
   create table want as
   select *
   from have
   where group not in
   (select distinct group from have where value="xxxxxx");
quit;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 7705 views
  • 4 likes
  • 5 in conversation