I want to find if a group has a certain value and delete the group if it does.
I have:
WB | abc |
WB | asdf |
WB | cwe |
AP | sdf |
AP | xxxxxx |
AP | ter |
TN | asdf |
TN | ad |
TN | sdr |
TN | wer |
OD | sser |
OD | wry |
OD | gyu |
OD | xxxxxx |
I want to find if a group has 'xxxxxx' vaue and delete the entire group. I want:
WB | abc |
WB | asdf |
WB | cwe |
TN | asdf |
TN | ad |
TN | sdr |
TN | wer |
Please advice.
Two passes are needed in both the datastep and with proc sql, first to identify whether any record in a by group meets the condition, then a second to actually select the records.
The solution I suggested uses what is called a DOW loop. It takes over the normal record by record processing normally accomplished by SAS in a datastep.
The first part:
data want; do until (last.group); set have; by group notsorted; if value='xxxxxx' then drop=1; end;
goes through each by group, identifies if any of its records contain 'xxxxxx' for the variable value and, if at least one does, it sets the value of drop to equal 1.
The second part:
do until (last.group); set have; by group notsorted; if not drop then output; end; run;
is run as soon as the last record in a by group is read and only works on that by group. The value of drop is available throughout the processing of the by group. When it get to the last record of a by group it then passes control back to the first part of the DOW loop to begin processing the next group. When it does so, the value of drop is initially set to missing for the next by group.
A lot more is going on and can be done using a DOW loop. You can read about it at: support.sas.com/resources/papers/proceedings12/052-2012.pdf and support.sas.com/resources/papers/proceedings12/156-2012.pdf
I find the DOW loop to be more versatile than trying to do the same thing in SQL and it runs more than twice as fast as its SQL alternative.
Art, CEO, AnalystFinder.com
data have;
input grp $ value $;
datalines;
WB abc
WB asdf
WB cwe
AP sdf
AP xxxxxx
AP ter
TN asdf
TN ad
TN sdr
TN wer
OD sser
OD wry
OD gyu
OD xxxxxx
;
proc sql;
create table want(drop=v) as
select *, max(value = 'xxxxxx') as v
from have
group by grp
having v = 0;
quit;
This is cleaner than previous:
proc sql;
create table want as
select *
from have
group by grp
having max(value = 'xxxxxx')=0;
quit;
This can be done with DATA step, but the PROC SQL solution is probably more straight forward:
data have;
input group $2. value $10.;
cards;
WB abc
WB asdf
WB cwe
AP sdf
AP xxxxxx
AP ter
TN asdf
TN ad
TN sdr
TN wer
OD sser
OD wry
OD gyu
OD xxxxxx
;
proc sql;
create table want as
select *
from have
group by group
having max(value='xxxxxx')=0
;
quit;
On the having clause, SAS will evaluate value='xxxxxx' as either 1 (true) or 0 (false). You can use the aggregate MAX() function to find those records where the maximum value within the group is 0 (i.e. no records have a value of 'xxxxxx').
This works in PROC SQL (but would not work in most SQL implementations) because in SAS you are allowed to include non-aggregated variables on the select clause even though there is a group by clause. You will see a note in the log that SAS "remerged" the records.
Here is one way to do it using a datastep:
data have; infile cards dlm='09'x; input (group value) ($); cards; WB abc WB asdf WB cwe AP sdf AP xxxxxx AP ter TN asdf TN ad TN sdr TN wer OD sser OD wry OD gyu OD xxxxxx ; data want; do until (last.group); set have; by group notsorted; if value='xxxxxx' then drop=1; end; do until (last.group); set have; by group notsorted; if not drop then output; end; run;
Art, CEO, AnalystFinder.com
You solution is interesting. Just so I better understand, can you please explain, how this is working and why we need two passes at data instead of one.
Two passes are needed in both the datastep and with proc sql, first to identify whether any record in a by group meets the condition, then a second to actually select the records.
The solution I suggested uses what is called a DOW loop. It takes over the normal record by record processing normally accomplished by SAS in a datastep.
The first part:
data want; do until (last.group); set have; by group notsorted; if value='xxxxxx' then drop=1; end;
goes through each by group, identifies if any of its records contain 'xxxxxx' for the variable value and, if at least one does, it sets the value of drop to equal 1.
The second part:
do until (last.group); set have; by group notsorted; if not drop then output; end; run;
is run as soon as the last record in a by group is read and only works on that by group. The value of drop is available throughout the processing of the by group. When it get to the last record of a by group it then passes control back to the first part of the DOW loop to begin processing the next group. When it does so, the value of drop is initially set to missing for the next by group.
A lot more is going on and can be done using a DOW loop. You can read about it at: support.sas.com/resources/papers/proceedings12/052-2012.pdf and support.sas.com/resources/papers/proceedings12/156-2012.pdf
I find the DOW loop to be more versatile than trying to do the same thing in SQL and it runs more than twice as fast as its SQL alternative.
Art, CEO, AnalystFinder.com
data have;
input group $2. value $10.;
cards;
WB abc
WB asdf
WB cwe
AP sdf
AP xxxxxx
AP ter
TN asdf
TN ad
TN sdr
TN wer
OD sser
OD wry
OD gyu
OD xxxxxx
;
proc sql;
create table want as
select *
from have
where group not in
(select distinct group from have where value="xxxxxx");
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.