Solved: Re: Find if a group contains a value to create new variable (SAS 9.4)

putteringpluie · Posted 12-27-2017 02:13 PM

I want to find if a group has a certain value and delete the group if it does.

I have:

WB	abc
WB	asdf
WB	cwe
AP	sdf
AP	xxxxxx
AP	ter
TN	asdf
TN	ad
TN	sdr
TN	wer
OD	sser
OD	wry
OD	gyu
OD	xxxxxx

I want to find if a group has 'xxxxxx' vaue and delete the entire group. I want:

WB	abc
WB	asdf
WB	cwe
TN	asdf
TN	ad
TN	sdr
TN	wer

Please advice.

art297 · Posted 12-27-2017 03:38 PM

Two passes are needed in both the datastep and with proc sql, first to identify whether any record in a by group meets the condition, then a second to actually select the records.

The solution I suggested uses what is called a DOW loop. It takes over the normal record by record processing normally accomplished by SAS in a datastep.

The first part:

data want;
  do until (last.group);
    set have;
    by group notsorted;
    if value='xxxxxx' then drop=1;
  end;

goes through each by group, identifies if any of its records contain 'xxxxxx' for the variable value and, if at least one does, it sets the value of drop to equal 1.

The second part:

  do until (last.group);
    set have;
    by group notsorted;
    if not drop then output;
  end;
run;

is run as soon as the last record in a by group is read and only works on that by group. The value of drop is available throughout the processing of the by group. When it get to the last record of a by group it then passes control back to the first part of the DOW loop to begin processing the next group. When it does so, the value of drop is initially set to missing for the next by group.

A lot more is going on and can be done using a DOW loop. You can read about it at: support.sas.com/resources/papers/proceedings12/052-2012.pdf and support.sas.com/resources/papers/proceedings12/156-2012.pdf

I find the DOW loop to be more versatile than trying to do the same thing in SQL and it runs more than twice as fast as its SQL alternative.

Art, CEO, AnalystFinder.com

View solution in original post

Quentin · Posted 12-27-2017 02:17 PM

What code have you tried? Are you looking for a DATA step solution or also open to PROC SQL solution?

The Boston Area SAS Users Group is hosting free webinars!
Next up: Joe Madden & Joseph Henry present Putting Power into the Hands of the Programmer with SAS Viya Workbench on Wednesday Nov 6.
Register now at https://www.basug.org/events.

putteringpluie · Posted 12-27-2017 02:31 PM

I'm new to SAS as I'm coming from the world of stata. My logic in stata would be generate a tag of 1 if a group has the certain value. Use "egen" to create another variable which is the max value in a group. And delete if the group max value is 1. I don't know which function would generate the max value by group in SAS. Proc sql solution will also be helpful.

novinosrin · Posted 12-27-2017 02:36 PM

data have;
input grp $ value $;
datalines;
WB abc
WB asdf
WB cwe
AP sdf
AP xxxxxx
AP ter
TN asdf
TN ad
TN sdr
TN wer
OD sser
OD wry
OD gyu
OD xxxxxx
;

proc sql;
create table want(drop=v) as
select *, max(value = 'xxxxxx') as v
from have
group by grp
having v = 0;
quit;

novinosrin · Posted 12-27-2017 02:40 PM

This is cleaner than previous:

proc sql;
create table want as
select *
from have
group by grp
having max(value = 'xxxxxx')=0;
quit;

Quentin · Posted 12-27-2017 02:43 PM

This can be done with DATA step, but the PROC SQL solution is probably more straight forward:

data have;
  input group $2. value $10.;
  cards;
WB abc 
WB asdf 
WB cwe 
AP sdf 
AP xxxxxx 
AP ter 
TN asdf 
TN ad 
TN sdr 
TN wer 
OD sser 
OD wry 
OD gyu 
OD xxxxxx 
;

proc sql;
  create table want as 
  select *
  from have 
  group by group
  having max(value='xxxxxx')=0
  ;
quit;

On the having clause, SAS will evaluate value='xxxxxx' as either 1 (true) or 0 (false). You can use the aggregate MAX() function to find those records where the maximum value within the group is 0 (i.e. no records have a value of 'xxxxxx').

This works in PROC SQL (but would not work in most SQL implementations) because in SAS you are allowed to include non-aggregated variables on the select clause even though there is a group by clause. You will see a note in the log that SAS "remerged" the records.

The Boston Area SAS Users Group is hosting free webinars!
Next up: Joe Madden & Joseph Henry present Putting Power into the Hands of the Programmer with SAS Viya Workbench on Wednesday Nov 6.
Register now at https://www.basug.org/events.

art297 · Posted 12-27-2017 02:43 PM

Here is one way to do it using a datastep:

data have;
  infile cards dlm='09'x;
  input (group value) ($);
  cards;
WB	abc
WB	asdf
WB	cwe
AP	sdf
AP	xxxxxx
AP	ter
TN	asdf
TN	ad
TN	sdr
TN	wer
OD	sser
OD	wry
OD	gyu
OD	xxxxxx
;

data want;
  do until (last.group);
    set have;
    by group notsorted;
    if value='xxxxxx' then drop=1;
  end;
  do until (last.group);
    set have;
    by group notsorted;
    if not drop then output;
  end;
run;

Art, CEO, AnalystFinder.com

putteringpluie · Posted 12-27-2017 03:16 PM

You solution is interesting. Just so I better understand, can you please explain, how this is working and why we need two passes at data instead of one.

art297 · Posted 12-27-2017 03:38 PM

Two passes are needed in both the datastep and with proc sql, first to identify whether any record in a by group meets the condition, then a second to actually select the records.

The solution I suggested uses what is called a DOW loop. It takes over the normal record by record processing normally accomplished by SAS in a datastep.

The first part:

data want;
  do until (last.group);
    set have;
    by group notsorted;
    if value='xxxxxx' then drop=1;
  end;

goes through each by group, identifies if any of its records contain 'xxxxxx' for the variable value and, if at least one does, it sets the value of drop to equal 1.

The second part:

  do until (last.group);
    set have;
    by group notsorted;
    if not drop then output;
  end;
run;

is run as soon as the last record in a by group is read and only works on that by group. The value of drop is available throughout the processing of the by group. When it get to the last record of a by group it then passes control back to the first part of the DOW loop to begin processing the next group. When it does so, the value of drop is initially set to missing for the next by group.

A lot more is going on and can be done using a DOW loop. You can read about it at: support.sas.com/resources/papers/proceedings12/052-2012.pdf and support.sas.com/resources/papers/proceedings12/156-2012.pdf

I find the DOW loop to be more versatile than trying to do the same thing in SQL and it runs more than twice as fast as its SQL alternative.

Art, CEO, AnalystFinder.com

PeterClemmensen · Posted 12-27-2017 02:56 PM

data have;
  input group $2. value $10.;
  cards;
WB abc 
WB asdf 
WB cwe 
AP sdf 
AP xxxxxx 
AP ter 
TN asdf 
TN ad 
TN sdr 
TN wer 
OD sser 
OD wry 
OD gyu 
OD xxxxxx 
;

proc sql;
   create table want as
   select *
   from have
   where group not in
   (select distinct group from have where value="xxxxxx");
quit;

The DATA to DATA Step Macro
Blog: SASnrd

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away