Re: Help with WHERE statement

marleeakerson · Posted 07-13-2020 07:24 PM

Hello,

I am trying to write a statement to include only certain variables. Take the sample data below:

ID Index

1 A

1 B

2 A

3 A

3 B

So in this instance, I am only trying to include the IDs when they only have an A Index, and no B Index. So using this sample data I would only want ID #2 because that is the only ID that doesn't also have a B Index.

Does this make sense? How can I code for this?

Thank you.

Shmuel · Posted 07-13-2020 07:51 PM

proc sql;

create table temp select distinct ID from have

where index='B;

create table want as select * from have

where ID not in (select ID from temp);

quit;

ketpt42 · Posted 07-13-2020 07:54 PM

This should be easy to accomplish with SQL. If your dataset is huge, there might be more efficient ways to do it.

data have;
    infile datalines;
    input ID :8. Index $1.;
datalines;
1 A
1 B
1 B
2 A
3 A
3 B
;
run;
 
proc sql;
create table want as
select id
  from have 
 where index='A'
   and id not in (SELECT id from have where Index = 'B')
 order by id;
quit;

ID
2

mkeintz · Posted 07-13-2020 09:33 PM

If the data are sorted by ID, then:

data have;
    infile datalines;
    input ID :8. Index $1.;
datalines;
1 A
1 B
1 B
2 A
3 A
3 B
run;

data want (drop=_:);
  set have (where=(index^='A'))
      have (where=(index='A'));
  by id;
  if first.id=1 and index='A' then _keep+1;
  else if first.id then _keep=0;
  if _keep;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

PGStats · Posted 07-13-2020 09:50 PM

Proc SQL with autoremerge provides this simple solution:

proc sql;
create table want as
select *
from have
group by ID
having sum(Index ne "A") = 0;
quit;

PG

Kurt_Bremser · Posted 07-14-2020 02:20 AM

The fastest method would be a data step, if your data is already sorted:

data want;
merge
  have
  have (
    in=delete
    keep=id index
    rename=(index=_i)
    where=(_i = 'B')
  )
;
by id;
if not delete;
drop _i;
run;

If it's not sorted, a hash table can do it:

data want;
set have;
if _N_ = 1
then do;
  declare hash h (dataset:"have (where=(index = 'B'))");
  h.definekey("id");
  h.definedone();
end;
if h.check() ne 0;
run;

The hash declaration will automatically discard duplicate entries for ID, so the in-memory table should be fairly small.

Both methods should outperform SQL if the table is quite large.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Ksharp · Posted 07-14-2020 07:59 AM

Same as PG's code:

data have;
infile datalines;
input ID :8. Index $1.;
datalines;
1 A
1 B
1 B
2 A
3 A
3 B
;
run;

proc sql;
create table want as
select * from have
group by id
having count(*)=sum(Index='A');
quit;

Classroom Training Available!