BookmarkSubscribeRSS Feed
marleeakerson
Calcite | Level 5

Hello, 

 

I am trying to write a statement to include only certain variables. Take the sample data below: 

 

ID       Index

1             A

1             B

1             B

2             A

3             A

3             B

 

So in this instance, I am only trying to include the IDs when they only have an A Index, and no B Index. So using this sample data I would only want ID #2 because that is the only ID that doesn't also have a B Index. 

 

Does this make sense? How can I code for this? 

 

Thank you. 

 

 

6 REPLIES 6
Shmuel
Garnet | Level 18

proc sql;

    create table temp select distinct ID from have

    where index='B;

 

    create table want as select * from have

    where ID not in (select ID from temp);

quit;

ketpt42
Quartz | Level 8

This should be easy to accomplish with SQL. If your dataset is huge, there might be more efficient ways to do it.

 

data have;
    infile datalines;
    input ID :8. Index $1.;
datalines;
1 A
1 B
1 B
2 A
3 A
3 B
;
run;
 
proc sql;
create table want as
select id
  from have 
 where index='A'
   and id not in (SELECT id from have where Index = 'B')
 order by id;
quit;
ID
2
mkeintz
PROC Star

If the data are sorted by ID, then:

 

data have;
    infile datalines;
    input ID :8. Index $1.;
datalines;
1 A
1 B
1 B
2 A
3 A
3 B
run;

data want (drop=_:);
  set have (where=(index^='A'))
      have (where=(index='A'));
  by id;
  if first.id=1 and index='A' then _keep+1;
  else if first.id then _keep=0;
  if _keep;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
PGStats
Opal | Level 21

Proc SQL with autoremerge provides this simple solution:

 

proc sql;
create table want as
select *
from have
group by ID
having sum(Index ne "A") = 0;
quit;
PG
Kurt_Bremser
Super User

The fastest method would be a data step, if your data is already sorted:

data want;
merge
  have
  have (
    in=delete
    keep=id index
    rename=(index=_i)
    where=(_i = 'B')
  )
;
by id;
if not delete;
drop _i;
run;

If it's not sorted, a hash table can do it:

data want;
set have;
if _N_ = 1
then do;
  declare hash h (dataset:"have (where=(index = 'B'))");
  h.definekey("id");
  h.definedone();
end;
if h.check() ne 0;
run;

The hash declaration will automatically discard duplicate entries for ID, so the in-memory table should be fairly small.

 

Both methods should outperform SQL if the table is quite large.

Ksharp
Super User
Same as PG's code:

data have;
infile datalines;
input ID :8. Index $1.;
datalines;
1 A
1 B
1 B
2 A
3 A
3 B
;
run;

proc sql;
create table want as
select * from have
group by id
having count(*)=sum(Index='A');
quit;
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1836 views
  • 4 likes
  • 7 in conversation