BookmarkSubscribeRSS Feed
marleeakerson
Calcite | Level 5

Hello, 

 

I am trying to write a statement to include only certain variables. Take the sample data below: 

 

ID       Index

1             A

1             B

1             B

2             A

3             A

3             B

 

So in this instance, I am only trying to include the IDs when they only have an A Index, and no B Index. So using this sample data I would only want ID #2 because that is the only ID that doesn't also have a B Index. 

 

Does this make sense? How can I code for this? 

 

Thank you. 

 

 

6 REPLIES 6
Shmuel
Garnet | Level 18

proc sql;

    create table temp select distinct ID from have

    where index='B;

 

    create table want as select * from have

    where ID not in (select ID from temp);

quit;

ketpt42
Quartz | Level 8

This should be easy to accomplish with SQL. If your dataset is huge, there might be more efficient ways to do it.

 

data have;
    infile datalines;
    input ID :8. Index $1.;
datalines;
1 A
1 B
1 B
2 A
3 A
3 B
;
run;
 
proc sql;
create table want as
select id
  from have 
 where index='A'
   and id not in (SELECT id from have where Index = 'B')
 order by id;
quit;
ID
2
mkeintz
PROC Star

If the data are sorted by ID, then:

 

data have;
    infile datalines;
    input ID :8. Index $1.;
datalines;
1 A
1 B
1 B
2 A
3 A
3 B
run;

data want (drop=_:);
  set have (where=(index^='A'))
      have (where=(index='A'));
  by id;
  if first.id=1 and index='A' then _keep+1;
  else if first.id then _keep=0;
  if _keep;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
PGStats
Opal | Level 21

Proc SQL with autoremerge provides this simple solution:

 

proc sql;
create table want as
select *
from have
group by ID
having sum(Index ne "A") = 0;
quit;
PG
Kurt_Bremser
Super User

The fastest method would be a data step, if your data is already sorted:

data want;
merge
  have
  have (
    in=delete
    keep=id index
    rename=(index=_i)
    where=(_i = 'B')
  )
;
by id;
if not delete;
drop _i;
run;

If it's not sorted, a hash table can do it:

data want;
set have;
if _N_ = 1
then do;
  declare hash h (dataset:"have (where=(index = 'B'))");
  h.definekey("id");
  h.definedone();
end;
if h.check() ne 0;
run;

The hash declaration will automatically discard duplicate entries for ID, so the in-memory table should be fairly small.

 

Both methods should outperform SQL if the table is quite large.

Ksharp
Super User
Same as PG's code:

data have;
infile datalines;
input ID :8. Index $1.;
datalines;
1 A
1 B
1 B
2 A
3 A
3 B
;
run;

proc sql;
create table want as
select * from have
group by id
having count(*)=sum(Index='A');
quit;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 667 views
  • 4 likes
  • 7 in conversation