Solved: Compare variables from tables and only take occurring ones

Jay_Aguilar · Posted 04-01-2020 11:02 AM

Hello everyone,

I have a list which looks like this (table 1):

ID	information
1	abc
2	def
3	ghi

I now want to get data from another table (table 2), which has many more IDs (e.g. ID 1 to 10), but I only want those records whose ID is defined in my first table (table 1). I do not want to list every ID in a WHERE statement; I would like something more flexible, so that if I later add an ID to table 1, it is automatically taken into account when getting data from table 2. I guess I am looking for something like VLOOKUP in Excel, just that I do not want to join the tables.

I hope I have made clear what I need, and I would be very happy if someone could help me with that.

Thank you.

Tom · Posted 04-01-2020 11:18 AM

Are you asking to make a new dataset that is a subset of an existing dataset?

Let's call your table of IDs LIST, the existing dataset HAVE, and the desired result WANT.

You can use a data step with MERGE.

data want;
  merge have list(in=inlist);
  by id;
  if inlist;
run;

Note that this requires both datasets to be sorted by ID.
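
A minimal sketch of the sorting step, assuming neither dataset is already sorted by ID. The KEEP=ID option and the INHAVE flag are additions that are not part of the code above: they keep other LIST variables (such as information) out of the result and drop IDs that appear in LIST but not in HAVE.

```sas
/* Sort both inputs by the BY variable before merging */
proc sort data=have; by id; run;
proc sort data=list; by id; run;

data want;
  /* INHAVE and INLIST flag which input contributed to the current observation */
  merge have(in=inhave) list(in=inlist keep=id);
  by id;
  /* Keep only HAVE observations whose ID also occurs in LIST */
  if inhave and inlist;
run;
```

If ID is unique in LIST, every matching observation of HAVE is kept once.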

You can use an SQL query.

proc sql;
create table want as
select * from have
where id in (select id from list)
;
quit;

Either of these can be created as views instead of tables if you want the results to automatically reflect changes to HAVE and LIST.
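
As a rough sketch, the view versions of the two approaches could look like this. The view names WANT_SQL and WANT_DS are hypothetical and chosen only for illustration.

```sas
/* SQL view: the query is re-evaluated each time WANT_SQL is read */
proc sql;
  create view want_sql as
    select * from have
    where id in (select id from list);
quit;

/* Data step view: the step is compiled now and executed when WANT_DS is read */
data want_ds / view=want_ds;
  merge have list(in=inlist);
  by id;
  if inlist;
run;
```

The data step view still needs HAVE and LIST to be sorted by ID at the time the view is read, because the MERGE runs then.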

Jay_Aguilar · Posted 04-01-2020 11:36 AM

Thank you very much for the quick response!

Yes, I have to create a new dataset.

Kurt_Bremser · Posted 04-01-2020 11:33 AM

A more modern approach to do a lookup is the hash object:

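data want;
  set have;
  /* Load the IDs from LIST into an in-memory hash table once, on the first iteration */
  if _n_ = 1 then do;
    declare hash lookup (dataset:"list (keep=id)");
    lookup.definekey("id");
    lookup.definedone();
  end;
  /* FIND() returns 0 when the current ID exists in the hash table; keep only those */
  if lookup.find() = 0;
run;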

Note that this needs to be re-run anytime one of the input datasets changes (a view would not need this), but if the result is to be used multiple times, it will provide better performance.
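
To try the hash lookup end to end, here is a small self-contained sketch. LIST reproduces table 1 from the question; HAVE and its VALUE variable are made up for illustration, assuming table 2 holds IDs 1 to 10. The sketch uses CHECK() instead of FIND(), which is a substitution: CHECK() only tests for the key and does not retrieve data, which is all this subset needs.

```sas
/* Table 1 from the question: the IDs to keep */
data list;
  input id information $;
  datalines;
1 abc
2 def
3 ghi
;

/* Hypothetical table 2 with IDs 1 to 10; VALUE is invented for this example */
data have;
  do id = 1 to 10;
    value = id * 10;
    output;
  end;
run;

data want;
  set have;
  if _n_ = 1 then do;
    declare hash lookup (dataset:"list (keep=id)");
    lookup.definekey("id");
    lookup.definedone();
  end;
  /* CHECK() returns 0 when the key is present in the hash table */
  if lookup.check() = 0;
run;
```

With this input, WANT should contain the three observations with IDs 1, 2 and 3. After adding an ID to LIST, re-running the last step picks it up.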
