Subquery

Miracle · Posted 02-05-2021 04:57 AM

Dear All,

How do we write the PROC SQL to retrieve only the observations (rows) where the same ID and the same date are present in both the data, please?

Thanking you all in advance.

PeterClemmensen · Posted 02-05-2021 05:08 AM

What do you mean by "both the data"?

Please be more specific and show us your data.

The DATA to DATA Step Macro
Blog: SASnrd

Kurt_Bremser · Posted 02-05-2021 05:10 AM

proc sql;
create table want as
  select t1.*
  from t1
  /* keeps rows of t1 whose id appears in t2 (date is not compared) */
  where id in (select distinct id from t2)
;
quit;

Replace table and column names as needed.

Depending on dataset size, a DATA step with a hash object might perform better.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

PGStats · Posted 02-05-2021 11:46 AM

You can't use the IN operator for subsetting on two variables, but you can use the EXISTS operator:

proc sql;
create table want as
  select *
  from t1 as a
  /* keep a row of t1 only if t2 has a row with the same id and date */
  where exists (select * from t2 as b where b.id = a.id and b.date = a.date);
quit;
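A join-based variant of the same two-variable filter, offered as an untested sketch. Correlated EXISTS subqueries may be evaluated row by row on large tables, so joining against the distinct id/date pairs of t2 is one alternative worth timing; whether it is actually faster depends on the data. The sample rows and the variable `value` are hypothetical; only the table names t1/t2 and the key columns id/date come from the thread. DISTINCT in the inline view keeps a t1 row from being duplicated when t2 repeats a pair. Unlike EXISTS, a join does not guarantee the original row order of t1.

```sas
/* Hypothetical sample data; t2 deliberately repeats the pair (2, 2021-01-01) */
data t1;
  input id date :yymmdd10. value;
  format date yymmdd10.;
  datalines;
1 2021-01-01 10
1 2021-01-02 20
2 2021-01-01 30
3 2021-01-05 40
;
run;

data t2;
  input id date :yymmdd10.;
  format date yymmdd10.;
  datalines;
1 2021-01-02
2 2021-01-01
2 2021-01-01
3 2021-01-06
;
run;

proc sql;
  create table want as
  select a.*
  from t1 as a
       inner join
       /* one row per id/date pair, so matching t1 rows are not duplicated */
       (select distinct id, date from t2) as b
       on a.id = b.id and a.date = b.date;
quit;
```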

PG

Miracle · Posted 02-07-2021 09:29 AM

Thanks @PeterClemmensen, @Kurt_Bremser and @PGStats for your responses.
I tried the PROC SQL as you advised, but the code never stopped running, even after an entire day.
I wonder whether this is due to the size of the datasets, i.e. about 10,000k (10 million) rows.
What other approaches would you suggest, please?
I am completely new to hash objects, so I don't know how to write the syntax.
Thanking you in advance.

Kurt_Bremser · Posted 02-07-2021 07:43 PM

Then a hash approach should be tried.

It would look like this:

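data want;
set t1;
if _n_ = 1
then do;
  /* load the id/date keys of t2 into memory once */
  declare hash t2 (dataset:"t2");
  t2.definekey("id","date");
  t2.definedone();
end;
/* keep the t1 row only if its id/date pair exists in t2 */
if t2.check() = 0;
run;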

This will work as long as the two key variables (id and date) for all observations of t2 fit into memory.
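To see the hash lookup in action, here is a self-contained sketch that builds small hypothetical versions of t1 and t2 and then runs the same filter. The sample rows and the variable `value` are made up for illustration, and the hash object is named `h` here only to keep it visually distinct from the dataset t2. Because no DEFINEDATA call is made, the hash stores just the key variables, which is what keeps the memory footprint small. CHECK() looks up the current id/date values in the program data vector and returns 0 when the pair is found.

```sas
/* Hypothetical sample data: t1 holds the rows to filter, t2 the pairs to look up */
data t1;
  input id date :yymmdd10. value;
  format date yymmdd10.;
  datalines;
1 2021-01-01 10
1 2021-01-02 20
2 2021-01-01 30
3 2021-01-05 40
;
run;

data t2;
  input id date :yymmdd10.;
  format date yymmdd10.;
  datalines;
1 2021-01-02
2 2021-01-01
3 2021-01-06
;
run;

data want;
  set t1;
  if _n_ = 1 then do;
    /* only id and date from t2 are loaded, since no data variables are defined */
    declare hash h (dataset:"t2");
    h.definekey("id", "date");
    h.definedone();
  end;
  /* check() returns 0 when the current id/date pair is a key in h */
  if h.check() = 0;
run;
```

With this sample data, want should keep the t1 rows for (1, 2021-01-02) and (2, 2021-01-01).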

Tom · Posted 02-07-2021 01:57 PM

@Miracle wrote:

Dear All,

How do we write the PROC SQL to retrieve only the observations (rows) where the same ID and the same date are present in both the data, please?

Thanking you all in advance.

Why would you use SQL for such a simple request?

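/* both datasets must be sorted (or indexed) by id and date */
data want;
  merge dataset1 (in=in1) dataset2 (in=in2);
  by id date;
  /* keep only id/date combinations present in both datasets */
  if in1 and in2;
run;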
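A self-contained sketch of the MERGE approach, including the PROC SORT steps that BY-group processing needs when the inputs are not already sorted or indexed by id and date. The sample rows and the variable `value` are hypothetical. The KEEP= option on dataset2 is an addition of this sketch: in a merge, a same-named non-key variable from the right-hand dataset would overwrite the value from dataset1, so restricting dataset2 to its key columns avoids that. If both inputs contain repeated id/date pairs, MERGE does not produce a many-to-many join, so duplicates are worth checking first.

```sas
/* Hypothetical, deliberately unsorted inputs */
data dataset1;
  input id date :yymmdd10. value;
  format date yymmdd10.;
  datalines;
3 2021-01-05 40
1 2021-01-02 20
2 2021-01-01 30
1 2021-01-01 10
;
run;

data dataset2;
  input id date :yymmdd10.;
  format date yymmdd10.;
  datalines;
2 2021-01-01
3 2021-01-06
1 2021-01-02
;
run;

/* MERGE with BY requires both inputs in id/date order */
proc sort data=dataset1; by id date; run;
proc sort data=dataset2; by id date; run;

data want;
  /* keep= limits dataset2 to its key columns so no values are overwritten */
  merge dataset1 (in=in1) dataset2 (in=in2 keep=id date);
  by id date;
  if in1 and in2;
run;
```

With this sample data, want should contain the dataset1 rows for (1, 2021-01-02) and (2, 2021-01-01).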
