BookmarkSubscribeRSS Feed
irinaia
Obsidian | Level 7
Hi

I just want to clarify in what order given steps will be executed in proc sql.

Let us say we have table A 91 000 000 rows and table B with 2 900 000.

We want to left join these table on id and using where statement filter out based on a condition.

proc sql;
create table test as
select a.*
from a
left join b on a.id=b.id
where a.status=1;

Will tables be joined first and then filter in where close will be applied?

Or are we reading row by row from table a, checking condition and if it met run left join?

What is the best way to join such a big tables?
9 REPLIES 9
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Please use the code window, its the {i} for posting code.  Your code doesn't make sense, you never use table b, so the statement is:

proc sql;
  create table test as
  select a.*
  from a a
  where status=1;
quit;

That being said, its quite a lot of observations, so you may get better performance using a datastep (assumes they are sorted):

data test;
  merge a (where=(status=1) in=a)
        b (in=b);
  by id;
  if a;
run;

Oh, and that where a.status will be executed after merge, you can subquery, eg.:

proc sql;
  create table test as
  select a.*
  from (select * from a where status=1) a
left join b b
on a.id=b.id; quit;
irinaia
Obsidian | Level 7

This is not a real code and the aim of it was just to show in what order statements are written.

What i want to know is in what order queries like this will be executed.

proc sql;
create table test as
select a.*, b.status
from a
left join b b
on a.id=b.id;
where current=1 and sum>0
quit;

 


Will SAS read row by row from database, check if it satisfy the condition in where close and then go through another table ,b-table, check if the row with same id exist there and then join.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

So it is being executed on a database now?  I am afraid I cannot hypothetical questions.  If the SQL is being run within SAS then you can see the process using 

proc sql _tree _method;

Options, that may help you.  If its run on a database then you would need to check the SQL on the database itself.

irinaia
Obsidian | Level 7
Do you mean that the whole sql querie against RDBMS will be always executed on SQL side and SAS will only get returned the final result or it will return row by row? I just want to understand where query will be executed, in what order and so on.
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Again, that depends on several different factors.  If you use explicit pass-through, then all the SQL in there gets passed to the database for execution and results returned.  

If your using a libname setup like:

http://support.sas.com/documentation/cdl/en/acreldb/65247/HTML/default/viewer.htm#p1a9wh7qwv6vc9n18a...

Then the SQL is generated internally and sent to the DB for results to be passed back.

kiranv_
Rhodochrosite | Level 12

if the data in rdbms, which ebdms is it?

irinaia
Obsidian | Level 7

MS SQL Server

kiranv_
Rhodochrosite | Level 12

you can run showplan and you should try to run in sql server side itself. check whether your datasets have indexes. if you have access to sql server maangement studio and then you can see execution plan. it gives a very detailed info of how much time your query takes and which step it is taking time.

singhsahab
Lapis Lazuli | Level 10

Hi  ,

 

If there is no memory concern , kindly use  hash object to join ...

 

here is a sample code.

 

 

data A;
input id status;
cards;
1 1
2 1
3 .
4 1
5 1
6 1
;
run;


data b;
input id;
cards;
1
2
3
5
;
run;

DATA WANT;
IF _N_=1 THEN DO;
DECLARE HASh H(DATASET:'B');
 H.DEFINEKEY('ID');
 H.DEFINEDATA('ID');
 H.DEFINEDONE();
END;
 DO UNTIL(LR);
  SET A END=LR;
  WHERE STATUS=1;
  ARRAY ARR ID;
  DO OVER ARR;
   r=H.find(key:arr);
   if r ge 0 THEN OUTPUT;
  END;
 END;
 drop r;
RUN;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1121 views
  • 0 likes
  • 4 in conversation