BookmarkSubscribeRSS Feed
saikrishna786
Fluorite | Level 6

Hi I am using SAS Viya 4, 

 

I am getting this error when using proc sql with multiple joins in sas viya4.0

ERROR: The SAS System stopped processing this step because of insufficient memory.

 

Problem: I have 6 tables join in the proc sql, of which 1 table i am using libref to connect to oracle and rest of 5 tables i have loaded to CASLIB from oracle. So the join is happening with inmemory CAS tables and oracle table, since here in this case i cannot use fedsql, i am using proc sql and then i am getting this error.

Sample code:

libname test oracle path="********" schema="*****" authdomain="Oracle.*****" encoding='UTF8'; 

 

options casdatalimit=ALL;

PROC SQL ;
CREATE TABLE output1 AS
SELECT * FROM
test.table1 t1,  /* this is a full load oracle table with ~80gb size*/
caslib.table2 t3,
caslib.table2 t4,
caslib.table2 t5,
caslib.table2 t2,
caslib.table2 t6

WHERE 
(t1.column1 = t3.column1 
AND t1.column2 = t4.column2 
AND t1.column3 = t5.column3 
AND t5.column4 = t2.column4
AND t1.column5 = t6.column5) 
AND (t1.column5 = t1.column6 
AND t1.column7= 99991231 
AND t2.column7 NOT IN ('*******','********') 
AND t4.column8= '*****' 
AND t4.column9= '******' 
AND t6.column4= '*****')
 
GROUP BY t1.column1,
               t3.column1,
               t4.column1,
               t4.column2,
               t4.column3,
               t4.column4,
               t4.column5,
               t4.column6,
               t4.column7,
               t4.column8,
               t4.column2,
               t4.column2,
               t2.column2
      ORDER BY column1;
quit;

Kindly help.

 

Thanks

Sai Krishna

 

 

6 REPLIES 6
s_lassen
Meteorite | Level 14

There are some ways in which you can make your query more efficient:

  1. Instead of "select *", write a list of the columns that you actually need. This may take some time, but it will also make the code a lot easier to read and maintain, and it will also save memory for the output.
  2. You are not calculating any summarized columns, so I do not see why you have the "group by". This is probably an unnecessary waste of CPU and memory.

 

Kurt_Bremser
Super User

This looks like a multiple look-up, for which SQL is the worst tool in the SAS toolbox.

Your use of the asterisk also points to careless coding, which will at least lead to multiple WARNINGs about variables already present in the SELECT.

So if your caslib tables are unique with regard to their keys, and can fit into memory of the Base SAS or Viya process, a single DATA step with multiple hash objects will be the answer.

How many observations do the caslib tables have, and which variables (type, size) do they contain?

 

saikrishna786
Fluorite | Level 6

the select * is replaced with below query. Also  caslib tables are unique .

Sample code:

libname test oracle path="********" schema="*****" authdomain="Oracle.*****" encoding='UTF8'; 

 

options casdatalimit=ALL;

PROC SQL ;
CREATE TABLE output1 AS
SELECT 

t1.column1,
t2.column1,
t2.column2,
t2.column3,
t2.column4,
t2.column5,
t2.column6,
t2.column7,
t2.column8,
t2.column9,
t2.column11 AS column1name,
t2.column12,
t2.column13,
t2.column14,
t2.column15,
t3.column1,
t2.column1,
/**/
(SUM(t1.column1)) AS columnname 

FROM
test.table1 t1,  /* this is a full load oracle table with ~80gb size*/
caslib.table2 t3,
caslib.table2 t4,
caslib.table2 t5,
caslib.table2 t2,
caslib.table2 t6

WHERE 
(t1.column1 = t3.column1 
AND t1.column2 = t4.column2 
AND t1.column3 = t5.column3 
AND t5.column4 = t2.column4
AND t1.column5 = t6.column5) 
AND (t1.column5 = t1.column6 
AND t1.column7= 99991231 
AND t2.column7 NOT IN ('*******','********') 
AND t4.column8= '*****' 
AND t4.column9= '******' 
AND t6.column4= '*****')
 
GROUP BY t1.column1,
               t3.column1,
               t4.column1,
               t4.column2,
               t4.column3,
               t4.column4,
               t4.column5,
               t4.column6,
               t4.column7,
               t4.column8,
               t4.column2,
               t4.column2,
               t2.column2
      ORDER BY column1;

quit;

Sajid01
Meteorite | Level 14

Hello @saikrishna786 
The way SQL execute it follows the following order

1.From
3.Where
3.Group By
4.Having
5.Select
6.order By

Thus from pulls all data till where and join takes place. try rewriting your query may be creating intermediate datasets depending upon resources available to you.
A better way is to implement this query in oracle itself (SQL pass through).
In case you are not passing anything as a filter condition, it would be worthwhile to have your required table  exist as a view in oracle.
Do remember server resources are shared by many users concurrently and are not infinite.

SASKiwi
PROC Star

You are joining an Oracle table with CASLIB tables so SAS is having to read a complete copy of the Oracle table down to SAS before it can do the joining. This is likely contributing to your memory issues. Since you are only reading five columns from this table in the join, bring down only these columns first into a CASLIB table as a separate query first.

ballardw
Super User

You run out of memory because of the way you are doing

SELECT * FROM
test.table1 t1,  /* this is a full load oracle table with ~80gb size*/
caslib.table2 t3,
caslib.table2 t4,
caslib.table2 t5,
caslib.table2 t2,
caslib.table2 t6

That is called a Cartesian join. That means you take every record from table1 and combine every record in table2 (if table2 has 4 records then you are using 80gb*4 just for the table1 data now plus the size of table2) then take that minimum of 320gb(or whatever) and match every one of those records with every one in table2 (again????? which if still just 4 records is not 1280gb) , then the same multiple times.

Repeat. Any way, AFTER you fill up the hard drive with all of those combinations then the WHERE attempts to filter them down.

 

 

 

I really hope you were trying to hide your actual table names (silly, if your table names are that sensitive you shouldn't be asking any external help) and made some typos repeating Table2.

Otherwise

T3.column4 is also the same as T2.column4, T4.column4, T5.column4 andT6.column4 (and the same for every other column) as they are all the same caslib.table2

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1018 views
  • 1 like
  • 6 in conversation