Hi!
Here the query I'm trying to run in SAS:
Proc SQL;
Create Table Final_Table As
Select
a.*,
b.Val
From WORK.Table1 As a
Left Join DDS.Table2 As b
On a.Val1 = b.Val1
Order by
a.Val,
b.Val;
Quit;
Table1 is quite small (less then 9000 rows), while Table2 is huge.
Thus JOIN takes a lot of time and even, at some point, its work stops because, according to the system, it takes too many resources (Sorry, I could not recreate the LOG, because the program didn't stop today).
Please, any suggestion how to improve the performance of the query so that it runs faster and consumes less resources.
UPDATE.
Here is the ERROR Log ():
ERROR: Error fetching from cursor. ORACLE error is ORA-12801: P017, instance exd8dbadm02.server.co.il:pstat_1 (1)
Error signal Parallel query server ORA-02393: CPU usage limit exceeded for calls. ERROR: PROC SQL runtime error for operation=sqxsrc. ERROR: An error has occurred.
What is the hit ratio on Table2?
If it's small, try to create an index on Table2.Val1.
The fastest is usually if proc SQL can performs a hash join, but unfortunately this doesn't happen with outer joins.
You could transform it to an inner join, and then maybe append unmatched rows in a second step if necessary?
You can increase the chance of a hash join level up the UBEFIZE (and have and index om the join column in the larger table.
Remeber to use the proc SQL options _method and _tree to deeper analyse the SQL behaviour.
@LinusH wrote:
What is the hit ratio on Table2?
If it's small, try to create an index on Table2.Val1.
The fastest is usually if proc SQL can performs a hash join, but unfortunately this doesn't happen with outer joins.
You could transform it to an inner join, and then maybe append unmatched rows in a second step if necessary?
You can increase the chance of a hash join level up the UBEFIZE (and have and index om the join column in the larger table.
Remeber to use the proc SQL options _method and _tree to deeper analyse the SQL behaviour.
Hi,
I'm sorry, but my knowledge of SAS is not so deep to understand you.
I can't create any index on Table2 because I haven't DBA permissions, all I can is query the table as is.
Hi,
Have you tried a data step approach? For example:
Alternatively:
Thanks & kind regards,
Amir.
The log is clearly not from the code you posted, as your code uses WORK datasets, while the ERROR message comes from an external database.
Please post the complete log including your real code.
@Kurt_Bremser wrote:
The log is clearly not from the code you posted, as your code uses WORK datasets, while the ERROR message comes from an external database.
Please post the complete log including your real code.
In the code I just adjusted names of Variables and of tables, to be more clear.
Table1 is a WORK Dataset, and Table2 is an external database.
My goal is to bring value from Table2 to Table1 dataset.
Where data resides is crucial.
If your Table1 resides in SAS, upload it to an Oracle temporary table.
Hopefully that would make the join happen in Oracle (using implicit pass.through).
Add
options msglevel=i sastrace=',,,d' sastraceloc=saslog nostsuffix;
to you program to detect how SAS and Oracle interacts.
Consider creating a WHERE condition out of the val1 values in your SAS dataset, and use that first to retrieve a small subset of the database table to SAS, with only the val1 and val variables.
For more detailed help, we need to know the attributes of val1.
As others already "hinted" if you join a small SAS table with a huge DB table (how many rows is huge for you?) then you need to push execution to the database side. To do so you need to push the SAS data required for the join to the DB side either by loading the SAS data into the database (ideally a temporary table) or then create on the SAS side a list of the key values in a SAS macro variable and use this macro variable for in a where clause with the IN operator.
The distinct key values of 9000 rows in your SAS table might or might not fit into a single SAS macro variable - plus there is often also a database limit of 32KB for code a single SQL query. If using this approach you might have to "slice" and generate multiple queries.
You write you shared with us your actual SQL. Do I get that right that the sole purpose of this query is to identify with rows in your SAS table have a matching Val1 value in your DB table? If so is Val1 in your DB table unique or could there be multiple rows that match?
Can you please provide a bit more info like:
Hi,
Since I can't edit and update my post anymore, I do it here.
Here is Actual code:
options msglevel=i sastrace=',,,d' sastraceloc=saslog nostsuffix;
Proc SQL;
Create Table Mispar_Girsat_Pol As
Select
a.*,
b.Ms_Girsa
From Data_Prep As a
Left Join dds.Versions As b
On a.l_pol_no = b.Ms_pol
Where
Datepart(Tr_Kovetz) > "01jan2023"d
Order by
a.l_pol_no,
b.Ms_Girsa;
Quit;
@Kurt_Bremser , I added "When " class to make the query "lighter", but it is not perfect cause it might cause missing data.
I will explain the purpose of the code.
I have a list of insurance policies (Data_Prep Table).
I need to match to every policy all it's versions , that data about the versions is stored in the table dds.Versions.
Table dds.Versions is huge because every two or tree months they add to this table new data that include information about every changes that was made in policies. The date of updating the Table is stored in variable "Tr_Kovetz".
If I choose specific Date in "Tr_Kovetz" I might miss information about versions numbers that are not included in this date, thus I need to get all versions from previous dates.
Variable "Ms_Girsa" is a number of policy version. To each policy there can be many versions.
But, this is only a start. I might to add another variables from dds.Versions.
@LinusH gave me a hint that i used and here is a LOG of a program (That succeeded, but it took too much time):
1 ;*';*";*/;quit;run; 2 OPTIONS PAGENO=MIN; 3 %LET _CLIENTTASKLABEL='Add_New_Params'; 4 %LET _CLIENTPROCESSFLOWNAME='New factors'; 5 %LET 5 ! _CLIENTPROJECTPATH='CLASSIFIED\ACTUARY\Igorr\PHI\Phi_In_Pay_ver7_new_tables_ 5 ! 062023.egp'; 6 %LET _CLIENTPROJECTPATHHOST='PC0692'; 7 %LET _CLIENTPROJECTNAME='Phi_In_Pay_ver7_new_tables_062023.egp'; 8 %LET _SASPROGRAMFILE=''; 9 %LET _SASPROGRAMFILEHOST=''; 10 11 ODS _ALL_ CLOSE; 12 OPTIONS DEV=SVG; 13 GOPTIONS XPIXELS=0 YPIXELS=0; 14 %macro HTML5AccessibleGraphSupported; 15 %if %_SAS_VERCOMP_FV(9,4,4, 0,0,0) >= 0 %then ACCESSIBLE_GRAPH; 16 %mend; 17 FILENAME EGHTML TEMP; 18 ODS HTML5(ID=EGHTML) FILE=EGHTML 19 OPTIONS(BITMAP_MODE='INLINE') 20 %HTML5AccessibleGraphSupported 21 ENCODING='utf-8' 22 STYLE=HTMLBlue 23 NOGTITLE 24 NOGFOOTNOTE 25 GPATH=&sasworklocation 26 ; NOTE: Writing HTML5(EGHTML) Body file: EGHTML 27 28 options msglevel=i sastrace=',,,d' sastraceloc=saslog nostsuffix; 29 30 Proc SQL; 31 Create Table Mispar_Girsat_Pol As 32 Select 33 a.*, 34 b.Ms_Girsa 35 From Data_Prep As a 36 Left Join dds.Versions As b 37 On a.l_pol_no = b.Ms_pol 38 Where 39 Datepart(Tr_Kovetz) > "01jan2023"d 40 Order by 41 a.l_pol_no, 42 b.Ms_Girsa; ORACLE_1: Prepared: on connection 5 SELECT * FROM DDS.VERSIONS ORACLE_2: Prepared: on connection 5 SELECT NULL as TABLE_QUALIFIER, t.owner as TABLE_OWNER, t.table_name as TABLE_NAME, t.num_rows as ROW_COUNT FROM all_tables t WHERE t.table_name='VERSIONS' AND t.owner='DDS' 2 The SAS System 12:30 Tuesday, September 19, 2023 ORACLE_3: Executed: on connection 5 SELECT statement ORACLE_2 NOTE: SAS threaded sort was used. ORACLE_4: Prepared: on connection 5 SELECT "MS_POL", "MS_GIRSA", "TR_KOVETZ" FROM DDS.VERSIONS ORACLE_5: Executed: on connection 5 SELECT statement ORACLE_4 NOTE: Table WORK.MISPAR_GIRSAT_POL created, with 821303 rows and 6 columns. 43 Quit; NOTE: PROCEDURE SQL used (Total process time): real time 2:08:05.03 cpu time 1:40:26.25 44 45 %LET _CLIENTTASKLABEL=; 46 %LET _CLIENTPROCESSFLOWNAME=; 47 %LET _CLIENTPROJECTPATH=; 48 %LET _CLIENTPROJECTPATHHOST=; 49 %LET _CLIENTPROJECTNAME=; 50 %LET _SASPROGRAMFILE=; 51 %LET _SASPROGRAMFILEHOST=; 52 53 ;*';*";*/;quit;run; 54 ODS _ALL_ CLOSE; 55 56 57 QUIT; RUN;
I hope this information will give you a hints.
Thanks in advance.
Things take so long because your current syntax transfers all rows from the DB to SAS prior to subsetting the rows.
What you need to do is write syntax that reduces the volume of data transferred between the database and SAS.
With the SAS table being the low volume you need to upload your SAS table to the DB (or at least the keys) and then only load the much smaller result set back to SAS.
To do so you need either to load the SAS data into a DB table (temporary or not) or you need to generate a SQL where clause (where key in (<list of values>).
The latter option would certainly require generation of multiple SQL statements to fit all the 9000 values.
The "next best option" would be to at least reduce the volume on the DB side prior to loading into SAS (=above select with some where clause in it, mainly on date and version).
Questions
1. What is the database (need to know for DB specific explicit SQL syntax)
2. Do you really need all versions or only the most recent one?
You wrote that you've got around 9000 rows in your SAS table. The result set you get back has 821303 rows which means there are around 90 versions per policy. Does that sound about right to you?
If you need multiple versions would there be some logic to at least drop some of the versions prior to loading to the SAS side?
And last but not least: You don't need to be a DBA to get Create/Drop table permissions. That's something SAS developers at many sites are been given. It doesn't need to be in the same database schema where your payload tables are stored. At least getting permissions to create temporary tables should be given. Where and how exactly will depend on the database (so... which database are you using).
...actually: The log you shared tells us that it's Oracle.
1. What is the database (need to know for DB specific explicit SQL syntax):
I don't know, and DBA doesn't tell be because of "security reason".
2. Do you really need all versions or only the most recent one?
Yes I do. I need all version, because as next step I need to merge another table that contains dates of every version (yes, unfortunately it stored in another table).
I have Date of insurance event, and for every policy I need to select version that was valid at date of insurance event. If the right version is missing I will have missing data for that policy.
I sounds like you're working with a DBA who doesn't understand what SAS is.
Just realized: The DB is Oracle. The log you've shared tells us. The SAS library definition would tell us even more (like the schema used). Just running libname dds list; would already provide more info.
Ok, let's assume you need to do it the more complicated way due to insufficient permissions and DBA support. Need a bit of time to "mock-up" some sample code.
Loading the SAS data first to the database for the join would allow for much simpler code than below.
What below does is generate SQL code where the keys from the SAS table get passed to the DB within the where clause as hardcoded values. Because this list of values wouldn't fit into a single SQL the code generates multiple SQL to query the DB multiple times and then just combines the data on the SAS side afterwards.
/** create sample data **/
options spool msglevel=i dlcreatedir sastrace=',,,d' sastraceloc=saslog nostsuffix;
libname dds "%sysfunc(pathname(work))/dds";
%let n_policies=18000;
data work.Data_Prep(drop=_:);
length l_pol_no $16;
do _i=1 to &n_policies by 2;
l_pol_no=put(_i,z16.);
output;
end;
run;
data dds.Versions(drop=_:);
length ms_pol $16 version 8 Tr_Kovetz Ms_Girsa 8;
format Tr_Kovetz datetime20.;
Ms_Girsa=_n_;
do _i=1 to &n_policies;
ms_pol=put(_i,z16.);
do version=rand('integer',1,200) to 1 by -1;
Tr_Kovetz=datetime()-86400*version;
output;
end;
end;
run;
/** load data into SAS reading from DB in slices **/
proc datasets lib=work nolist nowarn;
delete versions:;
run;quit;
filename codegen temp;
data _null_;
file codegen;
set work.Data_Prep(keep=l_pol_no) end=last;
length pol_list $32000;
retain pol_list n_vals;
if _n_=1 then
do;
slice=1;
n_vals=floor(32000/(vlength(l_pol_no)+3));
end;
pol_list=catx(',',pol_list,cats("'",l_pol_no,"'"));
if last or mod(_n_,n_vals) =0 then
do;
put
'proc sql; ' /
' create table work.Versions_' slice 'as ' /
' select ' /
' ms_pol ' /
' ,version ' /
' ,Tr_Kovetz ' /
' ,MS_Girsa ' /
' from dds.Versions ' /
' where ' /
' Tr_Kovetz >= "01jan2023 00:00:00"dt ' /
' and ' /
' ms_pol in (' pol_list ') ' /
' ; ' /
'quit; ' /
;
slice+1;
call missing(pol_list);
end;
run;
/*%include codegen / source2;*/
%include codegen / nosource2;
/** combine slices **/
data work.versions;
set work.versions_:;
run;
/** join as per OP but only with subset loaded from DB **/
Proc SQL;
Create Table Mispar_Girsat_Pol As
Select
a.*
,b.Ms_Girsa
,b.version
,b.Tr_Kovetz
From Data_Prep As a
Left Join WORK.Versions As b
On a.l_pol_no = b.Ms_pol
Where
Datepart(Tr_Kovetz) > "01jan2023"d
Order by
a.l_pol_no,
b.Ms_Girsa;
Quit;
For performance: If ms_pol needs to come first or second in the query depends on the indexes on your Oracle table. Given what you described I'm hoping your DBA partitioned the table by date (tr_kovetz) and that's why I have this variable first in the where clause.
I think it's better that you add more information as a reply in the thread, the it's easier to follow the discussion from top to bottom.
Have you tried to create a temporay table? That might work without the intervension af an DBA. If not, DBA shoudn't have a lot of concerns letting you do that.
SAS Help Center: Temporary Table Support for SAS/ACCESS
Another options is to try the DBKEY datset option:
SAS Help Center: DBKEY= Data Set Option
Not sure if it works with outer joins, but agian, since you have a relvitve small result set, you should be able to merge the unmatched records from your SAS work table quickly.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.