BookmarkSubscribeRSS Feed
IgorR
Quartz | Level 8

Hi!

Here the query I'm trying to run in SAS:

Proc SQL;
Create Table Final_Table As
Select
	a.*,
	b.Val
From WORK.Table1 As a
Left Join DDS.Table2 As b
On a.Val1 = b.Val1
Order by
	a.Val,
	b.Val;
Quit;

Table1 is quite small (less then 9000 rows), while Table2 is huge.
Thus JOIN takes a lot of time and even, at some point, its work stops because, according to the system, it takes too many resources (Sorry, I could not recreate the LOG, because the program didn't stop today).

Please, any suggestion how to improve the performance of the query so that it runs faster and consumes less resources.

 

UPDATE.

Here is the ERROR Log ():

ERROR: Error fetching from cursor. ORACLE error is ORA-12801: P017, instance exd8dbadm02.server.co.il:pstat_1 (1) 
Error signal Parallel query server ORA-02393: CPU usage limit exceeded for calls. ERROR: PROC SQL runtime error for operation=sqxsrc. ERROR: An error has occurred.
22 REPLIES 22
LinusH
Tourmaline | Level 20

What is the hit ratio on Table2?

If it's small, try to create an index on Table2.Val1.

The fastest is usually if proc SQL can performs a hash join, but unfortunately this doesn't happen with outer joins.

You could transform it to an inner join, and then maybe append unmatched rows in a second step if necessary?

You can increase the chance of a hash join level up the UBEFIZE (and have and index om the join column in the larger table.

Remeber to use the proc SQL options _method and _tree to deeper analyse the SQL behaviour.

Data never sleeps
IgorR
Quartz | Level 8

@LinusH wrote:

What is the hit ratio on Table2?

If it's small, try to create an index on Table2.Val1.

The fastest is usually if proc SQL can performs a hash join, but unfortunately this doesn't happen with outer joins.

You could transform it to an inner join, and then maybe append unmatched rows in a second step if necessary?

You can increase the chance of a hash join level up the UBEFIZE (and have and index om the join column in the larger table.

Remeber to use the proc SQL options _method and _tree to deeper analyse the SQL behaviour.


Hi,

I'm sorry, but my knowledge of SAS is not so deep to understand you.

I can't create any index on Table2 because I haven't DBA permissions, all I can is query the table as is.

Amir
PROC Star

Hi,

 

Have you tried a data step approach? For example:

 

  1. If the data is not sorted in the desired order then use proc sort table1 and table2(keep = val val1).
  2. Merge the sorted results, outputting observations if they are in both data sets.

 

Alternatively:

 

  1. A data step with a hash lookup could be used where table1 val1 would be key, returning data val.
  2. Only output data where the key value is in both tables.

 

 

Thanks & kind regards,

Amir.

Kurt_Bremser
Super User

The log is clearly not from the code you posted, as your code uses WORK datasets, while the ERROR message comes from an external database.

Please post the complete log including your real code.

IgorR
Quartz | Level 8

@Kurt_Bremser wrote:

The log is clearly not from the code you posted, as your code uses WORK datasets, while the ERROR message comes from an external database.

Please post the complete log including your real code.


In the code I just adjusted names of Variables and of tables, to be more clear.

Table1 is a WORK Dataset, and Table2 is an external database.

My goal is to bring value from Table2 to Table1 dataset.

LinusH
Tourmaline | Level 20

Where data resides is crucial.

If your Table1 resides in SAS, upload it to an Oracle temporary table.

Hopefully that would make the join happen in Oracle (using implicit pass.through).

Add

options msglevel=i sastrace=',,,d' sastraceloc=saslog nostsuffix;

to you program to detect how SAS and Oracle interacts.

 

Data never sleeps
Kurt_Bremser
Super User

Consider creating a WHERE condition out of the val1 values in your SAS dataset, and use that first to retrieve a small subset of the database table to SAS, with only the val1 and val variables.

For more detailed help, we need to know the attributes of val1.

Patrick
Opal | Level 21

As others already "hinted" if you join a small SAS table with a huge DB table (how many rows is huge for you?) then you need to push execution to the database side. To do so you need to push the SAS data required for the join to the DB side either by loading the SAS data into the database (ideally a temporary table) or then create on the SAS side a list of the key values in a SAS macro variable and use this macro variable for in a where clause with the IN operator.

 

The distinct key values of 9000 rows in your SAS table might or might not fit into a single SAS macro variable - plus there is often also a database limit of 32KB for code a single SQL query. If using this approach you might have to "slice" and generate multiple queries.

 

You write you shared with us your actual SQL. Do I get that right that the sole purpose of this query is to identify with rows in your SAS table have a matching Val1 value in your DB table? If so is Val1 in your DB table unique or could there be multiple rows that match?

 

Can you please provide a bit more info like:

  • type and length of variable Val1
  • number of distinct Val1 values in your SAS table
  • number of rows in your DB table
  • Is Val1 on the DB side unique or can there be multiple rows with the same value?
  • What is the database - Oracle, SQL Server, ...?
IgorR
Quartz | Level 8

Hi,

Since I can't edit and update my post anymore, I do it here.

Here is Actual code:

options msglevel=i sastrace=',,,d' sastraceloc=saslog nostsuffix;

Proc SQL; Create Table Mispar_Girsat_Pol As Select a.*, b.Ms_Girsa From Data_Prep As a Left Join dds.Versions As b On a.l_pol_no = b.Ms_pol Where Datepart(Tr_Kovetz) > "01jan2023"d Order by a.l_pol_no, b.Ms_Girsa; Quit;

@Kurt_Bremser , I added "When " class to make the query "lighter", but it is not perfect cause it might cause missing data.

I will explain the purpose of the code.

I have a list of insurance policies (Data_Prep Table).

I need to match to every policy all it's versions , that data about the versions is stored in the table dds.Versions.

Table dds.Versions is huge because every two or tree months they add to this table new data that include information about every changes that was made in policies. The date of updating the Table is stored in variable "Tr_Kovetz".

If I choose specific Date in "Tr_Kovetz" I might miss information about versions numbers that are not included in this date, thus I need to get all versions from previous dates. 

Variable "Ms_Girsa" is a number of policy version. To each policy there can be many versions.

But, this is only a start. I might to add another variables from dds.Versions.

@LinusH gave me a hint that i used and here is a LOG of a program (That succeeded, but it took too much time):

1          ;*';*";*/;quit;run;
2          OPTIONS PAGENO=MIN;
3          %LET _CLIENTTASKLABEL='Add_New_Params';
4          %LET _CLIENTPROCESSFLOWNAME='New factors';
5          %LET
5        ! _CLIENTPROJECTPATH='CLASSIFIED\ACTUARY\Igorr\PHI\Phi_In_Pay_ver7_new_tables_
5        ! 062023.egp';
6          %LET _CLIENTPROJECTPATHHOST='PC0692';
7          %LET _CLIENTPROJECTNAME='Phi_In_Pay_ver7_new_tables_062023.egp';
8          %LET _SASPROGRAMFILE='';
9          %LET _SASPROGRAMFILEHOST='';
10         
11         ODS _ALL_ CLOSE;
12         OPTIONS DEV=SVG;
13         GOPTIONS XPIXELS=0 YPIXELS=0;
14         %macro HTML5AccessibleGraphSupported;
15             %if %_SAS_VERCOMP_FV(9,4,4, 0,0,0) >= 0 %then ACCESSIBLE_GRAPH;
16         %mend;
17         FILENAME EGHTML TEMP;
18         ODS HTML5(ID=EGHTML) FILE=EGHTML
19             OPTIONS(BITMAP_MODE='INLINE')
20             %HTML5AccessibleGraphSupported
21             ENCODING='utf-8'
22             STYLE=HTMLBlue
23             NOGTITLE
24             NOGFOOTNOTE
25             GPATH=&sasworklocation
26         ;
NOTE: Writing HTML5(EGHTML) Body file: EGHTML
27         
28         options msglevel=i sastrace=',,,d' sastraceloc=saslog nostsuffix;
29         
30         Proc SQL;
31         Create Table Mispar_Girsat_Pol As
32         Select
33         	a.*,
34         	b.Ms_Girsa
35         From Data_Prep As a
36         Left Join dds.Versions As b
37         On a.l_pol_no = b.Ms_pol
38         Where
39         	Datepart(Tr_Kovetz) > "01jan2023"d
40         Order by
41         	a.l_pol_no,
42         	b.Ms_Girsa;
 
ORACLE_1: Prepared: on connection 5
SELECT * FROM DDS.VERSIONS
  
ORACLE_2: Prepared: on connection 5
SELECT  NULL as TABLE_QUALIFIER,  t.owner as TABLE_OWNER,  t.table_name as TABLE_NAME,  t.num_rows as ROW_COUNT  FROM  all_tables t 
WHERE  t.table_name='VERSIONS' AND  t.owner='DDS' 
2                                                          The SAS System                          12:30 Tuesday, September 19, 2023

 
ORACLE_3: Executed: on connection 5
SELECT statement  ORACLE_2
 
NOTE: SAS threaded sort was used.
 
ORACLE_4: Prepared: on connection 5
SELECT  "MS_POL", "MS_GIRSA", "TR_KOVETZ" FROM DDS.VERSIONS 
  
ORACLE_5: Executed: on connection 5
SELECT statement  ORACLE_4
 
NOTE: Table WORK.MISPAR_GIRSAT_POL created, with 821303 rows and 6 columns.

43         Quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           2:08:05.03
      cpu time            1:40:26.25
     

44         
45         %LET _CLIENTTASKLABEL=;
46         %LET _CLIENTPROCESSFLOWNAME=;
47         %LET _CLIENTPROJECTPATH=;
48         %LET _CLIENTPROJECTPATHHOST=;
49         %LET _CLIENTPROJECTNAME=;
50         %LET _SASPROGRAMFILE=;
51         %LET _SASPROGRAMFILEHOST=;
52         
53         ;*';*";*/;quit;run;
54         ODS _ALL_ CLOSE;
55         
56         
57         QUIT; RUN;

I hope this information will give you a hints.

Thanks in advance.

Patrick
Opal | Level 21

Things take so long because your current syntax transfers all rows from the DB to SAS prior to subsetting the rows.

Patrick_0-1695276838856.png

What you need to do is write syntax that reduces the volume of data transferred between the database and SAS. 

With the SAS table being the low volume you need to upload your SAS table to the DB (or at least the keys) and then only load the much smaller result set back to SAS.

To do so you need either to load the SAS data into a DB table (temporary or not) or you need to generate a SQL where clause (where key in (<list of values>).

The latter option would certainly require generation of multiple SQL statements to fit all the 9000 values.

 

The "next best option" would be to at least reduce the volume on the DB side prior to loading into SAS (=above select with some where clause in it, mainly on date and version). 

 

Questions

1. What is the database (need to know for DB specific explicit SQL syntax)

2. Do you really need all versions or only the most recent one?

 

You wrote that you've got around 9000 rows in your SAS table. The result set you get back has 821303 rows which means there are around 90 versions per policy. Does that sound about right to you?
If you need multiple versions would there be some logic to at least drop some of the versions prior to loading to the SAS side?

 

And last but not least: You don't need to be a DBA to get Create/Drop table permissions. That's something SAS developers at many sites are been given. It doesn't need to be in the same database schema where your payload tables are stored. At least getting permissions to create temporary tables should be given. Where and how exactly will depend on the database (so... which database are you using).

 

...actually: The log you shared tells us that it's Oracle.

 

IgorR
Quartz | Level 8

1. What is the database (need to know for DB specific explicit SQL syntax):

I don't know, and DBA doesn't tell be because of "security reason".

 

2. Do you really need all versions or only the most recent one?

Yes I do. I need all version, because as next step I need to merge another table that contains dates of every version (yes, unfortunately it stored in another table). 

I have Date of insurance event, and for every policy I need to select version that was valid at  date of insurance event. If the right version is missing I will have missing data for that policy.

Patrick
Opal | Level 21

I sounds like you're working with a DBA who doesn't understand what SAS is.

Just realized: The DB is Oracle. The log you've shared tells us. The SAS library definition would tell us even more (like the schema used). Just running libname dds list; would already provide more info.

 

Ok, let's assume you need to do it the more complicated way due to insufficient permissions and DBA support. Need a bit of time to "mock-up" some sample code.

Patrick
Opal | Level 21

Loading the SAS data first to the database for the join would allow for much simpler code than below.

What below does is generate SQL code where the keys from the SAS table get passed to the DB within the where clause as hardcoded values. Because this list of values wouldn't fit into a single SQL the code generates multiple SQL to query the DB multiple times and then just combines the data on the SAS side afterwards.

/** create sample data **/
options spool msglevel=i dlcreatedir sastrace=',,,d' sastraceloc=saslog nostsuffix;
libname dds "%sysfunc(pathname(work))/dds";
%let n_policies=18000;

data work.Data_Prep(drop=_:);
  length l_pol_no $16;
  do _i=1 to &n_policies by 2;
    l_pol_no=put(_i,z16.);
    output;
  end;
run;

data dds.Versions(drop=_:);
  length ms_pol $16 version 8 Tr_Kovetz Ms_Girsa 8;
  format Tr_Kovetz datetime20.;
  Ms_Girsa=_n_;
  do _i=1 to &n_policies;
    ms_pol=put(_i,z16.);
    do version=rand('integer',1,200) to 1 by -1;
      Tr_Kovetz=datetime()-86400*version;
      output;
    end;
  end;
run;

/** load data into SAS reading from DB in slices **/
proc datasets lib=work nolist nowarn;
  delete versions:;
run;quit;

filename codegen temp;
data _null_;
  file codegen;
  set work.Data_Prep(keep=l_pol_no) end=last;
  length pol_list $32000;
  retain pol_list n_vals;
  if _n_=1 then
    do;
      slice=1;
      n_vals=floor(32000/(vlength(l_pol_no)+3));
    end;

  pol_list=catx(',',pol_list,cats("'",l_pol_no,"'"));
  
  if last or mod(_n_,n_vals) =0 then
    do;
      put
        'proc sql;                                  ' /
        '  create table work.Versions_' slice 'as   ' /
        '    select                                 ' /
        '      ms_pol                               ' /
        '      ,version                             ' /
        '      ,Tr_Kovetz                           ' /
        '      ,MS_Girsa                            ' /
        '    from dds.Versions                      ' /
        '    where                                  ' /
        '      Tr_Kovetz >= "01jan2023 00:00:00"dt  ' /
        '      and                                  ' /
        '      ms_pol in (' pol_list ')            ' /
        '    ;                                      ' /
        'quit;                                      ' /
        ;

      slice+1;
      call missing(pol_list);

    end;
run;
/*%include codegen / source2;*/
%include codegen / nosource2;

/** combine slices **/
data work.versions;
  set work.versions_:;
run;

/** join as per OP but only with subset loaded from DB **/
Proc SQL;
  Create Table Mispar_Girsat_Pol As
    Select
      a.*
      ,b.Ms_Girsa
      ,b.version
      ,b.Tr_Kovetz
    From Data_Prep As a
      Left Join WORK.Versions As b
        On a.l_pol_no = b.Ms_pol
      Where 
        Datepart(Tr_Kovetz) > "01jan2023"d 
      Order by
        a.l_pol_no,
        b.Ms_Girsa;
Quit;

 For performance: If ms_pol needs to come first or second in the query depends on the indexes on your Oracle table. Given what you described I'm hoping your DBA partitioned the table by date (tr_kovetz) and that's why I have this variable first in the where clause.

 

Patrick_0-1695282725455.png

 

LinusH
Tourmaline | Level 20

I think it's better that you add more information as a reply in the thread, the it's easier to follow the discussion from top to bottom.

 

Have you tried to create a temporay table? That might work without the intervension af an DBA. If not, DBA shoudn't have a lot of concerns letting you do that.

SAS Help Center: Temporary Table Support for SAS/ACCESS

 

Another options is to try the DBKEY datset option:

SAS Help Center: DBKEY= Data Set Option

Not sure if it works with outer joins, but agian, since you have a relvitve small result set, you should be able to merge the unmatched records from your SAS work table quickly.

 

Data never sleeps

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 22 replies
  • 2325 views
  • 2 likes
  • 5 in conversation