BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
pensarchem
Obsidian | Level 7

Hi everyone,

Previous, when using SAS,  we can have pass through to run the query in Oracle, this process is quite smooth.

However, due to a special nature of safe environment, I have to run all the codes in SAS only.   It runs incredibly slow now, even for a small set for data b. 

 

Here is the code, and I will put some labels. 

I did some searches, but frankly not sure which will be the best approach like index( in this case will > than 34% unique labels) /hash?).  

     I tried:

    1) data step to extract all related data, then join; take too long and too much spaces;

    2) use subquery, not by join.   use filter as ---- label in (select distinct label in att.list_low5_&c_year), run quite slow as well.

 

Any recommendation from advanced users?  thank you very much. 

 

 

 

proc sql;
create table att.total_list_&p_year as
select distinct a.label, num, datepart(dt) as dt
from hynp.flat a                                              (*********************  data in Oracle, not sortable, 80 millions records each year)
inner join att.list_low5_&c_year b                     (******************* records from thousands to up to 4millions,  this macro run ten circles)
on a.label = b.label
where
dt >= &s_date and
dt <='31Mar2019'd
and prov = ..
and tp = ..
and pec = ..
and pnum ^= ..
and cd < ..
and vcd not in (.....)
and vloc not in (....)
and tm not between ... and ...
and tm not in (...)

and num not in (...)

order by a.label,dt desc, num;
quit;

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Then you should use a hash for the inner join:

data att.total_list_&p_year;
set hynp.flat;
where /* your where condition */;
if _n_ = 1
then do;
  declare hash l (dataset:"att.list_low5_&c_year (keep=label)");
  l.definekey("label");
  l.definedone();
end;
if l.check() = 0;
run;

View solution in original post

11 REPLIES 11
Kurt_Bremser
Super User

From which table originate all the variables without an alias?

Does &s_date change frequently?

Do you have any kind of dynamics in the other parts of the WHERE?

pensarchem
Obsidian | Level 7

1, hynp.flat  is the source table;  so label, dt, num are all from the source table.  

att.list_low5_&c_year b   is a label cohort generated from hynp.flat;

 

2, &s_date does not change and is a certain date for each macro;

3, No, these filters are set, there is no dynamic in WHERE;

Kurt_Bremser
Super User

Then you should use a hash for the inner join:

data att.total_list_&p_year;
set hynp.flat;
where /* your where condition */;
if _n_ = 1
then do;
  declare hash l (dataset:"att.list_low5_&c_year (keep=label)");
  l.definekey("label");
  l.definedone();
end;
if l.check() = 0;
run;
pensarchem
Obsidian | Level 7

thanks.  In the process of testing code now, I am dealing an error message.  Will get back to you.

Kurt_Bremser
Super User

@pensarchem wrote:

thanks.  In the process of testing code now, I am dealing an error message.  Will get back to you.


Two thoughts:

  • The argument about the datepart() function is valid. Using a SAS-only function will force SAS to unload data before applying the function. OTOH, using it in a data step you simply unload the datetime value as is, which should not pose a performance problem. Just make sure that you do not use SAS-specific elements in the WHERE condition, as SAS will send that to the DBMS for execution there, if it can.
  • Whenever you use a connection to a DBMS, SAS will implicitly run SQL code there. Securitywise, there is no difference to using explicit pass-through: code will be sent from SAS to the DBMS. If the "safe environment" is not hardened against any mishap induced from outside (through proper permissions, performance limits, etc), then it is not safe at all. If there are any doubts, you can only have code (that was checked by the DBA) run in the DBMS to unload to a flat file, and have that sent to SAS. This is, BTW, the method we use.
Patrick
Opal | Level 21

However, due to a special nature of safe environment, I have to run all the codes in SAS only

Can you explain this a bit further? You're already connecting to Oracle so why exactly can't you push some of the processing to Oracle?

If you have to pull all the data first into SAS before reducing volumes then that's already going to impact quite a bit on performance even if your 2nd table would only contain a few rows. 

 

 

pensarchem
Obsidian | Level 7

Yes, currently I run the code on a lower security level, so I can still connect to oracle.

However, we do have a special safe environment and all the data were imported manually from various resources.  The data will be imported through .gz files into sas and run locally only under SAS..

 

So I am testing my code now and will move to safe environment after everything runs smoothly..I hope this clarifies.

PGStats
Opal | Level 21

There must be something preventing SAS/SQL from asking Oracle to preprocess your query. I suspect it is the use of function datepart() .

 

Try adding SQL_FUNCTIONS=ALL to your libname definition of hynp. This should allow the datetime function to be passed to Oracle for processing.

 

If that doesn't work, write datepart(a.dt) as dt in the select clause and use datetime constants in your where clause (e.g. .... a.dt <='31Mar2019:00:00:00'dt).

PG
pensarchem
Obsidian | Level 7

thanks for your input, I answered this above.  We normally do run the code in Oracle and the process is quite quick..

Patrick
Opal | Level 21

And WHY exactly can't you run it in Oracle this time? If you have first to transfer all the data into SAS before subsetting then no wonder it takes much longer. At least try and use the proposed hash table approach so the big data set doesn't need sorting as well on the SAS side before reducing the data volumes. 

ChrisNZ
Tourmaline | Level 20

Please add this line

options sastrace=',,,db' sastraceloc=saslog nostsuffix;

and show us the full log.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 2813 views
  • 0 likes
  • 5 in conversation