Hi everyone,
Previous, when using SAS, we can have pass through to run the query in Oracle, this process is quite smooth.
However, due to a special nature of safe environment, I have to run all the codes in SAS only. It runs incredibly slow now, even for a small set for data b.
Here is the code, and I will put some labels.
I did some searches, but frankly not sure which will be the best approach like index( in this case will > than 34% unique labels) /hash?).
I tried:
1) data step to extract all related data, then join; take too long and too much spaces;
2) use subquery, not by join. use filter as ---- label in (select distinct label in att.list_low5_&c_year), run quite slow as well.
Any recommendation from advanced users? thank you very much.
proc sql;
create table att.total_list_&p_year as
select distinct a.label, num, datepart(dt) as dt
from hynp.flat a                                              (*********************  data in Oracle, not sortable, 80 millions records each year)
inner join att.list_low5_&c_year b                     (******************* records from thousands to up to 4millions,  this macro run ten circles)
on a.label = b.label
where
dt >= &s_date and
dt <='31Mar2019'd
and prov = ..
and tp = ..
and pec = ..
and pnum ^= ..
and cd < ..
and vcd not in (.....)
and vloc not in (....)
and tm not between ... and ...
and tm not in (...)
and num not in (...)
order by a.label,dt desc, num;
quit;
Then you should use a hash for the inner join:
data att.total_list_&p_year;
set hynp.flat;
where /* your where condition */;
if _n_ = 1
then do;
  declare hash l (dataset:"att.list_low5_&c_year (keep=label)");
  l.definekey("label");
  l.definedone();
end;
if l.check() = 0;
run;From which table originate all the variables without an alias?
Does &s_date change frequently?
Do you have any kind of dynamics in the other parts of the WHERE?
1, hynp.flat is the source table; so label, dt, num are all from the source table.
att.list_low5_&c_year b is a label cohort generated from hynp.flat;
2, &s_date does not change and is a certain date for each macro;
3, No, these filters are set, there is no dynamic in WHERE;
Then you should use a hash for the inner join:
data att.total_list_&p_year;
set hynp.flat;
where /* your where condition */;
if _n_ = 1
then do;
  declare hash l (dataset:"att.list_low5_&c_year (keep=label)");
  l.definekey("label");
  l.definedone();
end;
if l.check() = 0;
run;thanks. In the process of testing code now, I am dealing an error message. Will get back to you.
@pensarchem wrote:
thanks. In the process of testing code now, I am dealing an error message. Will get back to you.
Two thoughts:
However, due to a special nature of safe environment, I have to run all the codes in SAS only
Can you explain this a bit further? You're already connecting to Oracle so why exactly can't you push some of the processing to Oracle?
If you have to pull all the data first into SAS before reducing volumes then that's already going to impact quite a bit on performance even if your 2nd table would only contain a few rows.
Yes, currently I run the code on a lower security level, so I can still connect to oracle.
However, we do have a special safe environment and all the data were imported manually from various resources. The data will be imported through .gz files into sas and run locally only under SAS..
So I am testing my code now and will move to safe environment after everything runs smoothly..I hope this clarifies.
There must be something preventing SAS/SQL from asking Oracle to preprocess your query. I suspect it is the use of function datepart() .
Try adding SQL_FUNCTIONS=ALL to your libname definition of hynp. This should allow the datetime function to be passed to Oracle for processing.
If that doesn't work, write datepart(a.dt) as dt in the select clause and use datetime constants in your where clause (e.g. .... a.dt <='31Mar2019:00:00:00'dt).
thanks for your input, I answered this above. We normally do run the code in Oracle and the process is quite quick..
And WHY exactly can't you run it in Oracle this time? If you have first to transfer all the data into SAS before subsetting then no wonder it takes much longer. At least try and use the proposed hash table approach so the big data set doesn't need sorting as well on the SAS side before reducing the data volumes.
Please add this line
options sastrace=',,,db' sastraceloc=saslog nostsuffix;and show us the full log.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
