Hi All,
i have huge oracle table ,in that one column's value is getting populated as missing and I have been asked to find out how many are getting populated as missing (count) and for which group var. So I am writing following query but its taking a lot of time.(running for more than 3 hrs)
LIBNAME gdw ORACLE PATH=P1 SCHEMA=W USER=R
PASSWORD="XXXXXX"
;
libname ip '/gpfs3' ;
%Macro m1 (period);
proc sql outobs=1000;
create table ip._&period. as
select count(status) as cnt,
period_id from gdw.ACCT_HIST;
where status is missing and period_id="&period"
group by period_id;
quit;
%mend;
%m1(201307)
%m1(201306)
%m1(201305)
%m1(201304)
%m1(201303)
%m1(201302)
Please suggest any alternatives to get faster results.
First, it seem you have a ; too much, in the present condition your where and group by doesn't execute.
As for optimization, you need to know a little how this table is set up in Oracle.
Is the table partitioned by period_id? If not, your total run time would probably benefit if executed in one pass (not looping through the different periods).
A part from that, the query is quite straight forward, and if SAS pushes the query to Oracle, then you need to talk to the Oracle DBA about Oracle performance.
To see what is really going "down" specify:
options sastrace=',,,d' sastraceloc=saslog;
Is there a reason you need to output a separate table for each period_id? Especially since each table will consist solely of the period_id and the count of missing obs? Why not create one table with a "group by" based on period_id?
The code is error free. The question is about performance.
By happy, the machine is still not smart enough to transform the unstructerd question in answer on his own.
You have something to do and think about it.
Some advices:
- Break up your research into smaller parts.
Eg extract the missings to a smaller table to do more research on. The pre-req is this number and size must fit in your environment.
The smaller the dataset to work on the easier it becomes. (Same as fugues)
- Understand the physical effects of your request
Indeed Check with you ORA DBA (as LinusH) already stated. If there are views (disadvantage), indexes(advantage) you could imagine what the DBMS internally wil do.
- Sequential/Random Normally invisible as hidden behind the SQL.
When the effect wil be an access to the whole database touching every record. Try to have it done just once and sequential.
When the effect will be accessing more than (rule of thumb) as 10% still do it sequential once, is it less random (indexed) would be faster.
Getting some answers, iterate this as new questions will need to solved.
Fot the tecnical part on SAS/Access to oracle there are a lot of tuning options.
For simple read retrieve is should commonly be acceptable. Teh buffsize can help to improve, some functions are causing ... (heavy network load)
http://support.sas.com/resources/papers/proceedings13/081-2013.pdf
Just as a genereal question, are you running this on a local sas machine that is not connected to a server? Or are you running it on a server that is in a different storage location from your Oracle server?
If so, it could be your network that is causing a lot of the slowdown. I know because I was running reports on a local machine that was accessing a server half way across the country, and a query that would take ~30 seconds on the server was taking over 45 minutes to go through the network (especially when you're returning a lot of data).
Just another thing to check out
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.