Hi ,
Need your help on below sas mainframe query..
The query is taking a very long time to run(its running since 2 days and has not finished yet)..
CREATE VIEW OUTTAB AS
SELECT FILEA.ALLF1 ,FILEB.ALLF2
FROM FILEB ,
FILEA WHERE
(FILEA.POL=FILEB.POL) OR
(FILEA.LNAME=FILEB.LNAME AND
FILEA.FNAME=FILEB.FNAME AND
FILEA.DOB=FILEB.DOB ) ;
FileA has 2233535 RECORDS
FileB has 155960 RECORDS
When there is only one condition IN WHERE CLAUSE the query runs very quickly.
I feel If there is one condition it would be treated as inner join so that might be the reason.
Basically I want to select records from filea and fileb
where FILEA.POL=FILEB.POL
or
when all the below conditions meet..
( FILEA.LNAME=FILEB.LNAME AND
FILEA.FNAME=FILEB.FNAME AND
FILEA.DOB=FILEB.DOB )
Can u suggest a improvement in query please.
Thanks
Vishal
Well, SAS SQL has some flaws...
What you can do is to split this query into two queries, and then merge the result sets (just append/insert if they are exclusive, or together with some distinct or NODUPKEY operation if aren't).
Generally, it's hard to optimize OR operators. You cold try to store the data in SPDE, since SPDE sometimes can optimize queries that contains OR operators.
hi,
I modified the query as below..
I am running this query on Mainframe system.
PROC SQL;
CREATE VIEW SASLIB.OUTTAB AS
SELECT FILEA.ALLF1,FILEB.ALLF2
FROM SASLIB.FILEB
INNER JOIN SASLIB.FILEA ON ( FILEA.POL=FILEB.POL ) OR
(FILEA.LNAME=FILEB.LNAME AND
FILEA.FNAME=FILEB.FNAME AND
FILEA.DOB=FILEB.DOB);
RUN; QUIT;
I got the below error code..
ERROR: UTILITY FILE WRITE FAILED. PROBABLE DISK FULL CONDITION.
I tried increasing the workspace also tried with allocating a library but nothing seems to work
File1 has 2233535 RECORDS
File2 has 155960 RECORDS
Thanks
Vishal
Can we further improve this query please..
The seconds query does not differs from the first one, using where is practical the same as using inner-on.
Refer to my previous post for other techniques.
I also recommend to go with seperate quiries then joining the results. In proc sql with OR operation takes more time.
Your query is logically equivalent to :
PROC SQL;
CREATE VIEW SASLIB.OUTTAB AS
(SELECT FILEA.ALLF1, FILEB.ALLF2
FROM SASLIB.FILEB INNER JOIN SASLIB.FILEA ON
FILEA.POL=FILEB.POL )
UNION
(SELECT FILEA.ALLF1, FILEB.ALLF2
FROM SASLIB.FILEB INNER JOIN SASLIB.FILEA ON
FILEA.LNAME=FILEB.LNAME AND
FILEA.FNAME=FILEB.FNAME AND
FILEA.DOB=FILEB.DOB);
QUIT;
which might work better since it avoids the dreaded OR operator. The UNION operation includes unique records only once, even if they appear in both sets. Worth a try.
PG
CASE WHEN is faster than OR so try below given code :
CREATE VIEW OUTTAB AS
SELECT FILEA.ALLF1 ,FILEB.ALLF2
FROM FILEB ,
FILEA WHERE
( case when FILEA.POL ne FILEB.POL then
( FILEA.LNAME=FILEB.LNAME AND
FILEA.FNAME=FILEB.FNAME AND
FILEA.DOB=FILEB.DOB
)
else FILEA.POL = FILEB.POL
end
) ;
That is the disadvantage of SQL ,and sure it is the advantage of SAS.
For this large table , your best choice is to use Hash Table. An alternative way is to change the small table into a
format by using proc format ,then to perform query. I have some experience like yours. I used proc format and spended about one hour to get the reslut. But I think Hash Table will be faster.
Ksharp
Hello,
Thank u all for the help..
The select-case statement seems to be a better option...
I was able to try with only a few records(with 1lakh in both the files)..
But When i run the query for the full files(with all records
File1 has 2233535 RECORDS
File2 has 155960 RECORDS )
Then I get into a problem,
Below is the error I get.
ERROR: UTILITY FILE WRITE FAILED. PROBABLE DISK FULL CONDITION.
I am running this on a mainframe system.
I tried with allocating a seperate sas library, increasing work space, Compressing the dataset etc but nothing has worked.
Just want to know is there a way to allocate more than one library and probably
concatenate all of them together in SAS.
The reason being I can'nt get one big sas library allocated in Mainframe.
Instead i can get chunks of small libraries allocated.
So can we implement the below example..
Example:)
Libname l1 "mainframe path" ;
Libname l2 "mainframe path" ;
Libname l3 "mainframe path" ;
Then can we have the below scenario implemented in sas
l4 = l1 + l2 + l3 ;
Once we have l4 we can save the sas dataset in l4 as below..
create l4.outab as
(my sql query)
Can someone suggest if this is possible.
Thanks
Vishal
You can concatenate several libraries in a libname statement.
Libname l4 ('c:\temp\' 'd:\ftp\') ;
But I think your problem is not there. Hash Table also can do the same thing as CASE END did and better.
For the large table, the only choice is Hash Table or proc format
Ksharp
Are all the columns you are using in joins indexed? I am not sure if the SAS query optimiser is clever enough to create temporary indexes for the query. Joining these tables without an index will be very slow, particularly if the rows are very wide (lots of big columns). You can find out how the query is working with the PROC SQL _METHOD option. See http://www2.sas.com/proceedings/sugi30/101-30.pdf for more information about this option.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.