Hi All,
I am using following code for the requirment given below, Present it is taking 5 hours to complete the following query because dataset size are huge.
Can somebody help me with efficient SAS code which can reduce processing time.
Proc sql;
create table INVENTORY as select distinct
a.*
,min(case when (b.PROCESS_DATE GT a.CREATION_DATE and c.MAKING_DATE GE b.PROCESS_DATE and c.MAKING_DATE LE b.PROCESS_DATE_5DAYS and c.AMOUNT LE 0)
then b.PROCESS_DATE else . end) as CLEARENCE_DATE format=date9.
,min(case when (b.PROCESS_DATE GT a.CREATION_DATE and c.MAKING_DATE GE b.PROCESS_DATE and c.MAKING_DATE LE b.PROCESS_DATE_5DAYS and c.AMOUNT LE 0)
then c.MAKING_DATE else . end) as INV_DATE format=date9.
from MAIN_DATA as a
left join PROCESSING_DATA as b <------(Has Record Count of "682152529")
on a.ID = b.ID
left join MAKING_DATA as c <------(Has Record Count of "374098947")
on a.ID = c.ID
group by a.ID, a.PART_ID
order by a.ID, a.PART_ID;
Quit;
I have one main base dataset "MAIN_DATA" and I am joining my "MAIN_DATA" with two huge datasets called "PROCESSING_DATA" and "MAKING_DATA"
1. MAIN_DATA (This dataset has "CREATION_DATE")
2. PROCESSING_DATA (This dataset has "PROCESS_DATE" and "PROCESS_DATE_5Days" )
3. MAKING_DATA (This dataset has "MAKING_DATE" and "AMOUNT")
1. Requirement for Identifying "Clearence Date":
I want to know earliest "PROCESS_DATE" date for the following:
"PROCESS_DATE" should be After "CREATION_DATE" and "MAKING_DATE" should be between "PROCESS_DATE" and "PROCESS_DATE_5Days" and "AMOUNT" should be ZERO(0) or less than ZERO(0)
2. Requirement for Identifying "Inventory Date":
I want to know earliest "MAKING_DATE" date for the following:
"PROCESS_DATE" should be After "CREATION_DATE" and "MAKING_DATE" should be between "PROCESS_DATE" and "PROCESS_DATE_5Days" and "AMOUNT" should be ZERO(0) or less than ZERO(0)
Thanks,
Ravi
Using select distinct on a long list of variables is usually a VERY BAD IDEA, as it forces SQL to sort by ALL variables to remove doubles.
Your first task shall therefore be to determine if that distinct is necessary, and how to work around it if it is (eg if only a subgroup of variables contains the necessary values to determine the distinct, you can do a sort nodupkey using that).
Next you need to inspect your relationships. If you have a 1:n relationship, you can replace that join with a sort and data step merge, which usually outperforms SQL up to orders of magnitude.
How large is your MAIN_DATA table? If it has significantly fewer rows than the other two tables, you may get better performance by putting indexes on the large tables:
proc sql; create index ID on PROCESSING_DATA(ID); create index ID on MAKING_DATA(ID); quit;
Of course, the feasibility and efficiency of this also depends on how often (and how) you update the two large tables. But often the creation of an index takes less time than what you can save in a single query.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.