BookmarkSubscribeRSS Feed
Ravikumar_RT
Fluorite | Level 6

Hi All,

I am using following code for the requirment given below, Present it is taking 5 hours to complete the following query because dataset size are huge.


Can somebody help me with efficient SAS code which can reduce processing time.

     
Proc sql;

    create table INVENTORY as select distinct
       a.*
    ,min(case when (b.PROCESS_DATE GT a.CREATION_DATE and c.MAKING_DATE GE b.PROCESS_DATE and c.MAKING_DATE LE b.PROCESS_DATE_5DAYS and c.AMOUNT LE 0)
              then b.PROCESS_DATE else . end) as CLEARENCE_DATE format=date9.
    ,min(case when (b.PROCESS_DATE GT a.CREATION_DATE and c.MAKING_DATE GE b.PROCESS_DATE and c.MAKING_DATE LE b.PROCESS_DATE_5DAYS and c.AMOUNT LE 0)
              then c.MAKING_DATE  else . end) as INV_DATE format=date9.
    from MAIN_DATA as a
    left join PROCESSING_DATA as b    <------(Has Record Count of "682152529")
    on a.ID = b.ID
    left join MAKING_DATA as c    <------(Has Record Count of "374098947")
    on a.ID = c.ID  
    group by a.ID, a.PART_ID
    order by a.ID, a.PART_ID;

Quit;

I have one main base dataset "MAIN_DATA" and I am joining my "MAIN_DATA" with two huge datasets called "PROCESSING_DATA" and "MAKING_DATA"
1. MAIN_DATA (This dataset has "CREATION_DATE")
2. PROCESSING_DATA (This dataset has "PROCESS_DATE" and "PROCESS_DATE_5Days" )
3. MAKING_DATA (This dataset has "MAKING_DATE" and "AMOUNT")

1. Requirement for Identifying "Clearence Date":
I want to know earliest "PROCESS_DATE" date for the following:
"PROCESS_DATE" should be After "CREATION_DATE" and "MAKING_DATE" should be between "PROCESS_DATE" and "PROCESS_DATE_5Days" and "AMOUNT" should be ZERO(0) or less than ZERO(0)

2. Requirement for Identifying "Inventory Date":
I want to know earliest "MAKING_DATE" date for the following:
"PROCESS_DATE" should be After "CREATION_DATE" and "MAKING_DATE" should be between "PROCESS_DATE" and "PROCESS_DATE_5Days" and "AMOUNT" should be ZERO(0) or less than ZERO(0)

 

Thanks,

Ravi

3 REPLIES 3
Kurt_Bremser
Super User

Using select distinct on a long list of variables is usually a VERY BAD IDEA, as it forces SQL to sort by ALL variables to remove doubles.

Your first task shall therefore be to determine if that distinct is necessary, and how to work around it if it is (eg if only a subgroup of variables contains the necessary values to determine the distinct, you can do a sort nodupkey using that).

Next you need to inspect your relationships. If you have a 1:n relationship, you can replace that join with a sort and data step merge, which usually outperforms SQL up to orders of magnitude.

s_lassen
Meteorite | Level 14

How large is your MAIN_DATA table? If it has significantly fewer rows than the other two tables, you may get better performance by putting indexes on the large tables:

proc sql;
  create index ID on PROCESSING_DATA(ID);
  create index ID on MAKING_DATA(ID);
quit;

Of course, the feasibility and efficiency of this also depends on how often (and how) you update the two large tables. But often the creation of an index takes less time than what you can save in a single query.

Ravikumar_RT
Fluorite | Level 6
Hi,

Thanks for your suggestion, I did tried processing by creating index, but still it took more than 5 hours to complete.

MAIN_DATA - "1249382" (Total Record count)
PROCESSING_DATA - "682152529" (Total Record count)
MAKING_DATA - "374098947" (Total Record count)

Please could you help me with any alternative SAS code which helps in reducing processing time.

My requirement is same:

I have one main base dataset "MAIN_DATA" and I am joining my "MAIN_DATA" with two huge datasets called "PROCESSING_DATA" and "MAKING_DATA"
1. MAIN_DATA (This dataset has "CREATION_DATE")
2. PROCESSING_DATA (This dataset has "PROCESS_DATE" and "PROCESS_DATE_5Days" )
3. MAKING_DATA (This dataset has "MAKING_DATE" and "AMOUNT")

1. Requirement for Identifying "Clearence Date":
I want to know earliest "PROCESS_DATE" date for the following:
"PROCESS_DATE" should be After "CREATION_DATE" and "MAKING_DATE" should be between "PROCESS_DATE" and "PROCESS_DATE_5Days" and "AMOUNT" should be ZERO(0) or less than ZERO(0)

2. Requirement for Identifying "Inventory Date":
I want to know earliest "MAKING_DATE" date for the following:
"PROCESS_DATE" should be After "CREATION_DATE" and "MAKING_DATE" should be between "PROCESS_DATE" and "PROCESS_DATE_5Days" and "AMOUNT" should be ZERO(0) or less than ZERO(0)

Once again thanks alot for your help.

Regards,
Ravi

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 2527 views
  • 2 likes
  • 3 in conversation