Help using Base SAS procedures

Need answers if you can

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 15
Accepted Solution

Need answers if you can

1)Large data set (20 mil records) and a second data set (10,000 records). Say we do an inner join with product ID. But this is too slow.

      Any recommendation to improve performance?

2)I get the output data set from a current dataset A, product ID by the last occurrence, check if its excluded in a separate dataset B (say, discontinued products).

     How do you do it?


Accepted Solutions
Solution
‎11-14-2013 09:41 AM
Frequent Contributor
Posts: 114

Re: Need answers if you can

Hello Sahaji,

I will give it my best shot ;-)

1)

In my opinion your questions depend on the infrastructure/hardware specs that you have.

There are several options for you to try:

- you could use the smaller data set as a hash file: http://support.sas.com/rnd/base/datastep/dot/better-hashing-sas92.pdf

- you could create a format from the smaller data set and use it on the 20mio data set. Then when your format variable is not filled you would drop that row.

- if you use SPDS for your data sets you can safe a lot of time by using PROC SQL instead of DATA-Steps and make use of parallel processing (SAS(R) Scalable Performance Data Server(R) 4.53: User's Guide). We used PROC SQL instead of DATA-Steps to save a lot of time in our programs.

- you could also try to create indexes on your join variables

2)

Perhaps you could specify some details. Why would you want to check if your product ID is not within a table B?

Do you want to make sure that it is not in table B? Then I would suggest a select where the product Id is not in select product ID from table A.

But with more details we might find a better fitting solution.

Cheers,

Michael

View solution in original post


All Replies
Solution
‎11-14-2013 09:41 AM
Frequent Contributor
Posts: 114

Re: Need answers if you can

Hello Sahaji,

I will give it my best shot ;-)

1)

In my opinion your questions depend on the infrastructure/hardware specs that you have.

There are several options for you to try:

- you could use the smaller data set as a hash file: http://support.sas.com/rnd/base/datastep/dot/better-hashing-sas92.pdf

- you could create a format from the smaller data set and use it on the 20mio data set. Then when your format variable is not filled you would drop that row.

- if you use SPDS for your data sets you can safe a lot of time by using PROC SQL instead of DATA-Steps and make use of parallel processing (SAS(R) Scalable Performance Data Server(R) 4.53: User's Guide). We used PROC SQL instead of DATA-Steps to save a lot of time in our programs.

- you could also try to create indexes on your join variables

2)

Perhaps you could specify some details. Why would you want to check if your product ID is not within a table B?

Do you want to make sure that it is not in table B? Then I would suggest a select where the product Id is not in select product ID from table A.

But with more details we might find a better fitting solution.

Cheers,

Michael

Occasional Contributor
Posts: 15

Re: Need answers if you can

Thank You so much Michael. Thanks for spending time to answer my question. Greatly appreciated.

Occasional Contributor
Posts: 6

Re: Need answers if you can

Is the large dataset indexed or partitioned?    If it is, then you may be able to limit the data very early in the process.
Add a where clause that uses a variable that is indexed or partitioned.     You can add the where clause right on the

join, as in the example below.

Good luck.

Sheri

 

PROC SQL;

CREATE TABLE  Lab_Results as

select LabGroup, Prov_ID,  A.Enc_ID, PATIENT_ID, A.Order_ID, Order_Date, Order_End_Date,

           L.Lab_Status_C,   ORDER_Value,

           L.Result_Date, L.Result_Time, Result_In_Range_YN, L.Result_Status_C

FROM ProdFil.EntityLabs A

JOIN  ProdFil.Order_Results (where=(Result_Date >= '01JAN2013'd)  L

on A.Enc_ID = L.Enc_ID

where A.Order_Proc_ID = '132856'

quit;

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 276 views
  • 0 likes
  • 3 in conversation