Re: SAS DI:SCD Type 2 Insufficient space in work

dubey_vikas1984 · Posted 05-12-2017 04:06 AM

Hi,

My SAS DI job is based on DB2 processing.i am using GLOBAL connection with SESSION schema and pointing temrory work location to DB2.but at the point when i am using SCD Type 2/Table Loader it is trying to create sas work table and giving error Insufficient space in work.

Is there any way to redirect SCD Type 2/Table Loader work table to DB2 OR is there any option i can use for loading huge amount of data in dimension table

Please suggest!!!

Kurt_Bremser · Posted 05-12-2017 06:15 AM

WORK is a special location that SAS uses for more than just temporary datasets; don't confuse it with a temporary space in Oracle.

That said, if you run out of space there, you should first look if there's unused data that needs to be deleted. This includes left-over directories from earlier SAS sessions that did not terminate gracefully.

And you should size the WORK location according to your needs. You need at least three times the size of the largest table (as stored by SAS) that you will work with.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

LinusH · Posted 05-12-2017 04:40 PM

May this be applicable to you?
http://support.sas.com/kb/57/996.html

Data never sleeps

Patrick · Posted 05-12-2017 08:32 PM

Hi @dubey_vikas1984

Which approach to take will depend on what you have.

- Where is your source table stored: SAS or DB2?

- Does your source table contain deltas (just changes) or is it a full daily snapshot?

If deltas:

Could you have more than one row per primary key (like journal data) in your source table?

What's the relationship in volumes between source and target (like 1M rows in source and 1000M rows current records in target).

- Do you also have to expire ("delete") records without adding a new current record? And if yes: Do you get a separate deletion source for this or do you have to work it out?

How I understand things:

The DIS SCD2 loader has initially been implemented and tested to work primarily with SAS tables. There have been some modifications added in later releases which allow for better control of what gets pushed to a database for processing (i.e. permanent XREF table in database) but there are still steps which will only work on the SAS side and which require SAS processing (i.e. md5() to create digest values).

If both your source and target are already in the same database then this will cause the source data to get loaded back and forth between the database and SAS (which is not a good thing).

Depending on how you define things in the SCD2 transformation it can happen that even the target table gets loaded into SAS. That you run out of SAS Work space indicates that this is currently happening for you.

Even though things would still work logically such data transfers have a huge performance impact and you really need to avoid it.

I had in the past cases where I simply couldn't tweak the SCD2 transformation in a way to avoid too much of such data transfers between SAS and the database and I've ended up to not use the SCD2 transformation but to implement this load with a series of SQL transformations.

If you want to go down this path and you work for a bigger company then it's eventually also worth to have a chat with a DB2 DBA. May be some other project already implemented such an SCD2 load in the DB and you could use the same approach (using SAS just to call existing processes or the like). I don't know what's available and possible in DB2 but other databases allow you to implement such load logic as a view in the database and you then simply load your data from SAS into this view.

Happy to throw a few more ideas once you've answered my questions.

Thanks,

Patrick

dubey_vikas1984 · Posted 05-15-2017 06:18 AM

Thanks for prompt response!!

- Where is your source table stored: SAS or DB2?

My source and target stored in same DB2

- Does your source table contain deltas (just changes) or is it a full daily snapshot?

Yes, it is contain deltas change.

If deltas:

Could you have more than one row per primary key (like journal data) in your source table?

No,

What's the relationship in volumes between source and target (like 1M rows in source and 1000M rows current records in target).

50M

- Do you also have to expire ("delete") records without adding a new current record? And if yes: Do you get a separate deletion source for this or do you have to work it out?

We do not have

My source and target is same DB2.most of the cases i am using Passthrough for DB processing.Could you please guide if there is way and i can redirect SCD type2/Table loader work table in DB only.

Patrick · Posted 05-15-2017 06:46 PM

Hi @dubey_vikas1984

The DIS SCD Type 2 transformation doesn't allow you to push 100% of processing to the database 😞

At minimum the step creating the new digest values for your delta table (used in the Xref table) uses SAS function MD5() which must execute on the SAS server side.

There is somewhere an option in the SCD2 transformation where you can choose that the XREF table for the target table gets stored permanently on the server side - and that's something you must do as else also the target XREF table gets re-calculated every single time which requires the target table to get loaded into SAS for processing.

If this is about performance and the source and the target table are both already stored in the same database then I wouldn't use the DIS SCD Type 2 transformation at all but write the load code myself to work fully in-database.

If you need such a loader more than once then create a custom transformation and "macrotize" the code; use prompts to pass in the column names for variables and so on.

First thing I'd do:

Make my DB2 SQL code work without any SAS involvement using a DB2 client (something similar to SQL Developer for Oracle; not sure what that is for DB2).

Here the code logic to implement:

http://www.kimballgroup.com/2008/11/design-tip-107-using-the-sql-merge-statement-for-slowly-changing...

If you also create a surrogate or retained key then implement and use a sequencer to create key values for new rows:

https://www.ibm.com/support/knowledgecenter/en/SSEPEK_10.0.0/sqlref/src/tpc/db2z_sql_createsequence....

Once you've got the code working directly on the database I'd copy/paste this code as a whole into a SAS EG or SAS Studio code window, wrap a Proc SQL around with a single EXECUTE block and once that works wrap a macro around the whole Proc SQL and start to make things dynamic (=macro variables for changing stuff like database column names).

Once you've got a working SAS macro create a DIS custom transformation, copy/paste your code there (or have the macro as such in a SAS Autocall library and only add the call to the macro to the custom transformation) and add the prompts to pass in the values to your macro.

I know, sounds like a lot of work, but if your SAS macro coding skills are up-to-scratch then it's actually not that hard to implement.

Hope that helps,

Patrick

SAS DI:SCD Type 2 Insufficient space in work