Re: SAS DI Job : Join on large dataset - Page 2

ChrisNZ · Posted 05-07-2019 02:29 AM

1. You don't need an index if the tables are sorted.
2. Try creating the table in the SPDE library. Not in AAA.

High-Performance SAS Coding - Third Edition

JJP1 · Posted 05-07-2019 03:18 AM

Hi @ChrisNZ,

Would you please help how can i create dataset in SPDE library as i am new to SPDE library please.

sorry to trouble you with simple queries.

i was assuming that if i run below statement then AAA library will be converted as SPDE library.please guide me

libname SPEEDY spde "%sysfunc(pathname(AAA))" compress=binary partsize=500g;

ChrisNZ · Posted 05-07-2019 03:49 AM

Runnng the libnane statement creates the library.
So then you have 2 libraries. No conversion.
Just use it like you use the AAA library.

High-Performance SAS Coding - Third Edition

ScottBass · Posted 05-07-2019 04:20 AM

@ChrisNZ wrote:
1. You don't need an index if the tables are sorted.
2. Try creating the table in the SPDE library. Not in AAA.

To clarify, you need an index if you are using index key lookup (my previous code). You may not need the index if you're using proc sql and your data is sorted.

You will benefit from reading this: http://support.sas.com/techsup/technote/ts553.html.

Use the proc sql _method operand to see if proc sql is sorting your data before joining. If proc sql is sorting, that will be a huge performance and utility space hit. I have this personal saying "sorting is evil". Often you need it, but avoid it when you can (for example sorting your large data overnight). If at all possible, you want to coax proc sql NOT TO SORT YOUR TABLES.

See also:

http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000766829.htm

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000131184.htm

https://support.sas.com/resources/papers/proceedings/proceedings/sugi31/131-31.pdf (search on sortedby)

https://support.sas.com/resources/papers/proceedings/proceedings/sugi29/127-29.pdf (search on sortedby)

If you use the index key lookup technique, you have more control over the code execution, and are guaranteed not to be sorting your data. But you need to create the index.

To convert your data to spde:

proc copy in=your_current_library out=your_spde_library;
   select your_desired_tables;
run;

Or you can run a data step(s) to get the same results.

Of course, your indexes need to be in the same library location as your dataset.

Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.

ChrisNZ · Posted 05-07-2019 06:47 PM

@ScottBass I mention that indexes are not needed if tables are sorted as @JJP1 mentionned "and finally used merge statement to run both permanent tables based on the merge key"

Other options are possible of course.

@JJP1 Some methods are compared here.

https://communities.sas.com/t5/SAS-Communities-Library/Study-on-the-best-method-to-join-two-tables/t...

High-Performance SAS Coding - Third Edition

Patrick · Posted 05-03-2019 10:12 PM

@JJP1

Just to add what other's wrote:

IF it's your ETL which also creates the big table (103 GB) then a simple measure would be to create an index on this table there (just add the index for ID to the table metadata definition and SAS will create the index (if using the Table Loader transformation then you will also need select when to create/maintain the index in this transformation).

IF the physical table already exists and your ETL is not always re-creating it then also copy/past the DIS generated code creating the index and execute this code as a one-off job.

There are also other techniques available to improve lookup performance but what will work depends on the details of your data and will require user written code.

Try to avoid DISTINCT as this requires another pass with sorting through your data. There are likely other ways to get to the result you want but you would need to provide more detail about your data and the relationships between the tables to propose something concrete.

JJP1 · Posted 05-16-2019 07:02 AM

Hi @Patrick,

Sorry for coming back late on this.actually would you please suggest other option other than indexing please(as already these tables are running daily with out indexes please.)

ErikLund_Jensen · Posted 05-16-2019 08:16 AM

Hi @JJP1

If you use a Table Loader transformation to load the table, you can get a dramatic improvement in performance by setting index OFF before load and ON after load. I think the opposite is the default behaviour.

JJP1 · Posted 05-16-2019 09:14 AM

Hi Erik,

Currently the tables does not have any indexes .We don't want to create the index on table,is there any other option please which can be efficient to join the data please.

ChrisNZ · Posted 05-16-2019 10:11 PM

You had a solution of merging the small tables, and keeping the large table sorted, and doing a 2-table merge. What happened to that?

High-Performance SAS Coding - Third Edition

ChrisNZ · Posted 05-03-2019 08:22 PM

1. > Actually we are using the mentioned code in sql join transformation in SAS DI application.

One of SQL's main drawbacks is that it doesn't keep data sets sorted.

So you have to sort again each time.

When you use data steps, you don't have the issue. If you maintain your large table sorted, then you never have to sort it again. Same for the other tables.

Having to start from random-order data is hugely expensive when processing full tables, Doing this does not matter for transactions, but it is out of place for BI.

2. > it will give error after running very long time saying "Insufficient Space error " or sometimes "sort execution error"

Please provide the exact error message.

If the issue is disk space. then using SPDE data sets and binary compression could solve the problem for SAS data sets, and using a compressed directory could help for utility files.

If the issue is RAM, then you need to modify the query or make more RAM available to the process.

You can add option _method after proc sql to get information about the decisions currently taken by the SQL optimiser.

3. Do you keep columns from all tables?

High-Performance SAS Coding - Third Edition

ScottBass · Posted 05-04-2019 03:57 AM

Thanks for this,Actually we are using the mentioned code in sql join transformation in SAS DI application.

There are many transformations in DI Studio (DIS). The SQL transformation is just one of them. Are you aware of the user-written transformation? Also, if you don't know what you're doing, DIS can create some very inefficient code.

So we need to create a job in SAS DI to resolve the insufficent space issue.

DIS is just a SAS code generator. You could do your development in any SAS tool, such as DMS, EG, SAS/Studio, etc. then convert the efficient, working code to DIS.

I don't think we can create data steps in existing SAS DI job

That is incorrect.

kindly suggest and please help in SAS DI application approach please

Again, DIS just creates SAS code. Including data step code via the user-written transformation.

please correct me if i am wrong

Ok, you're wrong. I've corrected you. See above comments.

let me know the efficient way to do this please.

I have already done this via my previous reply.

Questions:

1) Did you even try the example code provided in the second link? All you have to do is cut-and-paste that self-contained code and run it. When you ran it, what was the result? Did you understand how it worked?

2) Say your ID columns are numeric, for example an incrementing integer. Have you tried creating representative subsets of your data, such as:

data AAAAAAA_subset;
   set AAAAAAA;
   where ID between 1 and 100000;
run;

(and same for the rest of your source tables)

then run the example code I provided? If so, what was the result?

We can only provide you example solutions. If you choose not to even try them then that's on you. Even if you chose not to use the approach I suggested, it is a good technique to know.

Good luck, hope you get this working.

Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away