BookmarkSubscribeRSS Feed
ChrisNZ
Tourmaline | Level 20
1. You don't need an index if the tables are sorted.
2. Try creating the table in the SPDE library. Not in AAA.
JJP1
Pyrite | Level 9

Hi @ChrisNZ,

Would you please help how can i create dataset in SPDE library as i am new to SPDE library please.

sorry to trouble you with simple queries.

i was assuming that if i run below statement then AAA library will be converted as SPDE library.please guide me

 

libname SPEEDY spde "%sysfunc(pathname(AAA))" compress=binary partsize=500g;

ChrisNZ
Tourmaline | Level 20
Runnng the libnane statement creates the library.
So then you have 2 libraries. No conversion.
Just use it like you use the AAA library.
ScottBass
Rhodochrosite | Level 12

@ChrisNZ wrote:
1. You don't need an index if the tables are sorted.
2. Try creating the table in the SPDE library. Not in AAA.

 

To clarify, you need an index if you are using index key lookup (my previous code).  You may not need the index if you're using proc sql and your data is sorted.

 

You will benefit from reading this:  http://support.sas.com/techsup/technote/ts553.html.

 

Use the proc sql _method operand to see if proc sql is sorting your data before joining.  If proc sql is sorting, that will be a huge performance and utility space hit.  I have this personal saying "sorting is evil".  Often you need it, but avoid it when you can (for example sorting your large data overnight).  If at all possible, you want to coax proc sql NOT TO SORT YOUR TABLES.

 

See also:

http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000766829.htm

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000131184.htm

https://support.sas.com/resources/papers/proceedings/proceedings/sugi31/131-31.pdf (search on sortedby)

https://support.sas.com/resources/papers/proceedings/proceedings/sugi29/127-29.pdf (search on sortedby)

 

If you use the index key lookup technique, you have more control over the code execution, and are guaranteed not to be sorting your data.  But you need to create the index.

 

To convert your data to spde:

 

proc copy in=your_current_library out=your_spde_library;
   select your_desired_tables;
run;

Or you can run a data step(s) to get the same results.

 

Of course, your indexes need to be in the same library location as your dataset.


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
ChrisNZ
Tourmaline | Level 20

@ScottBass I mention that indexes are not needed if tables are sorted as @JJP1  mentionned "and finally used merge statement to run both permanent tables based on the merge key"

Other options are possible of course. 

 

@JJP1 Some methods are compared here.

https://communities.sas.com/t5/SAS-Communities-Library/Study-on-the-best-method-to-join-two-tables/t...

 

 

Patrick
Opal | Level 21

@JJP1 

Just to add what other's wrote:

IF it's your ETL which also creates the big table (103 GB) then a simple measure would be to create an index on this table there (just add the index for ID to the table metadata definition and SAS will create the index (if using the Table Loader transformation then you will also need select when to create/maintain the index in this transformation).

 

IF the physical table already exists and your ETL is not always re-creating it then also copy/past the DIS generated code creating the index and execute this code as a one-off job.

 

There are also other techniques available to improve lookup performance but what will work depends on the details of your data and will require user written code.

 

Try to avoid DISTINCT as this requires another pass with sorting through your data. There are likely other ways to get to the result you want but you would need to provide more detail about your data and the relationships between the tables to propose something concrete.

 

JJP1
Pyrite | Level 9

Hi @Patrick,

Sorry for coming back late on this.actually would you please suggest other option other than indexing please(as already these tables are running daily with out indexes please.)

 

 

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @JJP1 

 

If you use a Table Loader transformation to load the table, you can get a dramatic improvement in performance by setting index OFF before load and ON after load. I think the opposite is the default behaviour.

JJP1
Pyrite | Level 9

Hi Erik,

Currently the tables does not have any indexes .We don't want to create the index on table,is there any other option please which can  be efficient to join the data please.

 

 

ChrisNZ
Tourmaline | Level 20

You had a solution of merging the small tables, and keeping the large table sorted, and doing a 2-table merge. What happened to that?

ChrisNZ
Tourmaline | Level 20

1. > Actually we are using the mentioned code in sql join transformation in SAS DI application.

One of SQL's main drawbacks is that it doesn't keep data sets sorted.

So you have to sort again each time.

When you use data steps, you don't have the issue. If you maintain your large table sorted, then you never have to sort it again. Same for the other tables.

Having to start from random-order data is hugely expensive when processing full tables, Doing this does not matter for transactions, but it is out of place for BI.

 

2. > it will give error after running very long time saying "Insufficient Space error " or sometimes "sort execution error"

Please provide the exact error message.

If the issue is disk space. then using SPDE data sets and binary compression could solve the problem for SAS data sets, and using a compressed directory could help for utility files. 

If the issue is RAM, then you need to modify the query or make more RAM available to the process.

You can add option _method after proc sql to get information about the decisions currently taken by the SQL optimiser.

 

3. Do you keep columns from all tables?

 

 

 

ScottBass
Rhodochrosite | Level 12

Thanks for this,Actually we are using the mentioned code in sql join transformation in SAS DI application.

There are many transformations in DI Studio (DIS).  The SQL transformation is just one of them.  Are you aware of the user-written transformation?  Also, if you don't know what you're doing, DIS can create some very inefficient code.

 

So we need to create a job in SAS DI to resolve the insufficent space issue.

DIS is just a SAS code generator.  You could do your development in any SAS tool, such as DMS, EG, SAS/Studio, etc. then convert the efficient, working code to DIS.

 

I don't think we can create data steps in existing SAS DI job 

That is incorrect.

 

kindly suggest and please help in SAS DI application approach please

Again, DIS just creates SAS code.  Including data step code via the user-written transformation.

 

please correct me if i am wrong

Ok, you're wrong.  I've corrected you.  See above comments.

 

let me know the efficient way to do this please.

I have already done this via my previous reply.

 

Questions:

 

1) Did you even try the example code provided in the second link?  All you have to do is cut-and-paste that self-contained code and run it.  When you ran it, what was the result?  Did you understand how it worked?

 

2) Say your ID columns are numeric, for example an incrementing integer.  Have you tried creating representative subsets of your data, such as:

 

data AAAAAAA_subset;
   set AAAAAAA;
   where ID between 1 and 100000;
run;

(and same for the rest of your source tables)

 

then run the example code I provided?  If so, what was the result?

 

We can only provide you example solutions.  If you choose not to even try them then that's on you.  Even if you chose not to use the approach I suggested, it is a good technique to know.

 

Good luck, hope you get this working.


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 26 replies
  • 2852 views
  • 3 likes
  • 5 in conversation