Hi @ChrisNZ,
Would you please help how can i create dataset in SPDE library as i am new to SPDE library please.
sorry to trouble you with simple queries.
i was assuming that if i run below statement then AAA library will be converted as SPDE library.please guide me
libname SPEEDY spde "%sysfunc(pathname(AAA))" compress=binary partsize=500g;
@ChrisNZ wrote:
1. You don't need an index if the tables are sorted.
2. Try creating the table in the SPDE library. Not in AAA.
To clarify, you need an index if you are using index key lookup (my previous code). You may not need the index if you're using proc sql and your data is sorted.
You will benefit from reading this: http://support.sas.com/techsup/technote/ts553.html.
Use the proc sql _method operand to see if proc sql is sorting your data before joining. If proc sql is sorting, that will be a huge performance and utility space hit. I have this personal saying "sorting is evil". Often you need it, but avoid it when you can (for example sorting your large data overnight). If at all possible, you want to coax proc sql NOT TO SORT YOUR TABLES.
See also:
http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000766829.htm
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000131184.htm
https://support.sas.com/resources/papers/proceedings/proceedings/sugi31/131-31.pdf (search on sortedby)
https://support.sas.com/resources/papers/proceedings/proceedings/sugi29/127-29.pdf (search on sortedby)
If you use the index key lookup technique, you have more control over the code execution, and are guaranteed not to be sorting your data. But you need to create the index.
To convert your data to spde:
proc copy in=your_current_library out=your_spde_library;
select your_desired_tables;
run;
Or you can run a data step(s) to get the same results.
Of course, your indexes need to be in the same library location as your dataset.
@ScottBass I mention that indexes are not needed if tables are sorted as @JJP1 mentionned "and finally used merge statement to run both permanent tables based on the merge key"
Other options are possible of course.
@JJP1 Some methods are compared here.
Just to add what other's wrote:
IF it's your ETL which also creates the big table (103 GB) then a simple measure would be to create an index on this table there (just add the index for ID to the table metadata definition and SAS will create the index (if using the Table Loader transformation then you will also need select when to create/maintain the index in this transformation).
IF the physical table already exists and your ETL is not always re-creating it then also copy/past the DIS generated code creating the index and execute this code as a one-off job.
There are also other techniques available to improve lookup performance but what will work depends on the details of your data and will require user written code.
Try to avoid DISTINCT as this requires another pass with sorting through your data. There are likely other ways to get to the result you want but you would need to provide more detail about your data and the relationships between the tables to propose something concrete.
Hi @Patrick,
Sorry for coming back late on this.actually would you please suggest other option other than indexing please(as already these tables are running daily with out indexes please.)
Hi @JJP1
If you use a Table Loader transformation to load the table, you can get a dramatic improvement in performance by setting index OFF before load and ON after load. I think the opposite is the default behaviour.
Hi Erik,
Currently the tables does not have any indexes .We don't want to create the index on table,is there any other option please which can be efficient to join the data please.
You had a solution of merging the small tables, and keeping the large table sorted, and doing a 2-table merge. What happened to that?
1. > Actually we are using the mentioned code in sql join transformation in SAS DI application.
One of SQL's main drawbacks is that it doesn't keep data sets sorted.
So you have to sort again each time.
When you use data steps, you don't have the issue. If you maintain your large table sorted, then you never have to sort it again. Same for the other tables.
Having to start from random-order data is hugely expensive when processing full tables, Doing this does not matter for transactions, but it is out of place for BI.
2. > it will give error after running very long time saying "Insufficient Space error " or sometimes "sort execution error"
Please provide the exact error message.
If the issue is disk space. then using SPDE data sets and binary compression could solve the problem for SAS data sets, and using a compressed directory could help for utility files.
If the issue is RAM, then you need to modify the query or make more RAM available to the process.
You can add option _method after proc sql to get information about the decisions currently taken by the SQL optimiser.
3. Do you keep columns from all tables?
Thanks for this,Actually we are using the mentioned code in sql join transformation in SAS DI application.
There are many transformations in DI Studio (DIS). The SQL transformation is just one of them. Are you aware of the user-written transformation? Also, if you don't know what you're doing, DIS can create some very inefficient code.
So we need to create a job in SAS DI to resolve the insufficent space issue.
DIS is just a SAS code generator. You could do your development in any SAS tool, such as DMS, EG, SAS/Studio, etc. then convert the efficient, working code to DIS.
I don't think we can create data steps in existing SAS DI job
That is incorrect.
kindly suggest and please help in SAS DI application approach please
Again, DIS just creates SAS code. Including data step code via the user-written transformation.
please correct me if i am wrong
Ok, you're wrong. I've corrected you. See above comments.
let me know the efficient way to do this please.
I have already done this via my previous reply.
Questions:
1) Did you even try the example code provided in the second link? All you have to do is cut-and-paste that self-contained code and run it. When you ran it, what was the result? Did you understand how it worked?
2) Say your ID columns are numeric, for example an incrementing integer. Have you tried creating representative subsets of your data, such as:
data AAAAAAA_subset;
set AAAAAAA;
where ID between 1 and 100000;
run;
(and same for the rest of your source tables)
then run the example code I provided? If so, what was the result?
We can only provide you example solutions. If you choose not to even try them then that's on you. Even if you chose not to use the approach I suggested, it is a good technique to know.
Good luck, hope you get this working.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.