10-15-2014 06:41 PM
I took over a big proc sql project and found this code inside;
(case when (substr(t1.behavior,1,1)= 'R')
else substr(t1.behavior,1,6) end) as Cost_Pool,
My first shot at this issue was to do this in hopes of cuttind down the processing time. It cut it by over half.
when t1.behavior like 'R%' then t1.behavior
This code worked great. But I noticed that 'R' is never the first character of that row. In fact the only two I'm interested in would be 'M' or 'U'.
So my question is, how to best go about doing both the like condition and the substr function better.
I don't like substringing things in proc sql, it slows the world down. So how best to truncate to only the first 6 bytes without using substr?
Back to the 'LIKE', I figured doing this would work;
when (t1.behavior like 'U%' or t1.behavior like 'M%') then substr(t1.behavior,1,6)
else t1.behavior end) as Cost_pool,
But it seems to me there must be a better way to handle the 'LIKE' than running it twice.
So thats the question, does anybody have any suggestions on the best way to optimize this scenario?
10-15-2014 11:36 PM
Perl Regular Expressions most likely would possibly be faster.
I avoid those like the plague so hopefully someone else can offer a code solution.
Here's some reference otherwise:
10-16-2014 03:25 AM
You are having a big a big data project. That is going beyond comfort zone.
Using sql is in comfort zone. Nosql hadoop etc. are originating from that.
If it is about really big data and processing time is important start with the technical restrictions to overcome and proceed with that.
If it is a sas - dataset and needing all data to be processed think of using the datastep may be split as you are running in a grid. The prx functions are a fit to this.
Is your data stored in a rdbms review those options with in database processing.
Just trying to optimize code when fundamentals are failing will not help much.
10-18-2014 04:00 AM
I'm not sure whether the colon( modifier to IN() works in SAS Sql (it does in other native dbms sql)
That would enable
Case when t.behaviour in: ( 'M', 'R', 'WRT' )
However a lesser spotted operator might help.
SAS SQL does support "truncated" comparison:
EQT a bit like =:
Of course it is not exactly the same as the data step operators. Only one side is "truncated" in these SQL cokparisons whereas either side could be truncated in the datastep operators.
I think that makes these SQL truncated comparisons "safer".
If your LIKE search is more general than "begins with", you could put the patterns you seek into a table and make an SQL join with rhe LIKE operator.
Of course I'm not sure it works in SAS as I have only used that kind of join in Teradata sql
Hope one of these 3 helps you
10-18-2014 04:45 AM
PeterC I do not like the use of undocumented functions. They are mostly undocumented for some reason. The monotonic for example is dependent on a order only being logical possible when not doing multi-threading. Multi-threading is becoming the normal approach as it should be a gaining turn-arround time when several other conditions are in place. This is the big-data beyond comfort zone area.
A like instruction SAS(R) 9.4 SQL Procedure User's Guide is fully documented and it can be passed to an external DBMS. That is the same for eg Teradata supporting a lot of functions that are (should be) transported in Teradata sql.SAS/ACCESS(R) 9.4 for Relational Databases: Reference, Fifth Edition.
The goal is letting the data reside and processed as close where it is stored. Avoiding additional transfers and copies.
kckaiser did not mention anything of the physical involved environment. When there is no optimized physical environment and he is trying to do some things in a limited existing sas environment the best choices could be totally different. When the data is stored locally and is limited of a single storage device the IO-time is the biggest factor to deal with. In that case a classic sas datastep can solve sometimes for more efficient some type of logical problems. In the big data world is NO_SQL hyping.