BookmarkSubscribeRSS Feed
kcskaiser
Fluorite | Level 6

I took over a big proc sql project and found this code inside;

...

(case when (substr(t1.behavior,1,1)= 'R')

    then t1.behavior

    else  substr(t1.behavior,1,6) end) as Cost_Pool,

...

My first shot at this issue was to do this in hopes of cuttind down the processing time. It cut it by over half.

...

(case

    when t1.behavior like 'R%' then t1.behavior

    else substr(t1.behavior,1,6)

...

This code worked great. But I noticed that 'R' is never the first character of that row. In fact the only two I'm interested in would be 'M' or 'U'.

So my question is, how to best go about doing both the like condition and the substr function better.

I don't like substringing things in proc sql, it slows the world down. So how best to truncate to only the first 6 bytes without using substr?

Back to the 'LIKE', I figured doing this would work;

...

(case

     when (t1.behavior like 'U%' or t1.behavior like 'M%') then substr(t1.behavior,1,6)

     else t1.behavior end) as Cost_pool,

...

But it seems to me there must be a better way to handle the 'LIKE' than running it twice.

So thats the question, does anybody have any suggestions on the best way to optimize this scenario?

5 REPLIES 5
Reeza
Super User

Perl Regular Expressions most likely would possibly be faster.

I avoid those like the plague so hopefully someone else can offer a code solution.

Here's some reference otherwise:

http://www2.sas.com/proceedings/sugi29/265-29.pdf

jakarman
Barite | Level 11

You are having a big a big data project. That is going beyond comfort zone.

Using sql is in comfort zone. Nosql hadoop etc. are originating from that.

If it is about really big data and processing time is important start with the technical restrictions to overcome and proceed with that.

If it is a sas - dataset and needing all data to be processed think of using the datastep may be split as you are running in a grid. The prx functions are a fit to this.

Is your data stored in a rdbms review those options with in database processing.

Just trying to optimize code when fundamentals are failing will not help much.

---->-- ja karman --<-----
Ksharp
Super User

Why not use

when t1.behavior like '%R%'

or

when t1.behavior contains 'R'

Peter_C
Rhodochrosite | Level 12

1

I'm not sure whether the colon(:) modifier to IN() works in SAS Sql (it does in other native dbms sql)

That would enable

Case when t.behaviour in: ( 'M', 'R', 'WRT' )

2

However a lesser spotted operator might help.

SAS SQL does support "truncated" comparison:

EQT  a bit like =:

GTT  >:

LTT   <:

GET  >=:

LET  <=:

Of course it is not exactly the same as the data step operators. Only one side is "truncated" in these SQL cokparisons whereas either side could be truncated in the datastep operators.

I think that makes these SQL truncated comparisons "safer".

3

If your LIKE search is more general than "begins with", you could put the patterns you seek into a table and make an SQL join with rhe LIKE operator.

Of course I'm not sure it works in SAS as I have only used that kind of join in Teradata sql

Hope one of these  3 helps you

good luck

peterC

jakarman
Barite | Level 11

PeterC  I do not like the use of undocumented functions. They are mostly undocumented for some reason. The monotonic for example is dependent on a order only being logical possible when not doing multi-threading.  Multi-threading is becoming the normal approach as it should be a gaining turn-arround time when several other conditions are in place. This is the big-data beyond comfort zone area.

A like instruction SAS(R) 9.4 SQL Procedure User's Guide is fully documented and it can be passed to an external DBMS.   That is the same for eg Teradata supporting a lot of functions that are (should be) transported in Teradata sql.SAS/ACCESS(R) 9.4 for Relational Databases: Reference, Fifth Edition.

The goal is letting the data reside and processed as close where it is stored. Avoiding additional transfers and copies.

kckaiser did not mention anything of the physical involved environment.  When there is no optimized physical environment and he is trying to do some things in a limited existing sas environment the best choices could be totally different. When the data is stored locally and is limited of a single storage device the IO-time is the biggest factor to deal with. In that case a classic sas datastep can solve sometimes for more efficient some type of logical problems.  In the big data world is NO_SQL hyping.              

---->-- ja karman --<-----

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 7874 views
  • 0 likes
  • 5 in conversation