Re: SPDE-code optimisation

Kayalvizhi · Posted 03-24-2022 12:19 PM

SPDE PROC SQL code utilising a large SAS work space. Please suggests me ideas to optimize and use the work space to minimal.

-In current scenario, I'm performing PROC SQL join on 36 datasets having 12 columns each.

Kindly consider below limitations:

-Without redirecting - work or - utiloc locations

-Without using system commands

Please let me know for further details.

Sajid01 · Posted 03-24-2022 01:50 PM

@Kayalvizhi
Proc SQL joins do have large workspace requirement. Why don't you try using data step. That works row by row.

Kayalvizhi · Posted 03-24-2022 11:59 PM

I’m migrating huge set of codes from SAS to SPDE( using HDFS). So I doesn’t want to make much changes to existing code.

Can you please tell me why SPDE SQL using SAS workspace, and why not HDFS workspace.

Is there a way to use HDFS as workspace when using SPDE

SASKiwi · Posted 03-24-2022 08:16 PM

Please post your code so we can see the type of joins you are using. We can't offer much useful advice until we see what you are currently doing.

Kayalvizhi · Posted 03-24-2022 11:46 PM

Proc sql;
Create table spde.new from select
*
<some functions are used>
from spde.temp %do I=1 %to 36;
Left join. Spde.month&i ; %end;
Run

SASKiwi · Posted 03-25-2022 12:12 AM

Without knowing more about the data and the joining, that would work better with appending using the SET statement:

data want;
  set spde.month1 - spde.month36;
run;

Kurt_Bremser · Posted 03-25-2022 04:46 AM

@Kayalvizhi wrote:
Proc sql;
Create table spde.new from select
*
<some functions are used>
from spde.temp %do I=1 %to 36;
Left join. Spde.month&i ; %end;
Run

This code produces only syntax errors.

there's a dot where it must not be
the semicolon at the end of the first spde.month&i terminates the CREATE TABLE, the others will be invalid
there are no ON conditions
PROC SQL needs to be terminated with a QUIT, and no RUNs are needed, as SQL statements are always executed immediately.

Please post something that at least resembles your real code, and give us at least some ideas of the datasets involved (variables used for matching, variables contained, match relationships).

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

LinusH · Posted 03-28-2022 08:43 AM

You can use the SPDEUTILLOC option to specify where SPDE utility files will be stored.

Unclear how this wirks with hdfs implmentaions of SPDE data.

Maybe the environment variable SPDE_HADOOP_WORK_PATH is the way to go: variable https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/engspdehdfsug/n07ieiyvjz2wrxn10n7bs7xupvkc.ht...

You can also use SPDESORTSIZE if you have available memory, which will impove performance and lessen the burdon on utility file areas.

Data never sleeps

Kayalvizhi · Posted 03-29-2022 06:50 AM

Hi,

Thanks!!.

>SPDEUTILLOC- Not privileged to use options while invocation/change config file. So unable to use this option.

>SPDESORTSIZE- Increasing spdesortsize uses more/full workspace (which I don't want to happen).

>SPDE_HADOOP_WORK_PATH - I wanted to know more on how it works, because when I give a HDFS location, it still seems to use workspace and not HDFS path.

LinusH · Posted 03-31-2022 11:52 AM

I think the idea is to use as much RAM as possible by tweaking SPDESORTSIZE, hence use less disk space.

I haven't used SPDE in hdfs other than in a test environment (and that was quite a while ago), and I just refer to the documentation.

But since at least I think the documentation does not clarify how different SPDE options work in a hdfs scenario, I suggest that you contact SAS tech support to get advice/clarifications.

Data never sleeps

Registration is open

SAS Training: Just a Click Away