- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
SPDE PROC SQL code utilising a large SAS work space. Please suggests me ideas to optimize and use the work space to minimal.
-In current scenario, I'm performing PROC SQL join on 36 datasets having 12 columns each.
Kindly consider below limitations:
-Without redirecting - work or - utiloc locations
-Without using system commands
Please let me know for further details.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Kayalvizhi
Proc SQL joins do have large workspace requirement. Why don't you try using data step. That works row by row.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Can you please tell me why SPDE SQL using SAS workspace, and why not HDFS workspace.
Is there a way to use HDFS as workspace when using SPDE
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Please post your code so we can see the type of joins you are using. We can't offer much useful advice until we see what you are currently doing.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Create table spde.new from select
*
<some functions are used>
from spde.temp %do I=1 %to 36;
Left join. Spde.month&i ; %end;
Run
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Without knowing more about the data and the joining, that would work better with appending using the SET statement:
data want;
set spde.month1 - spde.month36;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Kayalvizhi wrote:
Proc sql;
Create table spde.new from select
*
<some functions are used>
from spde.temp %do I=1 %to 36;
Left join. Spde.month&i ; %end;
Run
This code produces only syntax errors.
- there's a dot where it must not be
- the semicolon at the end of the first spde.month&i terminates the CREATE TABLE, the others will be invalid
- there are no ON conditions
- PROC SQL needs to be terminated with a QUIT, and no RUNs are needed, as SQL statements are always executed immediately.
Please post something that at least resembles your real code, and give us at least some ideas of the datasets involved (variables used for matching, variables contained, match relationships).
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You can use the SPDEUTILLOC option to specify where SPDE utility files will be stored.
Unclear how this wirks with hdfs implmentaions of SPDE data.
Maybe the environment variable SPDE_HADOOP_WORK_PATH is the way to go: variable https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/engspdehdfsug/n07ieiyvjz2wrxn10n7bs7xupvkc.ht...
You can also use SPDESORTSIZE if you have available memory, which will impove performance and lessen the burdon on utility file areas.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
Thanks!!.
>SPDEUTILLOC- Not privileged to use options while invocation/change config file. So unable to use this option.
>SPDESORTSIZE- Increasing spdesortsize uses more/full workspace (which I don't want to happen).
>SPDE_HADOOP_WORK_PATH - I wanted to know more on how it works, because when I give a HDFS location, it still seems to use workspace and not HDFS path.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I think the idea is to use as much RAM as possible by tweaking SPDESORTSIZE, hence use less disk space.
I haven't used SPDE in hdfs other than in a test environment (and that was quite a while ago), and I just refer to the documentation.
But since at least I think the documentation does not clarify how different SPDE options work in a hdfs scenario, I suggest that you contact SAS tech support to get advice/clarifications.