BookmarkSubscribeRSS Feed
Bal23
Lapis Lazuli | Level 10
proc sql;
create table  lib1.want as
         select *
            from have
			where va1=2
           ;
quit;

will use datastep faster than proc sql?

It is estimated that it will run two hours to get the new dataset I want based on the code above. So it is hard to try different code. I would like to get advice from you on how to run it efficiently and faster. Thank you 

15 REPLIES 15
Kurt_Bremser
Super User

Since that is a sequential read through the dataset without much processing, you won't gain anything by using a data step. You're simply bound by the sheer amount of data that has to be read.

Depending on the structure of have, you might gain something (or even a lot) if you use the compress=yes option when you store it, and do similar for dataset want.

AhmedAl_Attar
Ammonite | Level 13

Hi,

 

If you the following can be applied to you, then your query should run faster:

- Your under lying storage system, allows for 50-75 MB/sec sustained I/O throughput

- Use SPDE/SPDS for your Output Library (lib1) to speed the writing of your output table

http://support.sas.com/rnd/scalability/spde/index.html

- Use Index for your Va1 variable to speed the reading of your subset/filtered data

 

Hope this helps,

Ahmed

Kurt_Bremser
Super User

As an addendum to what @AhmedAl_Attar said, if you can throw hardware at the problem, consider to set up an array of disks and span a SPDE library over those. For SPDE to really improve your throughput, you need to remove the bottleneck caused by a single disk having to carry all your storage traffic.

Shmuel
Garnet | Level 18

Assuming you are going to read the file several times, each time with different VA1 value,

and the file is mostly static, then do the next preparing steps once:

 

- use option compress=yes 

- sort the file by VA1 or by variables used for WHERE selection and

   create the output with indexes of those variables.

   Pay attention - indexes needs storage

- If applicable use SPDE/SPDS for the sorted output

 

Thus, using advices of @Kurt_Bremser and of @AhmedAl_Attar

 

The code should look like:

libname input '.....';
libname output SPDE '.......';

proc sort data=input.have
                 out=output.want (index=VA1 ....);  /* add expected variables used with WHERE */
         by  VA1 ..... ;  /* same variables defined for INDEX */
run;

 

 

 

 

AhmedAl_Attar
Ammonite | Level 13

One more potential way to speed your query, would be

- if you have SAS 9.4, you may want to try the Threaded Read feature of Proc DS2, that's available in BASE SAS.

 

Here is an implementation example

 

options threads; /* Tells SAS to use threaded processing */
%let have_ds = have;
%let filter= va1=2;

proc ds2;
	thread newton/overwrite=yes;
		/* Some debugging variables */
		dcl double count thisThread;
		drop count thisThread;

		method run();
			SET {select * from &have_ds where &filter}; /* Ability to use FedSql query within DS2 */

			/* Assign/Initialize debugging vars */
			thisThread=_threadid_;
			count+1;
		end;

		method term();
			put '**Thread' _threadid_ 'processed'  count 'rows:';
		end;
	endthread;

	data lib1.want/overwrite=yes;
		dcl thread newton frac; /* Declare an Instance of the newton thread */

		method run();
			set from frac threads=4; /* <--- You Change the number of threads to control the threaded reads */
		end;
	enddata;
run;
quit;

Please note, according to the Proc DS2 documentation, "If one computation thread can keep up with the I/O thread, then that single thread is used for all computation."

 

Ahmed

 

 

rajdeep
Pyrite | Level 9

hi 

 

Can you please suggest how to run the below code in PROC DS2, because it's throwing error. In the other hand it's running successfully in generic PROC SQL.

 

proc sql;
SELECT DISTINCT t1.*,t2.DATE_MN,
          /* Calculation */
            (SUM(DISTINCT(t2.CURRENCY_AMOUNT))) FORMAT=15.2 AS ALL_TRANSACTION
      FROM  WORK.Acquisition_Score_Distinct t1
           LEFT JOIN WORK.ALL_TRAN_TYPE_KEY_FILTER t2 ON (t1.ACCOUNT_NUMBER = t2.ACCOUNT_NUMBER)
     GROUP BY t1.ACCOUNT_NUMBER,t2.DATE_MN
quit;

The above left and right tables are having 10 cores data, so want to use PROC DS2 for the same. Please suggest.

 

Thanks in advance. 

Kurt_Bremser
Super User

And what is a "10 cores data"?

If you want to give us information about the size of your data, use orders of magnitude that are easily understood everywhere - thousand, million, billion - or, even better, designations like K (kilo), M (mega), G (giga), T (tera), and so on.

rajdeep
Pyrite | Level 9

Hi 

 

 

proc ds2; 
	thread newton/overwrite=yes;
			method run();
			SET { SELECT DISTINCT t1.*,t2.DATE_MN,
          /* Calculation */
            (SUM(DISTINCT(t2.CURRENCY_AMOUNT))) AS ALL_TRANSACTION
      FROM  WORK.Acquisition_Score_Distinct t1
           LEFT JOIN WORK.ALL_TRAN_TYPE_KEY_FILTER t2 ON (t1.ACCOUNT_NUMBER = t2.ACCOUNT_NUMBER)
     GROUP BY t1.ACCOUNT_NUMBER,t2.DATE_MN}; /* Ability to use FedSql query within DS2 */

				end;

	endthread;
data All_Month_Transactions/overwrite=yes;
		dcl thread newton frac1; 

		method run();
			set from frac1 threads=4; 
		end;
	enddata;
run;
quit;

NOTE: Writing TAGSETS.SASREPORT13(EGSR) Body file: EGSR
22         
23         GOPTIONS ACCESSIBLE;
24         proc ds2; /*Score column mapping based on Postal code and FSA*/
25         	thread newton/overwrite=yes;
26         			method run();
27         			SET { SELECT DISTINCT t1.*,t2.DATE_MN,
28                   /* Calculation */
29                     (SUM(DISTINCT(t2.CURRENCY_AMOUNT))) AS ALL_TRANSACTION
30               FROM  WORK.Acquisition_Score_Distinct t1
31                    LEFT JOIN WORK.ALL_TRAN_TYPE_KEY_FILTER t2 ON (t1.ACCOUNT_NUMBER = t2.ACCOUNT_NUMBER)
32              GROUP BY t1.ACCOUNT_NUMBER,t2.DATE_MN}; /* Ability to use FedSql query within DS2 */
33         
34         				end;
35         
36         	endthread;
37         data All_Month_Transactions/overwrite=yes;
38         		dcl thread newton frac1; /* Declare an Instance of the newton thread */
39         
40         		method run();
41         			set from frac1 threads=4; /* <--- You Change the number of threads to control the threaded reads */
42         		end;
43         	enddata;
44         run;
ERROR: Compilation error.
ERROR: Column "T1.PCF_CUST_ID" must be GROUPed or used in an aggregate function
ERROR: Line 27: Unable to prepare SELECT statement for table { SELECT DISTINCT t1.*,t2.DATE_MN,   
       (SUM(DISTINCT(t2.CURRENCY_AMOUNT))) AS ALL_TRANSACTION  FROM WORK.Acquisition_Score_Distinct t1  LEFT JOIN 
       WORK.ALL_TRAN_TYPE_KEY_FILTER t2 ON (t1.ACCOUNT_NUMBER = t2.ACCOUNT_NUMBER)  GROUP BY t1.ACCOUNT_NUMBER,t2.DATE_MN} 
       (rc=0x80fff802U).
NOTE: PROC DS2 has set option NOEXEC and will continue to prepare statements.
45         quit;

NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE DS2 used (Total process time)

The above log is asking to include "T1.PCF_CUST_ID" in the grouping, but I am not using the same column at all in my code. Neither in the group by nor with any aggregate function.

 

Please help if I understood something wrong or if there is any syntax mistake with PROC DS2. Please let me know if any other details are required.

 

Thanks in Advance.

 

 

 

 

Kurt_Bremser
Super User

You use the asterisk for all variables of t1, so they must be accounted for. Use a discrete list of variables instead, so you don't get PCF_CUST_ID included implicitly (and you don't need DISTINCT IMO):

			SET { SELECT t1.ACCOUNT_NUMBER,t2.DATE_MN,
          /* Calculation */
            (SUM(DISTINCT(t2.CURRENCY_AMOUNT))) AS ALL_TRANSACTION
      FROM  WORK.Acquisition_Score_Distinct t1
           LEFT JOIN WORK.ALL_TRAN_TYPE_KEY_FILTER t2 ON (t1.ACCOUNT_NUMBER = t2.ACCOUNT_NUMBER)
     GROUP BY t1.ACCOUNT_NUMBER,t2.DATE_MN}; /* Ability to use FedSql query within DS2 */

 

rajdeep
Pyrite | Level 9

Hi 

 

 

 

data test;
set sashelp.class (obs=10);
run;
data test1;
set sashelp.class(obs=15);
run;

proc ds2;
 thread thread_name1 / overwrite = yes;
 method run();
 set {Select distinct t1.name,t2.sex,t2.age
from test t1 left
join test1 t2 on 
strip(upcase(t1.name)) =* strip(upcase(t2.name))};
 end;
 endthread;

 data test3 / overwrite = yes;
 dcl thread thread_name1 t;
 method run();
 set from t threads = 4;
 end;
 enddata;
 run;
quit;

23         GOPTIONS ACCESSIBLE;
24         proc ds2;
25          thread thread_name1 / overwrite = yes;
26          method run();
27          set {Select distinct t1.name,t2.sex,t2.age
28         from test t1 left
29         join test1 t2 on
30         strip(upcase(t1.name)) =* strip(upcase(t2.name))};
31          end;
32          endthread;
33         
34          data test3 / overwrite = yes;
35          dcl thread thread_name1 t;
36          method run();
37          set from t threads = 4;
38          end;
39          enddata;
40          run;
ERROR: Compilation error.
ERROR: Operator does not exist:  NVARCHAR =* NVARCHAR
WARNING: No operator matches the given name and argument type(s). You might need to add explicit typecasts.
ERROR: Line 27: Unable to prepare SELECT statement for table {Select distinct t1.name,t2.sex,t2.age  from test t1 left  
       join test1 t2 on  strip(upcase(t1.name)) =* strip(upcase(t2.name))} (rc=0x80fff802U).
NOTE: PROC DS2 has set option NOEXEC and will continue to prepare statements.
41         quit;

NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE DS2 used (Total process time):
      real time           1.07 seconds
      cpu time            0.35 seconds

 

 

Kurt_Bremser
Super User

When you use 35 variables in a SQL select that also has a group by, every one of those 35 variables must either be one of the group variables, or subject to a summary function.

It is only in PROC SQL that SAS allows you to have additional variables, and then it does the automatic remerge, something that is not done in ANSI SQL or FedSQL.

 

And =* is not a valid operator. From where did you get that? If you want a "not equal", use the mnemonic ne or ^=.

rajdeep
Pyrite | Level 9

Hi  

 

 

 

 

 

 


proc ds2; /*Score column mapping based on Postal code and FSA*/
	thread newton/overwrite=yes;
dcl double Level;
	  method convert(double level) returns double;
 set LIB_TS23.CIF_ACCOUNT_CURR;
 Level=inputn(CIFP_CUSTOM_DATA_81,8.);
 return Level;
 end;
		method run();
			SET {select distinct
      TRIM(LEFT(PUT(CIF_ACCOUNT_CURR.CIFP_ACCOUNT_ID5,BEST32.))) as MAST_ACCOUNT_ID
   from LIB_TS23.CIF_ACCOUNT_CURR as CIF_ACCOUNT_CURR 
inner join
      LIB_ADM.ADM_STD_DISP_USER_V as ADM_STD_DISP_USER_V
   on  ADM_STD_DISP_USER_V.APA_APP_NUM = INPUTN(CIF_ACCOUNT_CURR.CIFP_CUSTOM_DATA_81,8.)
      and CIF_ACCOUNT_CURR.CIFP_OFFICER_NBR = 'ADMACE'}; /* Ability to use FedSql query within DS2 */

				end;
	endthread;
data test/overwrite=yes;
		dcl thread newton frac1; /* Declare an Instance of the newton thread */
		method run();
			set from frac1 threads=4; /* <--- You Change the number of threads to control the threaded reads */
		end;
	enddata;
run;
quit;

ERROR: Compilation error.
ERROR: Function INPUT(VARCHAR, DOUBLE) does not exist
WARNING: No function matches the given name and argument types.
ERROR: Line 34: Unable to prepare SELECT statement for table {select distinct  
       TRIM(LEFT(PUT(CIF_ACCOUNT_CURR.CIFP_ACCOUNT_ID5,BEST32.))) as MAST_ACCOUNT_ID  from   LIB_TS23.CIF_ACCOUNT_CURR as 
       CIF_ACCOUNT_CURR  inner join  LIB_ADM.ADM_STD_DISP_USER_V as ADM_STD_DISP_USER_V  on   ADM_STD_DISP_USER_V.APA_APP_NUM 
       = INPUT(CIF_ACCOUNT_CURR.CIFP_CUSTOM_DATA_81,8.)  and CIF_ACCOUNT_CURR.CIFP_OFFICER_NBR = 'ADMACE'} (rc=0x80fff802U).
NOTE: PROC DS2 has set option NOEXEC and will continue to prepare statements.

 

Kurt_Bremser
Super User

The INPUTN() and INPUTC() function do not expect the format name like the INPUT() function does. You need to supply the format name as a character expression, like this:

ADM_STD_DISP_USER_V.APA_APP_NUM = INPUTN(CIF_ACCOUNT_CURR.CIFP_CUSTOM_DATA_81,'8.')

Note that in FedSQL, you MUST use single quotes. Double quotes are interpreted by FedSQL as "delimited identifiers" (similar to the 'some name'n name literals in Base SAS language).

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 15 replies
  • 4511 views
  • 0 likes
  • 6 in conversation