Hi ,
I am trying to perform a sort on a dataset on a oracle table by referencing the dataset
as Oracle_schema.table1
*--------------------------------------------------------------------------------------*
data _null_;
if 0 then set DW.Table1 ;
dcl hash hh ( dataset: ' DW.Table1', ordered: 'A', multidata: 'Y') ;
dcl hiter hi ( 'hh' ) ;
hh.DefineKey ( 'ADMIT_MONTH','PAtientid') ;
hh.DefineData ('FUND_MAJOR_DESC' ,'generic_month','ADMIT_MONTH','PAtientid') ;
hh.DefineDone () ;
do rc = hi.first () by 0 while ( rc = 0 ) ;
hh.output (dataset: "x1234") ;
rc = hi.next () ;
end ;
stop ;
run ;
****------------------------------------------------------------------**
But it's taking forever to execute . hasn't still completed execution. the table1 has 1.5 million records
I in parallel execute a proc sort for the above method but it took only 3 minutes to execute .
Could some one explain why the above hash code won't execute🤔
proc sort data= DW.Table1
(keep=FUND_MAJOR_DESC generic_month ADMIT_MONTH PAtientid)
out=x1234 ;
by ADMIT_MONTH PAtientid;
run;
*---------------------------------------------------------------------------
to confirm my hash code i executed with sashelp.cars and it works , so i presume the syntax is correct
data _null_;
if 0 then set sashelp.cars ;
dcl hash hh ( dataset: 'sashelp.cars', ordered: 'a',multidata: "Y") ;
dcl hiter hi ( 'hh' ) ;
hh.DefineKey ( 'make' ) ;
hh.DefineData ( 'invoice' , 'model','make') ;
hh.DefineDone () ;
do rc = hi.first () by 0 while ( rc = 0 ) ;
hh.output (dataset: "x1") ;
rc = hi.next () ;
end ;
put 13 * '-' ;
stop ;
run ;
Exactly. At least for SASHELP.CARS it should definitely not take 10 minutes, see below.
13   data _null_;
14   if 0 then set sashelp.cars ;
15   dcl hash hh ( dataset: 'sashelp.cars', ordered: 'a',multidata: "Y") ;
16
17   hh.DefineKey ( 'make' ) ;
18   hh.DefineData ( 'invoice' , 'model','make') ;
19   hh.DefineDone () ;
20
21   hh.output (dataset: "x1") ;
22   put 13 * '-' ;
23   stop ;
24   run ;
NOTE: There were 428 observations read from the data set SASHELP.CARS.
NOTE: The data set WORK.X1 has 428 observations and 3 variables.
-------------
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
For a much larger dataset I would not be surprised if PROC SORT was the most efficient approach to sorting.
Hi @dennis_oz,
Unlike the OUTPUT statement in a DATA step, the OUTPUT method of a hash object writes the entire hash table to the dataset specified in the "dataset:" argument tag, not only a single record. That is, you are trying to write the 1.5 million observations (unnecessarily) 1.5 million times to dataset x1234 (which is, of course, much more time-consuming than to create a copy of SASHELP.CARS and to overwrite it 427 times again).
So, you don't need the hash iterator object. Remove the DO loop and execute the OUTPUT method only once.
do youmean to take out the hiter statement and remove the do statement like below.
I have submitted the code and it has been more than 10 minutes.
*----------------------------------------------------------------------*
data _null_;
if 0 then set sashelp.cars ;
dcl hash hh ( dataset: 'sashelp.cars', ordered: 'a',multidata: "Y") ;
hh.DefineKey ( 'make' ) ;
hh.DefineData ( 'invoice' , 'model','make') ;
hh.DefineDone () ;
hh.output (dataset: "x1") ;
put 13 * '-' ;
stop ;
run ;
Exactly. At least for SASHELP.CARS it should definitely not take 10 minutes, see below.
13   data _null_;
14   if 0 then set sashelp.cars ;
15   dcl hash hh ( dataset: 'sashelp.cars', ordered: 'a',multidata: "Y") ;
16
17   hh.DefineKey ( 'make' ) ;
18   hh.DefineData ( 'invoice' , 'model','make') ;
19   hh.DefineDone () ;
20
21   hh.output (dataset: "x1") ;
22   put 13 * '-' ;
23   stop ;
24   run ;
NOTE: There were 428 observations read from the data set SASHELP.CARS.
NOTE: The data set WORK.X1 has 428 observations and 3 variables.
-------------
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
For a much larger dataset I would not be surprised if PROC SORT was the most efficient approach to sorting.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
