Hi all I am trying to incorporate PROC HPDS2 in one of our modelling codes to improve the performance. We are using GREENPLUM database. However before implementing the high performance procedure in the actual modelling program, I tried running a sample code with simple arithmetics using PROC HPDS2. Strangely, the performance using PROC HPDS2 seemed to be poorer when compared to a normal datastep. I used the following code: libname gpdb greenplm server=xxxx db="xxxxx" user="xxxxx" password="xxxxx" schema="xxxxxx"; proc sql noprint; drop table gpdb.avg_temps; drop table gpdb.daily_temps; quit; options dbidirectexec; options set=GRIDHOST="xxxxxx "; options set=GRIDINSTALLLOC="xxxxxxxx"; options set=TKSSH_USER="xxxx"; options set=TKSSH_IDENTITY="xxxxxxx"; data gpdb.daily_temps; input city $ mon tue wed thu fri; datalines; lax 88 92 94 97 86 sfo 65 60 75 72 74 nyc 99 95 94 95 90 phl 92 89 91 93 94 atl 95 99 92 98 94 den 85 87 89 72 73 pit 68 70 72 73 77 rdu 98 95 99 95 96 dtt 88 90 90 87 92 anc 51 56 60 64 62 sea 72 78 77 80 79 msy 98 97 99 98 99 mia 95 92 98 94 96 ord 87 85 84 80 79 dfw 95 96 97 95 97 hou 98 99 98 97 92 las 104 105 102 99 101 pdx 78 82 76 74 80 san 80 81 78 82 80 phx 95 98 95 97 100 cle 75 72 76 80 78 ont 78 80 78 81 72 tpa 94 94 92 90 92 bos 80 78 77 75 79 clt 83 80 79 80 81 ; run; proc hpds2 data=gpdb.daily_temps out=gpdb.avg_temps; performance host="&GRIDHOST" install="&GRIDINSTALLLOC" nodes=all details; data DS2GTF.out; dcl double avgf avgc; dcl char(5) synopsis; method run(); set DS2GTF.in; avgf = mean(mon, tue, wed, thu, fri); avgc = round((avgf - 32.0) * 5.0/9.0, .1); if avgf >= 95.0 then synopsis = 'Hot'; else if avgf > 80.0 then synopsis = 'Warm'; else if avgf > 60.0 then synopsis = 'Mild'; else synopsis = 'Cold'; end; enddata; run; libname test "/xxxxx/xxxxxx/xxxxx"; data daily_temps; set gpdb.daily_temps; run; data test.test_time; length synopsis $5; set daily_temps; avgf = mean(mon, tue, wed, thu, fri); avgc = round((avgf - 32.0) * 5.0/9.0, .1); if avgf >= 95.0 then synopsis = 'Hot'; else if avgf > 80.0 then synopsis = 'Warm'; else if avgf > 60.0 then synopsis = 'Mild'; else synopsis = 'Cold'; run; While using PROC HPDS2, I have kept both source and destination tables in Greenplum to faciliate in database processing. Please refer the performance of HPDS2 below: NOTE: The HPDS2 procedure is executing in the distributed computing environment with 4 worker nodes. NOTE: The data set GPDB.AVG_TEMPS has 25 observations and 9 variables. NOTE: PROCEDURE HPDS2 used (Total process time): real time 10.75 seconds cpu time 3.08 seconds For the same arithmetics, the time taken by normal data step happens to be: NOTE: There were 25 observations read from the data set WORK.DAILY_TEMPS. NOTE: The data set TEST.TEST_TIME has 25 observations and 9 variables. NOTE: Compressing data set TEST.TEST_TIME increased size by 100.00 percent. Compressed is 2 pages; un-compressed would require 1 pages. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds I am not sure if I am missing something while doing in database processing. Can someone help me understand what is going wrong and why a high performance procedure that is supposed to be fast is taking much longer times to complete. Thanks in advance for your valuable suggestions.
... View more