@Kurt_Bremser:
Both are fine, and the extra step for extra performance is well justified - after all, the performance penalty of reading from the dictionary tables is next to nil.
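(For the record, the kind of dictionary-table lookup I have in mind is no more than this - a minimal sketch, with the macro variable name and the WORK.HAVE lookup purely illustrative:)
proc sql noprint ;
  /* illustrative only: pull the row count of WORK.HAVE from the dictionary */
  select nobs into : nobs_have trimmed
  from dictionary.tables
  where libname = 'WORK' and memname = 'HAVE' ;
quit ;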
The only problem is that SQL, having no concept of row sequence, can't give you the cumulatives. If I were tasked with doing the whole enchilada in a single step, I'd most likely resort to a hash. For example:
data have ;
  input gender $ name $ age ;
  cards ;
M Alfred 14
F Barbara 13
F Carol 14
F Jane 12
F Janet 15
M Jeffrey 13
M John 12
F Judy 14
F Mary 15
M Robert 12
;
run ;
data want ;
  /* hash F holds the per-gender counts, hash Q the finished per-gender stats */
  dcl hash f (ordered : "a") ;  /* ascending key order, so the cumulatives accumulate deterministically, as FREQ would */
  f.definekey ("gender") ;
  f.definedata ("gender", "count") ;
  f.definedone () ;
  dcl hiter fi ("f") ;
  dcl hash q () ;
  q.definekey ("gender") ;
  q.definedata ("count", "pct", "cum_freq", "cum_pct") ;
  q.definedone () ;
  /* pass 1: count the rows per gender */
  do until (z1) ;
    set have end = z1 nobs = n ;
    if f.find() ne 0 then count = 1 ;
    else count + 1 ;
    f.replace() ;
  end ;
  /* walk F in key order, turning the counts into percents and cumulatives */
  do while (fi.next() = 0) ;
    pct = 100 * divide (count, n) ;
    cum_freq + count ;
    cum_pct + pct ;
    q.add() ;
  end ;
  /* pass 2: reread HAVE and attach the per-gender stats to every row */
  do until (z2) ;
    set have end = z2 ;
    q.find() ;
    output ;
  end ;
run ;
However, I have to admit that though it takes a hash man but a few minutes to code it, in the real world I'd rather stick to FREQ and a backward merge, especially since in the case of gender the FREQ output data set has only 2 records anyway. Plus, FREQ gives me everything I need basically by aping rather than programming, so why bother? Not to mention those who go on to cannibalize my code and aren't exactly hash men.
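For the curious, a minimal sketch of that FREQ-and-backward-merge route (the data set names FREQS and WANT_FREQ and the rename of PERCENT to PCT are just illustrative choices):
proc freq data = have noprint ;
  /* OUTCUM adds CUM_FREQ and CUM_PCT to the one-way output data set */
  tables gender / out = freqs (rename = (percent = pct)) outcum ;
run ;
proc sort data = have ;
  by gender ;
run ;
data want_freq ;
  merge have freqs ;
  by gender ;
run ;
The only wrinkle is that the merge wants HAVE sorted (or indexed) by GENDER, whereas the hash step keeps the original row order.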
Kind regards
Paul D.