@yabwon this is really nice. Great to see CALL VNEXT and VVALUEX put to use like this.
@yabwon wrote:
Just for fun:
%let size=100;
data have;
  array noMissC[&size.] $1 (&size.*"A");
  array noMissN[&size.]    (&size.*1);
  array MissC[&size.] $1;
  array MissN[&size.];
  do i = 1 to 5e6;
    output;
  end;
run;
/*
proc print;
run;
*/
options ls=max;
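%macro anyNonMissingData(ds);
  /* formats that collapse every value to " " (missing) or "*" (non-missing) */
  proc format;
    value missing
      ._-.z = " "
      other = "*"
    ;
    value $missing
      " "   = " "
      other = "*"
    ;
  run;

  data _null_;
    if 0 then set &ds.; /* put all variables of &ds. into the PDV first */
    format _numeric_ missing. _character_ $missing.;
    length _NAME_ $ 32 _TYPE_ $ 1;
    declare hash _H_();
    _H_.defineKey("_NAME_");
    _H_.defineDone();

    do until(_E_);
      set &ds. end=_E_;
      /* restart the variable walk for every observation;
         without this reset only the first observation is checked */
      _NAME_ = " ";
      do until(_NAME_='_NAME_');
        if _NAME_ NE '_NAME_' then
          do;
            call vnext(_NAME_, _TYPE_, _N_); /* next variable in the PDV */
            /* the formatted value is " " only when the value is missing */
            if NOT cmiss(vvaluex(_NAME_)) then rc=_H_.add();
          end;
      end;
    end;
    /* output: names of the variables with at least one non-missing value */
    _H_.output(dataset:"work.onlyMissingData(where=((_NAME_ NE '_NAME_')))");
    stop;
  run;
%mend anyNonMissingData;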
%anyNonMissingData(have)
proc print data = onlyMissingData;
run;
HAVE is 8.47 GB, and the log says:
NOTE: The data set WORK.ONLYMISSINGDATA has 201 observations and 1 variables.
NOTE: There were 5000000 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
real time 1.45 seconds
user cpu time 0.28 seconds
system cpu time 1.17 seconds
memory 1861.31k
OS Memory 26652.00k
I'd say that's not bad (both time and memory), but 8 GB is still not big data.
But we can't do too much about the "all missing" search: it's always an O(n*k) process (n = number of observations, k = number of variables).
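A rough count for the HAVE table above shows the scale. It assumes 401 variables (the four 100-element arrays plus the loop counter i, which is not dropped) and n = 5e6 rows. A full scan touches n*k values. Even a hypothetical variant that stopped checking variables already known to be non-missing would still have to read every value of the 200 all-missing variables (k_miss), because a variable can only be declared all missing after its last row has been seen:

$$
n \cdot k = 5\times10^{6} \cdot 401 \approx 2.0\times10^{9},
\qquad
n \cdot k_{\text{miss}} = 5\times10^{6} \cdot 200 = 1.0\times10^{9}
$$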
Bart