Hi, first ever question so sorry if I'm posting on the wrong place or something like that. Please let me know so I can re-post accordingly or edit in any info I've missed. Version: SAS STUDIO | Release: 3.7 (Enterprise Edition) So, I've got a sort of meta dataset that looks something like this: data meta_ds;
input varname:$32. id missvalue cap99;
datalines;
var1 1 0 10
var1 2 3 24
var1 3 5 50
var2 1 1 5
var2 2 10 40
var2 3 4 20
;
run; Using this meta_ds I need to apply some transformations to another dataset depending on the id variable. Using the example meta_ds, these transformations would look something like this: if id = 1 then do;
if var1 = . then var1 = 0;
if var1 > 10 then var1 = 10;
if var2 = . then var2 = 1;
.
.
end;
else if id = 2 then do;
if var1 = . then var1 = 3;
.
.
end;
else if id = 3 then do;
.
.
end;
So basically: if var = . then var = missvalue;
if var > cap99 then var > cap99; The dataset which I need to apply the transformations to is fairly big (around 40million rows) and making this process as fast and efficient as possible is one of the major concerns. My question is: Is hardcoding all the if statements the best approach? I know how to generate the code using something like proc sql, which could then be executed with %include for example... but what if I didn't want to have to write all those lines of code (a bit above 100 variables, 3 unique ids, 2 if statements for each variable and id combination, so a bit over 1000 lines of code). One approach I tried was using the hash object: data want;
if _n_=0 then set meta_ds;
DECLARE hash lookUpTable(dataset:'meta_ds');
lookUpTable.DEFINEKEY ('varName','id');
lookUpTable.DEFINEDATA('missValue','cap99');
lookUpTable.DEFINEDONE();
array numvar &varlistContinuas.;
do until (eof);
set have end=eof;
do over numvar;
call vname(numvar,varName);
varName = upcase(varName);
rc = lookUpTable.FIND();
if rc = 0 then do;
if numvar = . then numvar = missValue;
if numvar > cap99 then numvar = cap99;
end;
else putlog "ERROR:LookUpTable" varname;
end;
output;
end;
drop varName missValue cap99 rc;
stop;
run; but this takes quite a bit longer to run (30 min vs <10 min). Is there a better approach? Added dummy have data: data have;
input id var1 var2 varn;
datalines;
1 3 4 15
1 . 15 .
2 . . 10
3 60 24 9
2 25 50 .
3 . . 2
;
run;
... View more