If your data are always structured such that a formula observation is immediately followed by all of its components, then the solution offered by @PeterClemmensen seems to be the most straightforward.
But if the components can be anywhere in the dataset, then you need to have access to all the known score values in the dataset before you can reliably interpret every formula. @Ksharp 's solution does that. The only downside I see to the latter is that, for each formula, it loops through all known scores in the dataset ("do i=1 to dim(_x)"). It could be a big loop. You said you have 250,000 obs, so maybe you have 50,000 formulas and 200,000 known scores to loop through for each formula, yes?.
Here's a way to have every known score available for each formula, but loop only through those scores actually in the formula:
data have;
length ID $10. score 8. formula $200.;
infile datalines dsd missover ;
input ID $ score formula $;
datalines;
KRI1,,KRI2/KRI3
KRI2,1,
KRI3,10,
KRI4,,(KRI5-KRI6)*100
KRI5,13,
KRI6,16,
KRI7,,(KRI8+KRI9)/(KRI10-KRI11)
KRI8,13,
KRI9,16,
KRI10,1,
KRI11,10,
run;
data want (drop=_: i);
set have (where=(formula is not missing));
if _n_=1 then do;
declare hash sc (dataset:'have (keep=id score where=(score is not missing))');
sc.definekey('id');
sc.definedata('score');
sc.definedone();
end;
resolve_text=compress(formula);
do i=1 to countw(formula,'-+*/() '); /* For every term in the formula */
_term=scan(formula,i,'-+*/() '); /* Get the term identity, then insert its score */
if sc.find(key:_term)=0 then resolve_text=tranwrd(resolve_text,trim(_term),trim(cats(score)));
end;
score=input(resolve(cats('%sysevalf(',resolve_text,')')),32.);
run;
Note that the sc.find(key:_term) method will retrieve the score from the SC hash, but not the ID. That's because the ID is only in the .definekey method, but not in the .definedata method.
... View more