How to replace text from one column to another?

Azeem112

Hi , I have a table as below

data have;
length ID $10. score 8. formula $200.;
infile datalines dsd missover  ;
input ID $ score formula $;
datalines;
KRI1,,KRI2/KRI3
KRI2,1,
KRI3,10,
KRI4,,(KRI5-KRI6)*100
KRI5,13,
KRI6,16,
KRI7,,(KRI8+KRI9)/(KRI10-KRI11)
KRI8,13,
KRI9,16,
KRI10,1,
KRI11,10,
;
run;

I want to replace the formula ID's with actual values. i can filter out the formula data by using where the formula is not empty but my actual table has +250k rows and i can't do a standard loop and replace method.

I want my want table as

data want;
length ID $10. formula $200. score 8. ;
infile datalines dsd missover  ;
input ID $ formula $ score;
datalines;
KRI1,1/10,0.1
KRI4,(13-16)*100,-300
KRI7,(13+16)/(1-10),-3.22
;
run;

for a smaller data I am using loop and tranwrd function and then use eval method to calculate the final value.

How can i do the same for a large dataset?

PeterClemmensen

Try this. Should perform reasonably.

Feel free to ask 🙂

data temp;

   format ID formula score;

   merge have
         have (firstobs = 2 keep = score rename = score = s);

   if score = . then do;
      f = formula;
      i = ID;
   end;

   if score then f = tranwrd(f, strip(ID), put(strip(score), 8.));

   if s = . then do;
      formula = compress(f);
      ID = i;
      score = input(resolve('%sysevalf('||formula||')'), best8.);
      output;
   end;

   retain f i;
   keep ID formula score;
   format score2 8.2;
run;

Result:

ID    formula         score
KRI1  1/10            0.1
KRI4  (13-16)*100     -300
KRI7  (13+16)/(1-10)  -3.22222

The DATA to DATA Step Macro
Blog: SASnrd

AhmedAl_Attar

Hi @Azeem112

Stealing some code from @PeterClemmensen response 😉, here is another approach

data have;
length ID $10. score 8. formula $200.;
infile datalines dsd missover  ;
input ID $ score formula $;
datalines;
KRI1,,KRI2/KRI3
KRI2,1,
KRI3,10,
KRI4,,(KRI5-KRI6)*100
KRI5,13,
KRI6,16,
KRI7,,(KRI8+KRI9)/(KRI10-KRI11)
KRI8,13,
KRI9,16,
KRI10,1,
KRI11,10,
;
run;

/* Create custom informat that can be used to convert ID to matching numeric value */
data infmt;
  retain fmtname 'ref' type 'I';
  /* Exclude formula records */
  set have(where=(missing(formula)) rename=(id=start score=label)); 
run;
proc format cntlin=work.infmt; run;

/* Use the custom Informat with Regular Expression */
data want(KEEP=ID formula score);
  set have(where=(strip(formula) ne ''));
  length formula2 $400;
  formula2 = cats('%sysevalf(',prxchange('s/KRI(\d+)/%sysfunc(inputn(KRI$1,ref.))/',-1,formula),')');
  score = input(resolve(formula2),best.);
run;

Hope this helps

Ksharp

data have;
length ID $10. score 8. formula $200.;
infile datalines dsd missover  ;
input ID $ score formula $;
datalines;
KRI1,,KRI2/KRI3
KRI2,1,
KRI3,10,
KRI4,,(KRI5-KRI6)*100
KRI5,13,
KRI6,16,
KRI7,,(KRI8+KRI9)/(KRI10-KRI11)
KRI8,13,
KRI9,16,
KRI10,1,
KRI11,10,
;
run;
proc transpose data=have(where=(score is not missing)) out=temp(drop=_name_) prefix=_;
id id;
var score;
run;
data want;
 if _n_=1 then set temp;
 set have(where=(formula is not missing));
 array x{*} _:;
 do i=1 to dim(x);
  formula=tranwrd(upcase(formula),upcase(substr(vname(x{i}),2)),strip(x{i}) );
 end;
 score=input(resolve(catt('%sysevalf(',formula,')')),best.);
 drop i _:;
 run;

mkeintz

If your data are always structured such that a formula observation is immediately followed by all of its components, then the solution offered by @PeterClemmensen seems to be the most straightforward.

But if the components can be anywhere in the dataset, then you need to have access to all the known score values in the dataset before you can reliably interpret every formula. @Ksharp 's solution does that. The only downside I see to the latter is that, for each formula, it loops through all known scores in the dataset ("do i=1 to dim(_x)"). It could be a big loop. You said you have 250,000 obs, so maybe you have 50,000 formulas and 200,000 known scores to loop through for each formula, yes?.

Here's a way to have every known score available for each formula, but loop only through those scores actually in the formula:

data have;
  length ID $10. score 8. formula $200.;
  infile datalines dsd missover  ;
  input ID $ score formula $;
datalines;
KRI1,,KRI2/KRI3
KRI2,1,
KRI3,10,
KRI4,,(KRI5-KRI6)*100
KRI5,13,
KRI6,16,
KRI7,,(KRI8+KRI9)/(KRI10-KRI11)
KRI8,13,
KRI9,16,
KRI10,1,
KRI11,10,
run;

data want (drop=_: i);
  set have (where=(formula is not missing));
  if _n_=1 then do;
    declare hash sc (dataset:'have (keep=id score  where=(score is not missing))');
      sc.definekey('id');
      sc.definedata('score');
      sc.definedone();
  end;

  resolve_text=compress(formula);

  do i=1 to countw(formula,'-+*/() ');  /* For every term in the formula */
    _term=scan(formula,i,'-+*/() ');    /* Get the term identity, then insert its score */
    if sc.find(key:_term)=0 then resolve_text=tranwrd(resolve_text,trim(_term),trim(cats(score)));
  end;
  score=input(resolve(cats('%sysevalf(',resolve_text,')')),32.);
run;

Note that the sc.find(key:_term) method will retrieve the score from the SC hash, but not the ID. That's because the ID is only in the .definekey method, but not in the .definedata method.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Tom

Are your ID values valid SAS names?

Transpose your non formula values to a single observation. First see if PROC TRANSPOSE can handle it.

proc transpose
  data=have(where=(missing(formula)))
  out=wide(drop=_name_ )
;
  id id;
  var score;
run;

If it cannot you can generate CODE to make the WIDE dataset instead.

filename code temp;
data _null_;
  file code;
  set have;
  where not missing(score);
  put id '=' score ';' ;
run;
data wide;
  %include code ;
run;

Now use the other observations to generate code to set the SCORE (and ID and FORMUAL variables). And include the generated code to calculate the score and write the observations.

filename code temp;
data _null_;
  file code;
  set have;
  where not missing(formula);
  put id=:$quote. ';' formula= :$quote. ';score=' formula ';output;' ;
run;

Now read in the WIDE dataset to get the SCORE values for the IDs in question. Also read in the HAVE dataset (but not any of the data) to set the type/length/format of the variables.

data want; 
  set have(obs=0) wide;
  %inc code ;
  keep id score formula; 
run;

You could generate bot code files in a single pass through the HAVE dataset. So two passes total. See below. Remove the WHERE statement if you want the result to have all of the original observations.

Spoiler

filename code1 temp;
filename code2 temp;
data _null_;
  set have;
  if missing(formula) then do;
     file code1;
     put id '=' score ';' ;
  end;
  else do;
     file code2 ;
     put 'when (' id :$quote. ') score=' formula ';' ;
  end;
run;

data wide ;
  %include code1 ;
run;

data want; 
  if _n_=1 then set wide ;
  set have;
  where not missing(formula);
  select (id);
    %inc code2 ;
     otherwise ;
  end;
  keep id score formula ;
run;

filename code1 temp; filename code2 temp; data _null_; set have; if missing(formula) then do; file code1; put id '=' score ';' ; end; else do; file code2 ; put 'when (' id :$quote. ') score=' formula ';' ; end; run; data wide ; %include code1 ; run; data want; if _n_=1 then set wide ; set have; where not missing(formula); select (id); %inc code2 ; otherwise ; end; keep id score formula ; run;

How to replace text from one column to another?

Re: How to replace text from one column to another?

Re: How to replace text from one column to another?

Re: How to replace text from one column to another?

Re: How to replace text from one column to another?

Re: How to replace text from one column to another?

How to replace text from one column to another?

Re: How to replace text from one column to another?

Re: How to replace text from one column to another?

Re: How to replace text from one column to another?

Re: How to replace text from one column to another?

Re: How to replace text from one column to another?

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away