BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Dietz
Calcite | Level 5

Hi,

 

Is it possible to create a data step that comes from a variable?

I have a table with ref, amount and tariff.

The total_cost must be calculated with the 'formula' provided in tariff, where the 'formula' also uses the amount in the calculation.

(This is just an example, where the 'formula' could be anything else and also use other variables).

I did try lots of things, like with macro variables, call execute, etc.., but no luck.

Thanks!

 

data tars;
	input REF 1-8 AMOUNT 10-11 TARIFF $ 13-34;
	datalines;
10655094 5  12+ifn(AMOUNT>=12,6,0)
10655234 10 16+ifn(AMOUNT>=10,4,0)
;
run;
data prep;
	set tars;
	CALC_STRING = cats('TOTAL_COST=',TARIFF,';');
	* >> how to 'execute' the calc_string at this point? *;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

You cannot execute a variable string within the same data step; as soon as the data step starts executing, it is compiled and won't change.

You can create the whole data step code dynamically, either with CALL EXECUTE or writing to a temporary file and %INCLUDEing the temporary file.

 

A brute force approach could be this:

data _null_;
set tars end=done;
if _n_ = 1
then call execute("data prep; set tars;");
call execute ("
  if _n_ = " !! put(_n_,32.) !!"
  then do;
    total_cost = " !! tariff !! ";
  end;
");
if done then call execute ("run;");
run;

View solution in original post

14 REPLIES 14
andreas_lds
Jade | Level 19

You can use your dataset to generate a data-step:

data _null_;
   set tars end=jobDone;
   
   if _n_ = 1 then do;
      call execute('data want; set tars;');
   end;
   
   call execute(cats('total_costs=', tariff, ';'));
   
   if jobDone then do;
      call execute('run;');
   end;
run;
Dietz
Calcite | Level 5

Hi Andreas,

Thanks for the example.

But it calculates each observation with *all* total_costs lines.

And the total_costs is always based on that last line and not on the one provided with each observation.

I'm thinking of maybe adding 'if _n_ is X then ..' (?)

Kurt_Bremser
Super User

You cannot execute a variable string within the same data step; as soon as the data step starts executing, it is compiled and won't change.

You can create the whole data step code dynamically, either with CALL EXECUTE or writing to a temporary file and %INCLUDEing the temporary file.

 

A brute force approach could be this:

data _null_;
set tars end=done;
if _n_ = 1
then call execute("data prep; set tars;");
call execute ("
  if _n_ = " !! put(_n_,32.) !!"
  then do;
    total_cost = " !! tariff !! ";
  end;
");
if done then call execute ("run;");
run;
Dietz
Calcite | Level 5

Hi Kurt,

 

Thanks!!!

This works perfectly.

And is in the line of what I was thinking in reply to Andreas.. (adding if _n_ ..)

 

Both thanks for the suggestions/solution!

LeonidBatkhan
Lapis Lazuli | Level 10

Hi KurtBremser,

 

Why do you need if _n_ = " !! put(_n_,32.) !!" in

call execute ("
  if _n_ = " !! put(_n_,32.) !!"
  then do;
    total_cost = " !! tariff !! ";
  end;
");

?

Wouldn't it be easier to just have:

call execute ("total_cost = " !! tariff !! ";");

?

 

Tom
Super User Tom
Super User

Each observation potentially wants to run a different formula.

Try it yourself and see.

137   data prep1;
138     set tars;
139   %include code / source2;
NOTE: %INCLUDE (level 1) file CODE is file C:\Users\...\#LN00058.
140  +if _N_=1 then total_cost=12+ifn(AMOUNT>=12,6,0) ;
141  +if _N_=2 then total_cost=16+ifn(AMOUNT>=10,4,0) ;
NOTE: %INCLUDE (level 1) ending.
142   run;

NOTE: There were 2 observations read from the data set WORK.TARS.
NOTE: The data set WORK.PREP1 has 2 observations and 4 variables.


151   data prep2;
152     set tars;
153   %include code / source2;
NOTE: %INCLUDE (level 1) file CODE is file C:\Users\...\#LN00059.
154  +total_cost=12+ifn(AMOUNT>=12,6,0) ;
155  +total_cost=16+ifn(AMOUNT>=10,4,0) ;
NOTE: %INCLUDE (level 1) ending.
156   run;

NOTE: There were 2 observations read from the data set WORK.TARS.
NOTE: The data set WORK.PREP2 has 2 observations and 4 variables.

 

Spoiler

 

filename code temp;
data _null_;
  set tars;
  file code;
  put 'if ' _n_= 'then total_cost=' tariff ';' ;
run;

data prep1;
  set tars;
%include code / source2;
run;

filename code temp;
data _null_;
  set tars;
  file code;
  put 'total_cost=' tariff ';' ;
run;

data prep2;
  set tars;
%include code / source2;
run;

proc compare data=prep1 compare=prep2;
run;
The COMPARE Procedure
Comparison of WORK.PREP1 with WORK.PREP2
(Method=EXACT)

Values Comparison Summary

Number of Variables Compared with All Observations Equal: 3.
Number of Variables Compared with Some Observations Unequal: 1.
Total Number of Values which Compare Unequal: 1.
Maximum Difference: 4.


Variables with Unequal Values

Variable    Type  Len  Ndif   MaxDif

total_cost  NUM     8     1    4.000



Value Comparison Results for Variables

__________________________________________________________
           ||       Base    Compare
       Obs ||  total_cos  total_cos      Diff.     % Diff
           ||          t          t
 ________  ||  _________  _________  _________  _________
           ||
        1  ||    12.0000    16.0000     4.0000    33.3333
__________________________________________________________

 

 

LeonidBatkhan
Lapis Lazuli | Level 10

But that different formula is contained in the variable TARIFF which is read in with each iteration of the data step. What I suggest concurs with the solution provided by @andreas_lds.

Kurt_Bremser
Super User

Without the condition, we would have a series of unconditional assignments in the resulting data step, and only the last would take effect. In the end, all observations would have the results of the formula from the last observation.

Keep in mind that, for a dataset of n observations, we create a data step with n assignments, and the condition makes sure that each assignment is done only once, for its intended observation.

LeonidBatkhan
Lapis Lazuli | Level 10

Thank you, KurtBremser for the explanation. You are correct and I was not. I still don't like the fact that the generated code looks like an automated hard-coding. However, I guess until we get something like Python's eval( ) function we are stuck with this method of "automated hard-coding", see also How to evaluate SAS expressions in DATA Step dynamically.

Tom
Super User Tom
Super User

It is just normal data driven code generation.  Although normally you would be using some other variable instead of the observation in the data set.  For example some type of record type variable.  In which case you only need to generate one statement per possible TYPE value instead of one per observation.

 

I wouldn't call it "hard coding".  It is more of a parameter driven code.  It is just that the parameter value are code snippets.  But it does open you up for code injection attacks.  So you really need to trust the source of that variable with the code and have the proper controls in place to insure the content is valid.

 

Tom
Super User Tom
Super User

Either use the strings to generate code, see answer by @Kurt_Bremser ,

Or if your strings follow a strict pattern like your example just parse the strings to get the values and re-create the function call.

data prep;
  set tars;
  x1=input(scan(tariff,1,'+'),32.);
  x2=input(scan(tariff,-3,',=)'),32.);
  x3=input(scan(tariff,-2,',=)'),32.);
  x4=input(scan(tariff,-1,',=)'),32.);
  TOTAL_COST=x1+ifn(AMOUNT>=x2,x3,4x);
run;

Or convert the value of AMOUNT to a macro variable and convert your string to macro code that references the macro variable and use RESOLVE() function. (again only if the pattern of TARIFF is well understood and consistent).

data prep;
  set tars;
  call symputx('amount',amount);
  length macro_code $200;
  macro_code = cats('%sysevalf(',scan(tariff,1,'+'),'+%sysfunc(',scan(tariff,2,'+'),'))');
  macro_code = tranwrd(macro_code,'AMOUNT','&AMOUNT');
  total_cost=input(resolve(macro_code),32.);
run;

 

Dietz
Calcite | Level 5

Hi Tom,

 

Thanks for those examples.

But TARIFF can be any 'formula' and referencing to any variable in the table.

 

I like the second example as I also tried to do it with macro variables.

Already see what I could have tried also. Thanks!

Tom
Super User Tom
Super User

Note it is usually a LOT easier to use PUT to generate the code than CALL EXECUTE().

filename code temp;
data _null_;
  set tars;
  file code;
  put 'if ' _n_= 'then total_cost=' tariff ';' ;
run;

data prep;
  set tars;
%include code / source2;
run;
121   data prep;
122     set tars;
123   %include code / source2;
NOTE: %INCLUDE (level 1) file CODE is file C:\Users\ABERNA~1\AppData\Local\Temp\1\SAS Temporary
      Files\_TD5508_AMRL20L6F1E4992_\#LN00057.
124  +if _N_=1 then total_cost=12+ifn(AMOUNT>=12,6,0) ;
125  +if _N_=2 then total_cost=16+ifn(AMOUNT>=10,4,0) ;
NOTE: %INCLUDE (level 1) ending.
126   run;

NOTE: There were 2 observations read from the data set WORK.TARS.
NOTE: The data set WORK.PREP has 2 observations and 4 variables.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 964 views
  • 2 likes
  • 5 in conversation