Hi,
Is it possible to create a data step that comes from a variable?
I have a table with ref, amount and tariff.
The total_cost must be calculated with the 'formula' provided in tariff, where the 'formula' also uses the amount in the calculation.
(This is just an example, where the 'formula' could be anything else and also use other variables).
I did try lots of things, like with macro variables, call execute, etc.., but no luck.
Thanks!
data tars;
input REF 1-8 AMOUNT 10-11 TARIFF $ 13-34;
datalines;
10655094 5 12+ifn(AMOUNT>=12,6,0)
10655234 10 16+ifn(AMOUNT>=10,4,0)
;
run;
data prep;
set tars;
CALC_STRING = cats('TOTAL_COST=',TARIFF,';');
* >> how to 'execute' the calc_string at this point? *;
run;
You cannot execute a variable string within the same data step; as soon as the data step starts executing, it is compiled and won't change.
You can create the whole data step code dynamically, either with CALL EXECUTE or writing to a temporary file and %INCLUDEing the temporary file.
A brute force approach could be this:
data _null_;
set tars end=done;
if _n_ = 1
then call execute("data prep; set tars;");
call execute ("
if _n_ = " !! put(_n_,32.) !!"
then do;
total_cost = " !! tariff !! ";
end;
");
if done then call execute ("run;");
run;
You can use your dataset to generate a data-step:
data _null_;
set tars end=jobDone;
if _n_ = 1 then do;
call execute('data want; set tars;');
end;
call execute(cats('total_costs=', tariff, ';'));
if jobDone then do;
call execute('run;');
end;
run;
Hi Andreas,
Thanks for the example.
But it calculates each observation with *all* total_costs lines.
And the total_costs is always based on that last line and not on the one provided with each observation.
I'm thinking of maybe adding 'if _n_ is X then ..' (?)
You cannot execute a variable string within the same data step; as soon as the data step starts executing, it is compiled and won't change.
You can create the whole data step code dynamically, either with CALL EXECUTE or writing to a temporary file and %INCLUDEing the temporary file.
A brute force approach could be this:
data _null_;
set tars end=done;
if _n_ = 1
then call execute("data prep; set tars;");
call execute ("
if _n_ = " !! put(_n_,32.) !!"
then do;
total_cost = " !! tariff !! ";
end;
");
if done then call execute ("run;");
run;
Hi Kurt,
Thanks!!!
This works perfectly.
And is in the line of what I was thinking in reply to Andreas.. (adding if _n_ ..)
Both thanks for the suggestions/solution!
Hi KurtBremser,
Why do you need if _n_ = " !! put(_n_,32.) !!"
in
call execute ("
if _n_ = " !! put(_n_,32.) !!"
then do;
total_cost = " !! tariff !! ";
end;
");
?
Wouldn't it be easier to just have:
call execute ("total_cost = " !! tariff !! ";");
?
Each observation potentially wants to run a different formula.
Try it yourself and see.
137 data prep1; 138 set tars; 139 %include code / source2; NOTE: %INCLUDE (level 1) file CODE is file C:\Users\...\#LN00058. 140 +if _N_=1 then total_cost=12+ifn(AMOUNT>=12,6,0) ; 141 +if _N_=2 then total_cost=16+ifn(AMOUNT>=10,4,0) ; NOTE: %INCLUDE (level 1) ending. 142 run; NOTE: There were 2 observations read from the data set WORK.TARS. NOTE: The data set WORK.PREP1 has 2 observations and 4 variables. 151 data prep2; 152 set tars; 153 %include code / source2; NOTE: %INCLUDE (level 1) file CODE is file C:\Users\...\#LN00059. 154 +total_cost=12+ifn(AMOUNT>=12,6,0) ; 155 +total_cost=16+ifn(AMOUNT>=10,4,0) ; NOTE: %INCLUDE (level 1) ending. 156 run; NOTE: There were 2 observations read from the data set WORK.TARS. NOTE: The data set WORK.PREP2 has 2 observations and 4 variables.
filename code temp;
data _null_;
set tars;
file code;
put 'if ' _n_= 'then total_cost=' tariff ';' ;
run;
data prep1;
set tars;
%include code / source2;
run;
filename code temp;
data _null_;
set tars;
file code;
put 'total_cost=' tariff ';' ;
run;
data prep2;
set tars;
%include code / source2;
run;
proc compare data=prep1 compare=prep2;
run;
The COMPARE Procedure Comparison of WORK.PREP1 with WORK.PREP2 (Method=EXACT) Values Comparison Summary Number of Variables Compared with All Observations Equal: 3. Number of Variables Compared with Some Observations Unequal: 1. Total Number of Values which Compare Unequal: 1. Maximum Difference: 4. Variables with Unequal Values Variable Type Len Ndif MaxDif total_cost NUM 8 1 4.000 Value Comparison Results for Variables __________________________________________________________ || Base Compare Obs || total_cos total_cos Diff. % Diff || t t ________ || _________ _________ _________ _________ || 1 || 12.0000 16.0000 4.0000 33.3333 __________________________________________________________
But that different formula is contained in the variable TARIFF which is read in with each iteration of the data step. What I suggest concurs with the solution provided by @andreas_lds.
Without the condition, we would have a series of unconditional assignments in the resulting data step, and only the last would take effect. In the end, all observations would have the results of the formula from the last observation.
Keep in mind that, for a dataset of n observations, we create a data step with n assignments, and the condition makes sure that each assignment is done only once, for its intended observation.
Thank you, KurtBremser for the explanation. You are correct and I was not. I still don't like the fact that the generated code looks like an automated hard-coding. However, I guess until we get something like Python's eval( ) function we are stuck with this method of "automated hard-coding", see also How to evaluate SAS expressions in DATA Step dynamically.
It is just normal data driven code generation. Although normally you would be using some other variable instead of the observation in the data set. For example some type of record type variable. In which case you only need to generate one statement per possible TYPE value instead of one per observation.
I wouldn't call it "hard coding". It is more of a parameter driven code. It is just that the parameter value are code snippets. But it does open you up for code injection attacks. So you really need to trust the source of that variable with the code and have the proper controls in place to insure the content is valid.
I also suggest you use @Tom's method of writing the code to a file and look at the resulting codes. Then you will see why I used the condition testing for _N_.
Either use the strings to generate code, see answer by @Kurt_Bremser ,
Or if your strings follow a strict pattern like your example just parse the strings to get the values and re-create the function call.
data prep;
set tars;
x1=input(scan(tariff,1,'+'),32.);
x2=input(scan(tariff,-3,',=)'),32.);
x3=input(scan(tariff,-2,',=)'),32.);
x4=input(scan(tariff,-1,',=)'),32.);
TOTAL_COST=x1+ifn(AMOUNT>=x2,x3,4x);
run;
Or convert the value of AMOUNT to a macro variable and convert your string to macro code that references the macro variable and use RESOLVE() function. (again only if the pattern of TARIFF is well understood and consistent).
data prep;
set tars;
call symputx('amount',amount);
length macro_code $200;
macro_code = cats('%sysevalf(',scan(tariff,1,'+'),'+%sysfunc(',scan(tariff,2,'+'),'))');
macro_code = tranwrd(macro_code,'AMOUNT','&AMOUNT');
total_cost=input(resolve(macro_code),32.);
run;
Hi Tom,
Thanks for those examples.
But TARIFF can be any 'formula' and referencing to any variable in the table.
I like the second example as I also tried to do it with macro variables.
Already see what I could have tried also. Thanks!
Note it is usually a LOT easier to use PUT to generate the code than CALL EXECUTE().
filename code temp;
data _null_;
set tars;
file code;
put 'if ' _n_= 'then total_cost=' tariff ';' ;
run;
data prep;
set tars;
%include code / source2;
run;
121 data prep; 122 set tars; 123 %include code / source2; NOTE: %INCLUDE (level 1) file CODE is file C:\Users\ABERNA~1\AppData\Local\Temp\1\SAS Temporary Files\_TD5508_AMRL20L6F1E4992_\#LN00057. 124 +if _N_=1 then total_cost=12+ifn(AMOUNT>=12,6,0) ; 125 +if _N_=2 then total_cost=16+ifn(AMOUNT>=10,4,0) ; NOTE: %INCLUDE (level 1) ending. 126 run; NOTE: There were 2 observations read from the data set WORK.TARS. NOTE: The data set WORK.PREP has 2 observations and 4 variables.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.