DATA Step, Macro, Functions and more

Evaluate string expression within a dataset

Reply
Contributor
Posts: 43

Evaluate string expression within a dataset

[ Edited ]

Hi,

 

I have a dataset like this:

 

A_FIELD | ANOTHER_FIELD | OTHER_FIELD | CONDITION

1 | 2 | 3 | A_FIELD + ANOTHER_FIELD = OTHER_FIELD

0 | 1 | 0 | 0 <= A_FIELD * 10 + OTHER_FIELD <= ANOTHER_FIELD

 

I need to scan this dataset in a datastep and append a column CHECK which contains 1 or 0 depending whether CONDITION, evaluated on each row, is true or false. Eg:

 

A_FIELD | ANOTHER_FIELD | OTHER_FIELD | CONDITION | CHECK

1 | 2 | 3 | A_FIELD + ANOTHER_FIELD = OTHER_FIELD | 1

0 | 1 | 0 | 0 <= A_FIELD + 10 + OTHER_FIELD <= ANOTHER_FIELD | 0

 

Such a table has about 100k rows, and CONDITION is almost always different on each row.

Moreover, each field can be string or number, and each field has a custom name, so there is no standard way to identify fields like field1, field2, ...

 

Is there a way to "evaluate" a string using variables of each row? I'd do something like:

 

data want;

set have;

CHECK = eval(CONDITION);

run;

 

How can it be done? I thought about doing a call execute for each row, but 100k executes crash the system.

 

Thanks a lot.

PS: for whom it may be interested, it is a data quality check: each row has some value in each field (string or number), and CONDITION is an expression about that row which when is evaluated true flags that row as an anomaly.

Regular Contributor
Posts: 161

Re: Evaluate string expression within a dataset

You an have the 100k dataset split different record types which will require different CALCULATIONS and process each set with the type of calculation and append all sets back together at the end back into one dataset.  This is a simple and easiest I can think of.

 

For example, if you know that there are 10 different types of Evaluate expressions and Calculations required with the append, the 100k would be split into 10 different datasets and processed separately before concatenating them at the end.

 

Hope this makes sense.

Kannan Deivasigamani
Super User
Posts: 17,819

Re: Evaluate string expression within a dataset

Use the data set to write a program and have that run instead.
Respected Advisor
Posts: 3,124

Re: Evaluate string expression within a dataset

If you have SAS 9.4, then DOSUBL() may fit your bill.

 

data have;
	infile cards truncover;
	input a b c condition $100.;
	_n_=dosubl('data _null_; call symputx("check",'||condition ||'); run;');
	check=symget('check');
	cards;
 1 2 3 a+b=c
 0 1 0 0<=a+10+c<=b 
 ;
Respected Advisor
Posts: 3,777

Re: Evaluate string expression within a dataset

Do A B and C have values when CONDITION is evaluated in the DATA _NULL_ run by DOSUBL?

Respected Advisor
Posts: 3,124

Re: Evaluate string expression within a dataset


data_null__ wrote:

Do A B and C have values when CONDITION is evaluated in the DATA _NULL_ run by DOSUBL?


I believe so.

Respected Advisor
Posts: 3,777

Re: Evaluate string expression within a dataset

How about now? :-)

 

 

25         	infile cards truncover;
26         	input a b c condition $100.;
27         	_n_=dosubl('data _null_; call symputx("check",'||condition ||'); put _all_; run;');
28         	check=symget('check');
29         	cards;
NOTE: Variable b is uninitialized.
NOTE: Variable a is uninitialized.
NOTE: Variable c is uninitialized.
b=. a=. c=. _ERROR_=0 _N_=1
NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      1 at 0:-36   
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
      
NOTE: Variable a is uninitialized.
NOTE: Variable c is uninitialized.
NOTE: Variable b is uninitialized.
a=. c=. b=. _ERROR_=0 _N_=1
NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      1 at 0:-42   1 at 0:-39   
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
Respected Advisor
Posts: 3,124

Re: Evaluate string expression within a dataset

You are of course correct, John! Here is the update:

 

data have;
	infile cards truncover;
	input a b c condition $100.;
	call symputx('a',a);
	call symputx('b',b);
	call symputx('c',c);
	_n_=dosubl('data _null_; a=symget("a"); b=symget("b"); c=symget("c"); call symputx("check",'||condition ||'); put a= b= c=; run;');
	check=symget('check');
	cards;
 1 2 3 a+b=c
 0 1 0 0<=a+10+c<=b 
 125 5 3 a=b**c
 ;
Respected Advisor
Posts: 3,124

Re: Evaluate string expression within a dataset

In the previous post I have used macro variable streneously to pass values to DOSUBL(). I don't know what I was thinking(brain fart?), I can just pass on values as I did on 'condition', Dah.

 

 data have;
	infile cards truncover;
	input a b c condition $100.;
	_n_=dosubl('data _null_; a='||a||'; b='||b||'; c='||c||'; call symputx("check",'||condition ||'); put a= b= c=; run;');
	check=symget('check');
	cards;
 1 2 3 a+b=c
 0 1 0 0<=a*10+c<=b 
 125 5 3 a<b*c
 ;
Respected Advisor
Posts: 3,777

Re: Evaluate string expression within a dataset

[ Edited ]

You could use DOSUBL but I don't know about performance with 100k obs.  Consider a walpaper approach you can break up the 100k into mangable groups.

 

data have;
   infile cards dsd dlm='|' firstobs=2;
   input A_FIELD  ANOTHER_FIELD OTHER_FIELD CONDITION :$128.;
   cards;
A_FIELD | ANOTHER_FIELD | OTHER_FIELD | CONDITION
1 | 2 | 3 | A_FIELD + ANOTHER_FIELD = OTHER_FIELD
0 | 1 | 0 | 0 <= A_FIELD * 10 + OTHER_FIELD <= ANOTHER_FIELD
0 | 1 | 2 | 0 <= A_FIELD * 10 + OTHER_FIELD <= ANOTHER_FIELD
1 | 2 | 3 | A_FIELD + ANOTHER_FIELD = OTHER_FIELD
0 | 1 | 0 | 0 <= A_FIELD * 10 + OTHER_FIELD <= ANOTHER_FIELD
0 | 1 | 2 | 0 <= A_FIELD * 10 + OTHER_FIELD <= ANOTHER_FIELD
1 | 2 | 3 | A_FIELD + ANOTHER_FIELD = OTHER_FIELD
0 | 1 | 0 | 0 <= A_FIELD * 10 + OTHER_FIELD <= ANOTHER_FIELD
0 | 1 | 2 | 0 <= A_FIELD * 10 + OTHER_FIELD <= ANOTHER_FIELD
;;;;
   run;
proc print;
   run;
data haveV / view=haveV;
   set have;
   if mod(_n_,3) eq 1 then g+1;
   run;
filename FT43F001 temp;
data _null_;
   file FT43F001;
   set haveV;
   by g;
   if first.g then do;
      put 'data g' g ';';
      put ' if 0 then set have;';
      end; 
   put +3 (A_FIELD ANOTHER_FIELD OTHER_FIELD condition) (';' =) ';';
   put +3 'check = (' condition :$128.  '); output;';
   if last.g then put +3 'Stop;' / +3 'run;';
   format condition $quote.;
   run;
%include FT43F001 / source2;
   run;
data check;
   set g:;
   run;
proc print;
   run;

Capture.PNG

Respected Advisor
Posts: 4,646

Re: Evaluate string expression within a dataset

Interesting challenge. This should do it:

 

%let nbVars=3;

data conditions;
infile datalines dlm="|";
input (v1-v&nbVars) (:$20.) condition :$64.;
datalines;
A_FIELD | ANOTHER_FIELD | OTHER_FIELD | CONDITION
1 | 2 | 3 | A_FIELD + ANOTHER_FIELD = OTHER_FIELD
0 | 1 | 0 | 0 <= A_FIELD + 10 + OTHER_FIELD <= ANOTHER_FIELD
;

filename check temp;

data _null_;
array varName{&nbVars} $20;
retain varname;
array v{&nbVars} $20;
set conditions;

file check;
if _n_ = 1 then
    do i = 1 to &nbVars;
        varName{i} = v{i};
        end;
else do;
    do i = 1 to &nbVars;
        put varName{i} "=" v{i} ";";
        end;
    put "CONDITION=""" condition """;";
    put "CHECK=" condition "; output;";
    end;
run;

data check;
length condition $64;
%include check;
run;

filename check;
PG
Super User
Posts: 10,497

Re: Evaluate string expression within a dataset

I think you might want to post some example of the data and the rules. I think you possibly have abstracted it a bit too far.

 

I would likely check for "conditions" that are similar in nature and look at array processing.

 

And I would STRONGLY recommend creating a new dataset instead of  using a

data have;

    set have;

    <various data manglers here>

run;

 

to avoid accidently loosing your starting data.

 

 

Ask a Question
Discussion stats
  • 11 replies
  • 486 views
  • 4 likes
  • 7 in conversation