BookmarkSubscribeRSS Feed
Edoedoedo
Pyrite | Level 9

Hi,

 

I have a dataset like this:

 

A_FIELD | ANOTHER_FIELD | OTHER_FIELD | CONDITION

1 | 2 | 3 | A_FIELD + ANOTHER_FIELD = OTHER_FIELD

0 | 1 | 0 | 0 <= A_FIELD * 10 + OTHER_FIELD <= ANOTHER_FIELD

 

I need to scan this dataset in a datastep and append a column CHECK which contains 1 or 0 depending whether CONDITION, evaluated on each row, is true or false. Eg:

 

A_FIELD | ANOTHER_FIELD | OTHER_FIELD | CONDITION | CHECK

1 | 2 | 3 | A_FIELD + ANOTHER_FIELD = OTHER_FIELD | 1

0 | 1 | 0 | 0 <= A_FIELD + 10 + OTHER_FIELD <= ANOTHER_FIELD | 0

 

Such a table has about 100k rows, and CONDITION is almost always different on each row.

Moreover, each field can be string or number, and each field has a custom name, so there is no standard way to identify fields like field1, field2, ...

 

Is there a way to "evaluate" a string using variables of each row? I'd do something like:

 

data want;

set have;

CHECK = eval(CONDITION);

run;

 

How can it be done? I thought about doing a call execute for each row, but 100k executes crash the system.

 

Thanks a lot.

PS: for whom it may be interested, it is a data quality check: each row has some value in each field (string or number), and CONDITION is an expression about that row which when is evaluated true flags that row as an anomaly.

11 REPLIES 11
kannand
Lapis Lazuli | Level 10

You an have the 100k dataset split different record types which will require different CALCULATIONS and process each set with the type of calculation and append all sets back together at the end back into one dataset.  This is a simple and easiest I can think of.

 

For example, if you know that there are 10 different types of Evaluate expressions and Calculations required with the append, the 100k would be split into 10 different datasets and processed separately before concatenating them at the end.

 

Hope this makes sense.

Kannan Deivasigamani
Reeza
Super User
Use the data set to write a program and have that run instead.
Haikuo
Onyx | Level 15

If you have SAS 9.4, then DOSUBL() may fit your bill.

 

data have;
	infile cards truncover;
	input a b c condition $100.;
	_n_=dosubl('data _null_; call symputx("check",'||condition ||'); run;');
	check=symget('check');
	cards;
 1 2 3 a+b=c
 0 1 0 0<=a+10+c<=b 
 ;
data_null__
Jade | Level 19

Do A B and C have values when CONDITION is evaluated in the DATA _NULL_ run by DOSUBL?

Haikuo
Onyx | Level 15

@data_null__ wrote:

Do A B and C have values when CONDITION is evaluated in the DATA _NULL_ run by DOSUBL?


I believe so.

data_null__
Jade | Level 19

How about now? 🙂

 

 

25         	infile cards truncover;
26         	input a b c condition $100.;
27         	_n_=dosubl('data _null_; call symputx("check",'||condition ||'); put _all_; run;');
28         	check=symget('check');
29         	cards;
NOTE: Variable b is uninitialized.
NOTE: Variable a is uninitialized.
NOTE: Variable c is uninitialized.
b=. a=. c=. _ERROR_=0 _N_=1
NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      1 at 0:-36   
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
      
NOTE: Variable a is uninitialized.
NOTE: Variable c is uninitialized.
NOTE: Variable b is uninitialized.
a=. c=. b=. _ERROR_=0 _N_=1
NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      1 at 0:-42   1 at 0:-39   
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
Haikuo
Onyx | Level 15

You are of course correct, John! Here is the update:

 

data have;
	infile cards truncover;
	input a b c condition $100.;
	call symputx('a',a);
	call symputx('b',b);
	call symputx('c',c);
	_n_=dosubl('data _null_; a=symget("a"); b=symget("b"); c=symget("c"); call symputx("check",'||condition ||'); put a= b= c=; run;');
	check=symget('check');
	cards;
 1 2 3 a+b=c
 0 1 0 0<=a+10+c<=b 
 125 5 3 a=b**c
 ;
Haikuo
Onyx | Level 15

In the previous post I have used macro variable streneously to pass values to DOSUBL(). I don't know what I was thinking(brain fart?), I can just pass on values as I did on 'condition', Dah.

 

 data have;
	infile cards truncover;
	input a b c condition $100.;
	_n_=dosubl('data _null_; a='||a||'; b='||b||'; c='||c||'; call symputx("check",'||condition ||'); put a= b= c=; run;');
	check=symget('check');
	cards;
 1 2 3 a+b=c
 0 1 0 0<=a*10+c<=b 
 125 5 3 a<b*c
 ;
data_null__
Jade | Level 19

You could use DOSUBL but I don't know about performance with 100k obs.  Consider a walpaper approach you can break up the 100k into mangable groups.

 

data have;
   infile cards dsd dlm='|' firstobs=2;
   input A_FIELD  ANOTHER_FIELD OTHER_FIELD CONDITION :$128.;
   cards;
A_FIELD | ANOTHER_FIELD | OTHER_FIELD | CONDITION
1 | 2 | 3 | A_FIELD + ANOTHER_FIELD = OTHER_FIELD
0 | 1 | 0 | 0 <= A_FIELD * 10 + OTHER_FIELD <= ANOTHER_FIELD
0 | 1 | 2 | 0 <= A_FIELD * 10 + OTHER_FIELD <= ANOTHER_FIELD
1 | 2 | 3 | A_FIELD + ANOTHER_FIELD = OTHER_FIELD
0 | 1 | 0 | 0 <= A_FIELD * 10 + OTHER_FIELD <= ANOTHER_FIELD
0 | 1 | 2 | 0 <= A_FIELD * 10 + OTHER_FIELD <= ANOTHER_FIELD
1 | 2 | 3 | A_FIELD + ANOTHER_FIELD = OTHER_FIELD
0 | 1 | 0 | 0 <= A_FIELD * 10 + OTHER_FIELD <= ANOTHER_FIELD
0 | 1 | 2 | 0 <= A_FIELD * 10 + OTHER_FIELD <= ANOTHER_FIELD
;;;;
   run;
proc print;
   run;
data haveV / view=haveV;
   set have;
   if mod(_n_,3) eq 1 then g+1;
   run;
filename FT43F001 temp;
data _null_;
   file FT43F001;
   set haveV;
   by g;
   if first.g then do;
      put 'data g' g ';';
      put ' if 0 then set have;';
      end; 
   put +3 (A_FIELD ANOTHER_FIELD OTHER_FIELD condition) (';' =) ';';
   put +3 'check = (' condition :$128.  '); output;';
   if last.g then put +3 'Stop;' / +3 'run;';
   format condition $quote.;
   run;
%include FT43F001 / source2;
   run;
data check;
   set g:;
   run;
proc print;
   run;

Capture.PNG

PGStats
Opal | Level 21

Interesting challenge. This should do it:

 

%let nbVars=3;

data conditions;
infile datalines dlm="|";
input (v1-v&nbVars) (:$20.) condition :$64.;
datalines;
A_FIELD | ANOTHER_FIELD | OTHER_FIELD | CONDITION
1 | 2 | 3 | A_FIELD + ANOTHER_FIELD = OTHER_FIELD
0 | 1 | 0 | 0 <= A_FIELD + 10 + OTHER_FIELD <= ANOTHER_FIELD
;

filename check temp;

data _null_;
array varName{&nbVars} $20;
retain varname;
array v{&nbVars} $20;
set conditions;

file check;
if _n_ = 1 then
    do i = 1 to &nbVars;
        varName{i} = v{i};
        end;
else do;
    do i = 1 to &nbVars;
        put varName{i} "=" v{i} ";";
        end;
    put "CONDITION=""" condition """;";
    put "CHECK=" condition "; output;";
    end;
run;

data check;
length condition $64;
%include check;
run;

filename check;
PG
ballardw
Super User

I think you might want to post some example of the data and the rules. I think you possibly have abstracted it a bit too far.

 

I would likely check for "conditions" that are similar in nature and look at array processing.

 

And I would STRONGLY recommend creating a new dataset instead of  using a

data have;

    set have;

    <various data manglers here>

run;

 

to avoid accidently loosing your starting data.

 

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 3694 views
  • 4 likes
  • 7 in conversation