I've written the code for data validation for one dataset. I would like to develop further for multiple datasets using macro. Now the problem is that the rules which I want to write is not applicable for all the datasets. Because the variables for each dataset is not same.
Now I want to develop the below code to handle all the rules for multiple datasets. I want to feed the dataset name and variable names as keyword parameter macro.
data have;
infile cards dsd dlm='|' truncover;
input FLT_LAYR_NM :$10. FYP_NM :$30. _TEAM_NM :$30. ;
cards;
DIS|Consistencycheck|Business
DID|Rangecheck|Client
DID| |Client
DID|Rangecheck|Employee
;
data want;
length Error_id 8 Error_record $200;
keep Error_id Error_record ;
set have ;
Error_record=cats(FLT_LAYR_NM,'|',FYP_NM,'|',_TEAM_NM);
Error_id=1001;
if missing(FYP_NM) then output;
Error_id=1002;
if _TEAM_NM not in ('Business','Employee') then output;
run;
SAS code or something similar which I except:
But the challenge here is I want to skip the condition (or data validation) in case if the variable not exists in the calling macro.
%macro data_validation(table_name,variable1,variable2,variable3,variable4)
data have;
infile cards dsd dlm='|' truncover;
input FLT_LAYR_NM :$10. FYP_NM :$30. _TEAM_NM :$30. ;
cards;
DIS|Consistencycheck|Business
DID|Rangecheck|Client
DID| |Client
DID|Rangecheck|Employee
;
data want;
length Error_id 8 Error_record $200;
keep Error_id Error_record ;
set &table_name;
Error_record=cats(FLT_LAYR_NM,'|',FYP_NM,'|',_TEAM_NM);
/*data validation rule for variable 1*/
Error_id=1001;
if missing(&variable1) then output;
/*data validation rule for variable 2*/
Error_id=1002;
if &variable2 not in ('Business','Employee') then output;
run;
/*data validation rule for variable 3*/
Error_id=1003;
if &variable3 not in ('Bus','Train') then output;
run;
/*data validation rule for variable 4*/
Error_id=1004;
if &variable4 not in ('Hotel','Office') then output;
run;
%mend data_validation;
%data_validation(table_name,variable1,variable4)
You could do something like this and pass in the error checks as macros for each table that you want:
%macro err_1001(var);
Error_id=1001;
if missing(&var.) then output;
%mend;
%macro err_1002(var);
Error_id=1002;
if &var. not in ('Business','Employee') then output;
%mend;
%macro data_validation;
data want;
length Error_id 8 Error_record $200;
keep Error_id Error_record ;
set have ;
Error_record=cats(FLT_LAYR_NM,'|',FYP_NM,'|',_TEAM_NM);
/*Error_id=1001;
if missing(FYP_NM) then output;*/
%err_1001(FYP_NM)
/*Error_id=1002;
if _TEAM_NM not in ('Business','Employee') then output;*/
%err_1002(_TEAM_NM);
run;
%mend;
%data_validation;
You could place all your individual error checks in separate sas files, a control file that's read in each time, or just one large sas file. Of course you can automate passing in your tables and variables as desired.
Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF
View now: on-demand content for SAS users
I tried your code as follows. But in the macro %err_1004
I tried to pass the unknown variable (new_variable) as value and in that case I want the program to skip that macro as the unknown variable (new_variable) is not vailable in source dataset (HAVE).
So objective is in case if the passing variable is not avilable in the source dataset then I don't that macro to execute.
/*source dataset*/
data have;
infile cards dsd dlm='|' truncover;
input FLT_LAYR_NM :$10. FYP_NM :$30. _TEAM_NM :$30. ;
cards;
DIS|Consistencycheck|Business
DID|Rangecheck|Client
DID| |Client
DID|Rangecheck|Employee
;
run;
/*check1*/
%macro err_1001(var);
Error_id=1001;
if missing(&var.) then output;
%mend;
/*check2*/
%macro err_1002(var);
Error_id=1002;
if &var. not in ('Business','Employee') then output;
%mend;
/*check3*/
%macro err_1004(var);
Error_id=1003;
if &var. not in ('Business','Employee') then output;
%mend;
/*dynamic macro program*/
%macro data_validation;
data want;
length Error_id 8 Error_record $200;
keep Error_id Error_record ;
set have ;
Error_record=cats(FLT_LAYR_NM,'|',FYP_NM,'|',_TEAM_NM);
%err_1001(FYP_NM)
%err_1002(_TEAM_NM);
/*test macro*/
%err_1004(new_variable); /*new_variable is not available in the source dataset*/
run;
%mend;
%data_validation;
Output which I got is,
Error_id | Error_record |
1003 | DIS|Consistencycheck|Business |
1002 | DID|Rangecheck|Client |
1003 | DID|Rangecheck|Client |
1001 | DID||Client |
1002 | DID||Client |
1003 | DID||Client |
1003 | DID|Rangecheck|Employee |
Desired Output,
Error_id | Error_record |
1002 | DID|Rangecheck|Client |
1001 | DID||Client |
1002 | DID||Client |
I see. How many tables are you trying to apply this logic to? Are you trying to apply every %error_xxx check to every dataset regardless of the variables in each dataset? Here's a few options:
1. Manually control which %error_xxx checks are executed for each table (i.e. just don't include %err_1004 if you know the variable doesn't exist for this table). If you have a small number of tables, I would do this.
2. Add a step before where you dynamically determine if a given table has the variables of interest, then use macros to create the call for each %err_xxx check. If you have a large number of tables consider this.
3. Last, I have code that checks to see if a var is in the current dataset during datastep processing using, but it is not efficient and I would rather you go with one of the first two options.
Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF
View now: on-demand content for SAS users
This is pretty simple and I think will work:
1. Get a list of all the columns in your current table and save them into a macro:
proc sql noprint;
select upcase(name) into :fields separated by ","
from dictionary.columns
where libname = "SASHELP" and memname = "CARS";
run;
*test;
%put fields= &fields;
>>fields= MAKE,MODEL,TYPE,ORIGIN,DRIVETRAIN,MSRP,INVOICE,ENGINESIZE,CYLINDERS,HORSEPOWER,MPG_CITY,MPG_HIGHWAY,WEIGHT,WHEELBASE,LENGTH
Pass in your libname and memname values as macros.
2. Add this IF check to each of your individual %err_xxx checks:
%macro err_1004(var);
if upcase(vname(var)) in ("&fields") then do;
Error_id=1003;
if &var. not in ('Business','Employee') then output;
end;
%mend;
This way, if your current var isn't in the table (i.e. the list of values in &fields), the check won't run.
Does that make sense?
%macro err_1004(var);
if upcase(vname(var)) in ("&fields") then do;
Error_id=1003;
if &var. not in ('Business','Employee') then output;
end;
%mend;
%macro data_validation;
proc sql noprint;
select upcase(name) into :fields separated by ","
from dictionary.columns
where libname = "SASHELP" and memname = "CARS";
run;
data want;
length Error_id 8 Error_record $200;
keep Error_id Error_record ;
set have ;
Error_record=cats(FLT_LAYR_NM,'|',FYP_NM,'|',_TEAM_NM);
%err_1004(new_field)
run;
%mend;
%data_validation;
Note, this is very similar to @mkeintz's solution. mkeintz's may be more efficient if your data is large since you won't check the list of fields for each datastep pass.
Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF
View now: on-demand content for SAS users
If you put safety checks in place, which we have, then you can call all the %err_xxx macros for every table even if the table does not have the specified variables, which I thought is what you wanted. The safety checks won't execute the error check if the field isn't in the table (the macros will still execute, but the IF logic will prevent the error logic from running). This means you can call all the checks for every table.
Does this make sense?
Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF
View now: on-demand content for SAS users
So you want to test whether variable1, (or 2 or 3 or 4) exists in table_name, and run the value check only if it in the data set - is that right?
What you can do in put a list of all the TABLE_NAME variables in a macrovar (CSV_VARNAMES), and then do a check on whether VARIABLE1 is found in that list. If it's not then reset it to blank. Later on you can test only if it's not blank.
So prior to your DATA WANT step you can:
proc sql noprint;
select distinct upcase(name) into :csv_varnames separated by ' '
from dictionary.columns where libname="&dslib" and memname="&dsname";
quit;
%do i=1 %to 4;
%let variable&i=%upcase(&&variable&i);
%if %sysfunc(findw(&csv_varnames,&&variable&i))=0 %then %let variable&i=%str();
%end;
Then inside the DATA WANT step you can have:
%if &variable1 ^= %str() %then %do;
/*data validation rule for variable 1*/
Error_id=1001;
if missing(&variable1) then output;
%end;
etc, for varibles2, 3, and 4
To answer your question,
So you want to test whether variable1, (or 2 or 3 or 4) exists in table_name, and run the value check only if it in the data set - is that right? - yes
Could you please help me understand how to pass the Input to the below macro variables? Also I would like to why you've creating macro variable 'i' only till 4. Instead of hard coding can we find the dynmaic way to handle this?
%do i=1 %to 4;
%let variable&i=%upcase(&&variable&i);
%if %sysfunc(findw(&csv_varnames,&&variable&i))=0 %then %let variable&i=%str();
%end;
What I tried is,
%macro validation(variable1,variable2,variable3);
data _null_;
%do i=1 %to 4;
%let variable&i=%upcase(&&variable&i);
%if %sysfunc(findw(&csv_varnames,&&variable&i))=0 %then %let variable&i=%str();
%end;
run;
%mend;
%validation;
Error message is,
SYMBOLGEN: Macro variable I resolves to 4
MLOGIC(VALIDATION): %LET (variable name is VARIABLE4)
SYMBOLGEN: && resolves to &.
SYMBOLGEN: Macro variable I resolves to 4
WARNING: Apparent symbolic reference VARIABLE4 not resolved.
WARNING: Apparent symbolic reference VARIABLE4 not resolved.
ERROR: The text expression &VARIABLE4 contains a recursive reference to the macro variable VARIABLE4. The macro variable will be
assigned the null value.
Presumably you want to call the validation macro with arguments, much as in your original post. Something link
%data_validation(sashelp.class,age,height,xx,yy);
Invoking your original macro with the arguments above, and modified as per my suggestion, should discover variables age and height in dataset sashelp.class - and process those variables per your code. It won't discover xx or yy, so no code for variable3 or variable4 will be executed.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.