BookmarkSubscribeRSS Feed
David_Billa
Rhodochrosite | Level 12

I've written the code for data validation for one dataset. I would like to develop further for multiple datasets using macro. Now the problem is that the rules which I want to write is not applicable for all the datasets. Because the variables for each dataset is not same. 

 

Now I want to develop the below code to handle all the rules for multiple datasets. I want to feed the dataset name and variable names as keyword parameter macro.

 

data have;
  infile cards dsd dlm='|' truncover;
  input FLT_LAYR_NM :$10. FYP_NM :$30. _TEAM_NM :$30. ;
cards;
DIS|Consistencycheck|Business
DID|Rangecheck|Client
DID| |Client
DID|Rangecheck|Employee
;

data want;
  length Error_id 8	Error_record $200;
  keep Error_id	Error_record ;
  set have ;
  Error_record=cats(FLT_LAYR_NM,'|',FYP_NM,'|',_TEAM_NM);

  Error_id=1001;
  if missing(FYP_NM) then output;

  Error_id=1002;
  if _TEAM_NM not in ('Business','Employee') then output;
run;

SAS code or something similar which I except:

 

But the challenge here is I want to skip the condition (or data validation) in case if the variable not exists in the calling macro.

 

%macro data_validation(table_name,variable1,variable2,variable3,variable4)
data have;
  infile cards dsd dlm='|' truncover;
  input FLT_LAYR_NM :$10. FYP_NM :$30. _TEAM_NM :$30. ;
cards;
DIS|Consistencycheck|Business
DID|Rangecheck|Client
DID| |Client
DID|Rangecheck|Employee
;

data want;
  length Error_id 8	Error_record $200;
  keep Error_id	Error_record ;
  set &table_name;
  Error_record=cats(FLT_LAYR_NM,'|',FYP_NM,'|',_TEAM_NM);

/*data validation rule for variable 1*/
  Error_id=1001;
  if missing(&variable1) then output;

/*data validation rule for variable 2*/
  Error_id=1002;
  if &variable2 not in ('Business','Employee') then output;
run;

/*data validation rule for variable 3*/

Error_id=1003;
  if &variable3  not in ('Bus','Train') then output;
run;

/*data validation rule for variable 4*/

Error_id=1004;
  if &variable4  not in ('Hotel','Office') then output;
run;

%mend data_validation;

%data_validation(table_name,variable1,variable4)
11 REPLIES 11
noling
SAS Employee

You could do something like this and pass in the error checks as macros for each table that you want:

 

%macro err_1001(var);
	Error_id=1001;
	if missing(&var.) then output;
%mend;


%macro err_1002(var);
	Error_id=1002;
    if &var. not in ('Business','Employee') then output;
%mend;

%macro data_validation;
	data want;
	  length Error_id 8	Error_record $200;
	  keep Error_id	Error_record ;
	  set have ;
	  Error_record=cats(FLT_LAYR_NM,'|',FYP_NM,'|',_TEAM_NM);

	  /*Error_id=1001;
	  if missing(FYP_NM) then output;*/
	  %err_1001(FYP_NM)

	  /*Error_id=1002;
	  if _TEAM_NM not in ('Business','Employee') then output;*/
	  %err_1002(_TEAM_NM);

	run;
%mend;
%data_validation;

You could place all your individual error checks in separate sas files, a control file that's read in each time, or just one large sas file. Of course you can automate passing in your tables and variables as desired. 


Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF

View now: on-demand content for SAS users

David_Billa
Rhodochrosite | Level 12

I tried your code as follows. But in the macro %err_1004 I tried to pass the unknown variable (new_variable) as value and in that case I want the program to skip that macro as the unknown variable (new_variable) is not vailable in source dataset (HAVE).

 

So objective is in case if the passing variable is not avilable in the source dataset then I don't that macro to execute.

 

/*source dataset*/
data have;
  infile cards dsd dlm='|' truncover;
  input FLT_LAYR_NM :$10. FYP_NM :$30. _TEAM_NM :$30. ;
cards;
DIS|Consistencycheck|Business
DID|Rangecheck|Client
DID| |Client
DID|Rangecheck|Employee
;
run;

/*check1*/
%macro err_1001(var);
	Error_id=1001;
	if missing(&var.) then output;
%mend;

/*check2*/
%macro err_1002(var);
	Error_id=1002;
    if &var. not in ('Business','Employee') then output;
%mend;

/*check3*/
%macro err_1004(var);
	Error_id=1003;
    if &var. not in ('Business','Employee') then output;
%mend;

/*dynamic macro program*/
%macro data_validation;
	data want;
	  length Error_id 8	Error_record $200;
	  keep Error_id	Error_record ;
	  set have ;
	  Error_record=cats(FLT_LAYR_NM,'|',FYP_NM,'|',_TEAM_NM);

	  %err_1001(FYP_NM)

	  %err_1002(_TEAM_NM); 
	

      /*test macro*/
	  %err_1004(new_variable); /*new_variable is not available in the source dataset*/

	run;
%mend;

%data_validation;

Output which I got is,

 

Error_id Error_record
1003 DIS|Consistencycheck|Business
1002 DID|Rangecheck|Client
1003 DID|Rangecheck|Client
1001 DID||Client
1002 DID||Client
1003 DID||Client
1003 DID|Rangecheck|Employee

 

Desired Output,

 

Error_id Error_record
1002 DID|Rangecheck|Client
1001 DID||Client
1002 DID||Client
noling
SAS Employee

I see. How many tables are you trying to apply this logic to? Are you trying to apply every %error_xxx check to every dataset regardless of the variables in each dataset? Here's a few options:

 

1. Manually control which %error_xxx checks are executed for each table (i.e. just don't include %err_1004 if you know the variable doesn't exist for this table). If you have a small number of tables, I would do this.

2. Add a step before where you dynamically determine if a given table has the variables of interest, then use macros to create the call for each %err_xxx check. If you have a large number of tables consider this.

3. Last, I have code that checks to see if a var is in the current dataset during datastep processing using, but it is not efficient and I would rather you go with one of the first two options. 


Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF

View now: on-demand content for SAS users

David_Billa
Rhodochrosite | Level 12
I want to do the validation checks for 10 datasets and I have 70 validation
checks all together to perform on those tables.

Yes, I tried to execute %error_xxxx irrespective of the table.

Could you please help me how to control %error_xxxxx based on the source
dataset?
noling
SAS Employee

This is pretty simple and I think will work:

 

1. Get a list of all the columns in your current table and save them into a macro:

 

 

proc sql noprint;
	select upcase(name) into :fields separated by ","
	from dictionary.columns
	where libname = "SASHELP" and memname = "CARS";
run;

*test;
%put fields= &fields;
>>fields= MAKE,MODEL,TYPE,ORIGIN,DRIVETRAIN,MSRP,INVOICE,ENGINESIZE,CYLINDERS,HORSEPOWER,MPG_CITY,MPG_HIGHWAY,WEIGHT,WHEELBASE,LENGTH

Pass in your libname and memname values as macros.

 

2. Add this IF check to each of your individual %err_xxx checks:

 

%macro err_1004(var);
	if upcase(vname(var)) in ("&fields") then do;
	    Error_id=1003;
	    if &var. not in ('Business','Employee') then output;
	end;
%mend;

This way, if your current var isn't in the table (i.e. the list of values in &fields), the check won't run.

 

Does that make sense?

%macro err_1004(var);
	if upcase(vname(var)) in ("&fields") then do;
		Error_id=1003;
	    if &var. not in ('Business','Employee') then output;
	end;
%mend;

%macro data_validation;
	proc sql noprint;
		select upcase(name) into :fields separated by ","
		from dictionary.columns
		where libname = "SASHELP" and memname = "CARS";
	run;

	data want;
	  length Error_id 8	Error_record $200;
	  keep Error_id	Error_record ;
	  set have ;
	  Error_record=cats(FLT_LAYR_NM,'|',FYP_NM,'|',_TEAM_NM);

	  %err_1004(new_field)

	run;
%mend;
%data_validation;

 

Note, this is very similar to @mkeintz's solution. mkeintz's may be more efficient if your data is large since you won't check the list of fields for each datastep pass.


Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF

View now: on-demand content for SAS users

David_Billa
Rhodochrosite | Level 12
Thanks.I understand that I have to create macros for each validation check.
My question is how will you know which macro (validation check macros which
we create prior) to call in %data_validation macro? Because if we call all
the data validation macros then we end up with error as all the variables
will not be available in the source dataset.

I would like to call the validation check macro (not all) which is valid
for the dataset which I'm going to put as source.

noling
SAS Employee

If you put safety checks in place, which we have, then you can call all the %err_xxx macros for every table even if the table does not have the specified variables, which I thought is what you wanted. The safety checks won't execute the error check if the field isn't in the table (the macros will still execute, but the IF logic will prevent the error logic from running). This means you can call all the checks for every table.

 

Does this make sense?

 

 


Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF

View now: on-demand content for SAS users

David_Billa
Rhodochrosite | Level 12
Thanks, it does makes sense to me. Yes, you understood correctly on the
requirement.

Let me execute your code which has 'safety checks' which you posted in your
previous post and come back to you.

If you find other way to achieve this task then let me know. Because I'm
worried about multiple IF clause being executed as I have 70+ validations
in real life.
mkeintz
PROC Star

So you want to test whether variable1, (or 2 or 3 or 4) exists in table_name, and run the value check only if it in the data set - is that right?

 

What you can do in put a list of all the TABLE_NAME variables in a macrovar (CSV_VARNAMES), and then do a check on whether VARIABLE1 is found in that list.  If it's not then reset it to blank.  Later on you can test only if it's not blank.

 

So prior to your DATA WANT step you can:

 

  proc sql noprint;
    select distinct upcase(name) into :csv_varnames separated by ' '
    from dictionary.columns where libname="&dslib" and memname="&dsname";
  quit;

  %do i=1 %to 4;
    %let variable&i=%upcase(&&variable&i);
    %if %sysfunc(findw(&csv_varnames,&&variable&i))=0 %then %let variable&i=%str();
  %end;

Then inside the DATA WANT step you can have:

%if &variable1 ^= %str() %then %do;
    /*data validation rule for variable 1*/
    Error_id=1001;
    if missing(&variable1) then output;
%end;

etc, for varibles2, 3, and 4

 

 

 

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
David_Billa
Rhodochrosite | Level 12

To answer your question,

 

So you want to test whether variable1, (or 2 or 3 or 4) exists in table_name, and run the value check only if it in the data set - is that right? - yes

 

Could you please help me understand how to pass the Input to the below macro variables? Also I would like to why you've creating macro variable 'i' only till 4. Instead of hard coding can we find the dynmaic way to handle this?

 

 %do i=1 %to 4;
    %let variable&i=%upcase(&&variable&i);
    %if %sysfunc(findw(&csv_varnames,&&variable&i))=0 %then %let variable&i=%str();
  %end;

What I tried is,

 

%macro validation(variable1,variable2,variable3);
  data _null_;
  %do i=1 %to 4;
    %let variable&i=%upcase(&&variable&i);
    %if %sysfunc(findw(&csv_varnames,&&variable&i))=0 %then %let variable&i=%str();
  %end;
  run;
  %mend;
%validation;

Error message is,

 

SYMBOLGEN:  Macro variable I resolves to 4
MLOGIC(VALIDATION):  %LET (variable name is VARIABLE4)
SYMBOLGEN:  && resolves to &.
SYMBOLGEN:  Macro variable I resolves to 4
WARNING: Apparent symbolic reference VARIABLE4 not resolved.
WARNING: Apparent symbolic reference VARIABLE4 not resolved.
ERROR: The text expression &VARIABLE4 contains a recursive reference to the macro variable VARIABLE4.  The macro variable will be 
       assigned the null value.

 

 

mkeintz
PROC Star

Presumably you want to call the validation macro with arguments, much as in your original post.  Something link

 

%data_validation(sashelp.class,age,height,xx,yy);

Invoking your original macro with the arguments above, and  modified as per my suggestion, should discover variables age and height in dataset sashelp.class - and process those variables per your code.   It won't discover xx or yy, so no code for variable3 or variable4 will be executed.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1226 views
  • 2 likes
  • 3 in conversation