I have a dataset that I want to perform checks on:
1st Check: determine if all wanted variables exist in the dataset
2nd Check: determine if there are any extra/unwanted variables in the dataset other than the wanted variables
Let's say I am using this data below:
Data test1;
input id score1 score2 score3 score4 score5 score6;
cards;
24 100 97 99 100 85 85
28 98 87 98 100 44 90
run;
I want to check if id, score1 score2 score3 and score4 exist in the dataset (let's assume I am pulling this dataset from a database and checking to see if someone created the correct dataset with those wanted variables only).
My desire is to perform those two checks above and either output a dataset called check_var_problem1, if not all the wanted variables exist, and check_var_problem2, if there exists more than the wanted variables in the dataset. If a dataset cannot be created with the problems, then a note would be acceptable in the LOG that says something like (Check1): "score2 is not in this dataset" or (Check2): "This dataset contains score5, score6 in the dataset".
I tried using a loop to check for the missing variables, as well as creating macro variables. See my code below:
%let var1 = id;
%let var2 = score1;
%let var3 = score2;
%let var4 = score3;
%let var5 = score4;
data check_var_problem1;
dsid = open('test1');
do i = 1 to 5;
check=varnum(dsid, 'var&i');
if check = 0 then put 'var&1 does not exist';
end;
run;
Any help is appreciated.
Thanks,
Data test1; input id score1 score2 score3 score4 score5 score6; cards; 24 100 97 99 100 85 85 28 98 87 98 100 44 90 run; %macro check(libname= ,tname= ,vars= ); data have; list="&vars"; do i=1 to countw(list); var=scan(list,i);output; end; keep var; run; proc sql noprint; create table a as select upcase(var) as var from have except select upcase(name) from dictionary.columns where libname="%upcase(&libname)" and memname="%upcase(&tname)"; create table b as select upcase(name) as var from dictionary.columns where libname="%upcase(&libname)" and memname="%upcase(&tname)" except select upcase(var) from have ; select count(*) into : a from a; select count(*) into : b from b; select var into : alist separated by ' ' from a; select var into : blist separated by ' ' from b; quit; %if &a eq 0 %then %put WARNING: All the variable is included. ; %else %put WARNING: &alist variables are not included. ; %if &b eq 0 %then %put WARNING: there are not extra variables. ; %else %put WARNING: &blist variables are extra variables. ; %mend check; %check(libname=work ,tname=test1 ,vars= id score1 score2 score3 score4)
Xia Keshan
Perhaps consider checking the metadata on the dataset instead.
Look into SASHELP.VCOLUMN or dictionary.column table that SAS has.
Then if you have your data structure defined in a dataset you can do a SQL join or Datastep Merge and easily identify which variables are present/absent.
I need something to be displayed in the log for a quick verification. Thanks, but I don't think that is what I am looking for as an answer.
A few items to note ...
First, single quotes suppresses all macro-related activity. All single quotes must become double quotes.
Second, var&i is looking for a macro variable named &i. You don't have one, nor do you need one. Instead of %LET statements, you could include these statements in a DATA step:
var1 = 'id';
var2 = 'score1';
...
var5 = 'score4';
Then use DATA step processing:
array vars {5} var1-var5;
Inside the loop:
check = varnum(dsid, vars{i});
I'd suggest you get the code working 100% for a single data set without any macro language involved. Then consider how to use macro language in the solution.
This is a long way.
I'm sure you could do it simpler with some call vnext functions and such in a single data step but I don't know how that would work.
data class;
set sashelp.class;
drop sex age;
weight2=weight*2;
run;
%macro compare_data(base=, compare=);
proc sql noprint;
create table base_in as
select name
from sashelp.vcolumn
where libname=upper(scan("&base", 1, "."))
and memname=upper(scan("&base", 2, "."))
order by varnum;
quit;
proc sql noprint;
create table compare_in as
select name
from sashelp.vcolumn
where libname=upper(scan("&compare", 1, "."))
and memname=upper(scan("&compare", 2, "."))
order by varnum;
quit;
proc sql;
create table comparison as
select a.name as base_var, b.name as compare_var,
case when missing(a.name) then catx("-", "Comparison dataset has extra variable", b.name)
when missing(b.name) then catx("-", "Comparison dataset is missing variable", a.name)
when a.name=b.name then "Variable in both datasets"
else "CHECKME"
end as comparison
from base_in as a
full join compare_in as b
on a.name=b.name;
quit;
data _null_;
set comparison;
if comparison not eq "Variable in both datasets"
then put "ERROR:" comparison;
run;
%mend;
%compare_data(base=sashelp.class, compare=work.class);
The method I always use is based on proc contents. It's almost similar to the method Reeza is describing, but less coding.
You run a proc contents on the test1 data-set. The keep variables you can use not only to check the varname but also if type, length or order matches.
proc contents data=test1 out=meta_test1 (keep=name type length varnum);
run;
Then you create a template file (or use the one that's registered in the metadata);
data meta_temp;
attrib Id length=8 format=8.
score1 length=8 format=8.
etc...
stop;
run;
Then the rest is the same as the proc sql; create table comparison from Reeza's code.
If you want to check of order, type, length, just add them to the join.
Hello,
%let var1 = Age;
%let var2 = Sex;
%let var3 = blabla;
/*determines variable of database*/
proc sql;
create table varb as select name from sashelp.vcolumn
where memname='CLASS';
quit;
/*write the variables to search for in a database*/
%macro a;
%let k=1;
data tosearch;
length vartosearch $ 20;
%do %while (%symexist(var&k));
vartosearch="&&var&k";
output;
%let k=%eval(&k+1);
%end;
run;
%mend a;
%a
/*determines the 2 wanted databases*/
proc sql;
create table check_var_problem2 as
select name from varb
except
select vartosearch from tosearch;
create table check_var_problem1 as
select vartosearch from tosearch
except
select name from varb;
quit;
I think that something like the following would accomplish what you want:
data test1;
input id score1 x score3 y score5 score6;
cards;
24 100 97 99 100 85 85
28 98 87 98 100 44 90
run;
%macro check(filenm,vars);
data check (drop=&vars.);
set &filenm. (obs=1);
_error_=0;
run;
proc sql noprint;
select name
into :vnames separated by " "
from dictionary.columns
where libname="WORK" and
memname="CHECK"
;
quit;
%let nvar=&sqlobs.;
%if nvar=0 %then %put "There were no extra variables";
%else %then %put "Unexpected vars in file: &vnames.";
%mend check;
%check(test1,id score1 score2 score3 score4)
Thanks all for your help. Xia Keshan's code was truly helpful. I will put this as Answered.
Happy Holidays!
Why not use Proc Compare?
proc compare data=Dataset1 compare=Dataset2 nosummary listvar;
run;
Data test1; input id score1 score2 score3 score4 score5 score6; cards; 24 100 97 99 100 85 85 28 98 87 98 100 44 90 run; %macro check(libname= ,tname= ,vars= ); data have; list="&vars"; do i=1 to countw(list); var=scan(list,i);output; end; keep var; run; proc sql noprint; create table a as select upcase(var) as var from have except select upcase(name) from dictionary.columns where libname="%upcase(&libname)" and memname="%upcase(&tname)"; create table b as select upcase(name) as var from dictionary.columns where libname="%upcase(&libname)" and memname="%upcase(&tname)" except select upcase(var) from have ; select count(*) into : a from a; select count(*) into : b from b; select var into : alist separated by ' ' from a; select var into : blist separated by ' ' from b; quit; %if &a eq 0 %then %put WARNING: All the variable is included. ; %else %put WARNING: &alist variables are not included. ; %if &b eq 0 %then %put WARNING: there are not extra variables. ; %else %put WARNING: &blist variables are extra variables. ; %mend check; %check(libname=work ,tname=test1 ,vars= id score1 score2 score3 score4)
Xia Keshan
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.