BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.

I have a dataset that I want to perform checks on:

1st Check: determine if all wanted variables exist in the dataset

2nd Check: determine if there are any extra/unwanted variables in the dataset other than the wanted variables

Let's say I am using this data below:

Data test1;

input id score1 score2 score3 score4 score5 score6;

cards;

24 100 97 99 100 85 85

28 98 87 98 100 44 90

run;

I want to check if id, score1 score2 score3 and score4 exist in the dataset (let's assume I am pulling this dataset from a database and checking to see if someone created the correct dataset with those wanted variables only).

My desire is to perform those two checks above and either output a dataset called check_var_problem1, if not all the wanted variables exist, and check_var_problem2, if there exists more than the wanted variables in the dataset. If a dataset cannot be created with the problems, then a note would be acceptable in the LOG that says something like (Check1): "score2 is not in this dataset" or (Check2): "This dataset contains score5, score6 in the dataset".

I tried using a loop to check for the missing variables, as well as creating macro variables.  See my code below:

%let var1 = id;

%let var2 = score1;

%let var3 = score2;

%let var4 = score3;

%let var5 = score4;

data check_var_problem1;

dsid = open('test1');

do i = 1 to 5;

check=varnum(dsid, 'var&i');

if check = 0 then put 'var&1 does not exist';

end;

run;

Any help is appreciated.

Thanks,

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
Data test1;
input id score1 score2 score3 score4 score5 score6;
cards;
24 100 97 99 100 85 85
28 98 87 98 100 44 90
run;

%macro check(libname= ,tname= ,vars= );
data have;
 list="&vars";
 do i=1 to countw(list);
  var=scan(list,i);output;
 end;
 keep var;
run;

proc sql noprint;
create table a as
 select upcase(var) as var from have
 except
 select upcase(name) from dictionary.columns where libname="%upcase(&libname)" and memname="%upcase(&tname)";

create table b as
 select upcase(name) as var from dictionary.columns where libname="%upcase(&libname)" and memname="%upcase(&tname)"
 except
 select upcase(var) from have ;

select count(*) into : a from a;
select count(*) into : b from b;

select var into : alist separated by ' ' from a;
select var into : blist separated by ' ' from b;

quit;

%if &a eq 0 %then %put WARNING: All the variable is included. ;
 %else      %put WARNING: &alist variables are not included. ;

%if &b eq 0 %then %put WARNING: there are not extra variables. ;
 %else      %put WARNING: &blist variables are extra variables. ;
%mend check;

%check(libname=work ,tname=test1 ,vars= id score1 score2 score3 score4)



Xia Keshan

View solution in original post

10 REPLIES 10
Reeza
Super User

Perhaps consider checking the metadata on the dataset instead.

Look into SASHELP.VCOLUMN or dictionary.column table that SAS has.

Then if you have your data structure defined in a dataset you can do a SQL join or Datastep Merge and easily identify which variables are present/absent.

jprosenbaum8908
Calcite | Level 5

I need something to be displayed in the log for a quick verification. Thanks, but I don't think that is what I am looking for as an answer.

Astounding
PROC Star

A few items to note ...

First, single quotes suppresses all macro-related activity.  All single quotes must become double quotes.

Second, var&i is looking for a macro variable named &i.  You don't have one, nor do you need one.  Instead of %LET statements, you could include these statements in a DATA step:

var1 = 'id';

var2 = 'score1';

...

var5 = 'score4';

Then use DATA step processing:

array vars {5} var1-var5;

Inside the loop:

check = varnum(dsid, vars{i});

I'd suggest you get the code working 100% for a single data set without any macro language involved.  Then consider how to use macro language in the solution.

Reeza
Super User

This is a long way.

I'm sure you could do it simpler with some call vnext functions and such in a single data step but I don't know how that would work.

data class;

set sashelp.class;

drop sex age;

weight2=weight*2;

run;

%macro compare_data(base=, compare=);

proc sql noprint;

  create table base_in as

  select name

  from sashelp.vcolumn

  where libname=upper(scan("&base", 1, "."))

  and memname=upper(scan("&base", 2, "."))

  order by varnum;

quit;

proc sql noprint;

  create table compare_in as

  select name

  from sashelp.vcolumn

  where libname=upper(scan("&compare", 1, "."))

  and memname=upper(scan("&compare", 2, "."))

  order by varnum;

quit;

proc sql;

  create table comparison as

  select a.name as base_var, b.name as compare_var,

  case when missing(a.name) then catx("-", "Comparison dataset has extra variable", b.name)

  when missing(b.name) then catx("-", "Comparison dataset is missing variable", a.name)

  when a.name=b.name then "Variable in both datasets"

  else "CHECKME"

  end as comparison

  from base_in as a

  full join compare_in as b

  on a.name=b.name;

quit;

data _null_;

set comparison;

if comparison not eq "Variable in both datasets"

  then put "ERROR:" comparison;

run;

%mend;

%compare_data(base=sashelp.class, compare=work.class);

sdoorneveld
Fluorite | Level 6

The method I always use is based on proc contents. It's almost similar to the method Reeza is describing, but less coding.

You run a proc contents on the test1 data-set. The keep variables you can use not only to check the varname but also if type, length or order matches.

proc contents data=test1 out=meta_test1 (keep=name type length varnum);

run;

Then you create a template file (or use the one that's registered in the metadata);

data meta_temp;

     attrib      Id     length=8      format=8.

                    score1 length=8 format=8.

                    etc...

     stop;

run;

Then the rest is the same as the proc sql; create table comparison from Reeza's code.

If you want to check of order, type, length, just add them to the join.

Loko
Barite | Level 11

Hello,

%let var1 = Age;
%let var2 = Sex;
%let var3 = blabla;

/*determines variable of database*/
proc sql;
create table varb as select name from sashelp.vcolumn
where memname='CLASS';
quit;

/*write the variables to search for in a database*/
%macro a;

%let k=1;

data tosearch;
length vartosearch $ 20;
%do %while (%symexist(var&k));
vartosearch="&&var&k";
output;
%let k=%eval(&k+1);
%end;
run;

%mend a;

%a

/*determines the 2 wanted databases*/
proc sql;
create table check_var_problem2 as
select name from varb
except
select vartosearch from tosearch;
create table check_var_problem1 as
select vartosearch from tosearch
except
select name from varb;
quit;

art297
Opal | Level 21

I think that something like the following would accomplish what you want:

data test1;

  input id score1 x score3 y score5 score6;

  cards;

24 100 97 99 100 85 85

28 98 87 98 100 44 90

run;

%macro check(filenm,vars);

  data check (drop=&vars.);

    set &filenm. (obs=1);

    _error_=0;

  run;

  proc sql noprint;

    select name

      into :vnames separated by " "

         from dictionary.columns

           where libname="WORK" and

                 memname="CHECK"

     ;

  quit;

  %let nvar=&sqlobs.;

  %if nvar=0 %then %put "There were no extra variables";

  %else %then %put "Unexpected vars in file: &vnames.";

%mend check;

%check(test1,id score1 score2 score3 score4)

jprosenbaum8908
Calcite | Level 5

Thanks all for your help. Xia Keshan's code was truly helpful.  I will put this as Answered.

Happy Holidays!

Fraktalnisse
SAS Employee

Why not use Proc Compare?

proc compare data=Dataset1 compare=Dataset2 nosummary listvar;

run;

Ksharp
Super User
Data test1;
input id score1 score2 score3 score4 score5 score6;
cards;
24 100 97 99 100 85 85
28 98 87 98 100 44 90
run;

%macro check(libname= ,tname= ,vars= );
data have;
 list="&vars";
 do i=1 to countw(list);
  var=scan(list,i);output;
 end;
 keep var;
run;

proc sql noprint;
create table a as
 select upcase(var) as var from have
 except
 select upcase(name) from dictionary.columns where libname="%upcase(&libname)" and memname="%upcase(&tname)";

create table b as
 select upcase(name) as var from dictionary.columns where libname="%upcase(&libname)" and memname="%upcase(&tname)"
 except
 select upcase(var) from have ;

select count(*) into : a from a;
select count(*) into : b from b;

select var into : alist separated by ' ' from a;
select var into : blist separated by ' ' from b;

quit;

%if &a eq 0 %then %put WARNING: All the variable is included. ;
 %else      %put WARNING: &alist variables are not included. ;

%if &b eq 0 %then %put WARNING: there are not extra variables. ;
 %else      %put WARNING: &blist variables are extra variables. ;
%mend check;

%check(libname=work ,tname=test1 ,vars= id score1 score2 score3 score4)



Xia Keshan

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 10746 views
  • 8 likes
  • 8 in conversation