SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Check if Multiple Variables exist in a SAS Dataset

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 14
Accepted Solution

Check if Multiple Variables exist in a SAS Dataset

I have a dataset that I want to perform checks on:

1st Check: determine if all wanted variables exist in the dataset

2nd Check: determine if there are any extra/unwanted variables in the dataset other than the wanted variables

Let's say I am using this data below:

Data test1;

input id score1 score2 score3 score4 score5 score6;

cards;

24 100 97 99 100 85 85

28 98 87 98 100 44 90

run;

I want to check if id, score1 score2 score3 and score4 exist in the dataset (let's assume I am pulling this dataset from a database and checking to see if someone created the correct dataset with those wanted variables only).

My desire is to perform those two checks above and either output a dataset called check_var_problem1, if not all the wanted variables exist, and check_var_problem2, if there exists more than the wanted variables in the dataset. If a dataset cannot be created with the problems, then a note would be acceptable in the LOG that says something like (Check1): "score2 is not in this dataset" or (Check2): "This dataset contains score5, score6 in the dataset".

I tried using a loop to check for the missing variables, as well as creating macro variables.  See my code below:

%let var1 = id;

%let var2 = score1;

%let var3 = score2;

%let var4 = score3;

%let var5 = score4;

data check_var_problem1;

dsid = open('test1');

do i = 1 to 5;

check=varnum(dsid, 'var&i');

if check = 0 then put 'var&1 does not exist';

end;

run;

Any help is appreciated.

Thanks,


Accepted Solutions
Solution
‎11-28-2014 11:31 AM
Super User
Posts: 9,687

Re: Check if Multiple Variables exist in a SAS Dataset

Data test1;
input id score1 score2 score3 score4 score5 score6;
cards;
24 100 97 99 100 85 85
28 98 87 98 100 44 90
run;

%macro check(libname= ,tname= ,vars= );
data have;
 list="&vars";
 do i=1 to countw(list);
  var=scan(list,i);output;
 end;
 keep var;
run;

proc sql noprint;
create table a as
 select upcase(var) as var from have
 except
 select upcase(name) from dictionary.columns where libname="%upcase(&libname)" and memname="%upcase(&tname)";

create table b as
 select upcase(name) as var from dictionary.columns where libname="%upcase(&libname)" and memname="%upcase(&tname)"
 except
 select upcase(var) from have ;

select count(*) into : a from a;
select count(*) into : b from b;

select var into : alist separated by ' ' from a;
select var into : blist separated by ' ' from b;

quit;

%if &a eq 0 %then %put WARNING: All the variable is included. ;
 %else      %put WARNING: &alist variables are not included. ;

%if &b eq 0 %then %put WARNING: there are not extra variables. ;
 %else      %put WARNING: &blist variables are extra variables. ;
%mend check;

%check(libname=work ,tname=test1 ,vars= id score1 score2 score3 score4)



Xia Keshan

View solution in original post


All Replies
Super User
Posts: 17,868

Re: Check if Multiple Variables exist in a SAS Dataset

Perhaps consider checking the metadata on the dataset instead.

Look into SASHELP.VCOLUMN or dictionary.column table that SAS has.

Then if you have your data structure defined in a dataset you can do a SQL join or Datastep Merge and easily identify which variables are present/absent.

Occasional Contributor
Posts: 14

Re: Check if Multiple Variables exist in a SAS Dataset

I need something to be displayed in the log for a quick verification. Thanks, but I don't think that is what I am looking for as an answer.

Super User
Posts: 5,085

Re: Check if Multiple Variables exist in a SAS Dataset

A few items to note ...

First, single quotes suppresses all macro-related activity.  All single quotes must become double quotes.

Second, var&i is looking for a macro variable named &i.  You don't have one, nor do you need one.  Instead of %LET statements, you could include these statements in a DATA step:

var1 = 'id';

var2 = 'score1';

...

var5 = 'score4';

Then use DATA step processing:

array vars {5} var1-var5;

Inside the loop:

check = varnum(dsid, vars{i});

I'd suggest you get the code working 100% for a single data set without any macro language involved.  Then consider how to use macro language in the solution.

Super User
Posts: 17,868

Re: Check if Multiple Variables exist in a SAS Dataset

This is a long way.

I'm sure you could do it simpler with some call vnext functions and such in a single data step but I don't know how that would work.

data class;

set sashelp.class;

drop sex age;

weight2=weight*2;

run;

%macro compare_data(base=, compare=);

proc sql noprint;

  create table base_in as

  select name

  from sashelp.vcolumn

  where libname=upper(scan("&base", 1, "."))

  and memname=upper(scan("&base", 2, "."))

  order by varnum;

quit;

proc sql noprint;

  create table compare_in as

  select name

  from sashelp.vcolumn

  where libname=upper(scan("&compare", 1, "."))

  and memname=upper(scan("&compare", 2, "."))

  order by varnum;

quit;

proc sql;

  create table comparison as

  select a.name as base_var, b.name as compare_var,

  case when missing(a.name) then catx("-", "Comparison dataset has extra variable", b.name)

  when missing(b.name) then catx("-", "Comparison dataset is missing variable", a.name)

  when a.name=b.name then "Variable in both datasets"

  else "CHECKME"

  end as comparison

  from base_in as a

  full join compare_in as b

  on a.name=b.name;

quit;

data _null_;

set comparison;

if comparison not eq "Variable in both datasets"

  then put "ERROR:" comparison;

run;

%mend;

%compare_data(base=sashelp.class, compare=work.class);

Occasional Contributor
Posts: 8

Re: Check if Multiple Variables exist in a SAS Dataset

The method I always use is based on proc contents. It's almost similar to the method Reeza is describing, but less coding.

You run a proc contents on the test1 data-set. The keep variables you can use not only to check the varname but also if type, length or order matches.

proc contents data=test1 out=meta_test1 (keep=name type length varnum);

run;

Then you create a template file (or use the one that's registered in the metadata);

data meta_temp;

     attrib      Id     length=8      format=8.

                    score1 length=8 format=8.

                    etc...

     stop;

run;

Then the rest is the same as the proc sql; create table comparison from Reeza's code.

If you want to check of order, type, length, just add them to the join.

Super Contributor
Posts: 305

Re: Check if Multiple Variables exist in a SAS Dataset

Hello,

%let var1 = Age;
%let var2 = Sex;
%let var3 = blabla;

/*determines variable of database*/
proc sql;
create table varb as select name from sashelp.vcolumn
where memname='CLASS';
quit;

/*write the variables to search for in a database*/
%macro a;

%let k=1;

data tosearch;
length vartosearch $ 20;
%do %while (%symexist(var&k));
vartosearch="&&var&k";
output;
%let k=%eval(&k+1);
%end;
run;

%mend a;

%a

/*determines the 2 wanted databases*/
proc sql;
create table check_var_problem2 as
select name from varb
except
select vartosearch from tosearch;
create table check_var_problem1 as
select vartosearch from tosearch
except
select name from varb;
quit;

PROC Star
Posts: 7,363

Re: Check if Multiple Variables exist in a SAS Dataset

I think that something like the following would accomplish what you want:

data test1;

  input id score1 x score3 y score5 score6;

  cards;

24 100 97 99 100 85 85

28 98 87 98 100 44 90

run;

%macro check(filenm,vars);

  data check (drop=&vars.);

    set &filenm. (obs=1);

    _error_=0;

  run;

  proc sql noprint;

    select name

      into :vnames separated by " "

         from dictionary.columns

           where libname="WORK" and

                 memname="CHECK"

     ;

  quit;

  %let nvar=&sqlobs.;

  %if nvar=0 %then %put "There were no extra variables";

  %else %then %put "Unexpected vars in file: &vnames.";

%mend check;

%check(test1,id score1 score2 score3 score4)

Occasional Contributor
Posts: 14

Re: Check if Multiple Variables exist in a SAS Dataset

Thanks all for your help. Xia Keshan's code was truly helpful.  I will put this as Answered.

Happy Holidays!

SAS Employee
Posts: 23

Re: Check if Multiple Variables exist in a SAS Dataset

Why not use Proc Compare?

proc compare data=Dataset1 compare=Dataset2 nosummary listvar;

run;

Solution
‎11-28-2014 11:31 AM
Super User
Posts: 9,687

Re: Check if Multiple Variables exist in a SAS Dataset

Data test1;
input id score1 score2 score3 score4 score5 score6;
cards;
24 100 97 99 100 85 85
28 98 87 98 100 44 90
run;

%macro check(libname= ,tname= ,vars= );
data have;
 list="&vars";
 do i=1 to countw(list);
  var=scan(list,i);output;
 end;
 keep var;
run;

proc sql noprint;
create table a as
 select upcase(var) as var from have
 except
 select upcase(name) from dictionary.columns where libname="%upcase(&libname)" and memname="%upcase(&tname)";

create table b as
 select upcase(name) as var from dictionary.columns where libname="%upcase(&libname)" and memname="%upcase(&tname)"
 except
 select upcase(var) from have ;

select count(*) into : a from a;
select count(*) into : b from b;

select var into : alist separated by ' ' from a;
select var into : blist separated by ' ' from b;

quit;

%if &a eq 0 %then %put WARNING: All the variable is included. ;
 %else      %put WARNING: &alist variables are not included. ;

%if &b eq 0 %then %put WARNING: there are not extra variables. ;
 %else      %put WARNING: &blist variables are extra variables. ;
%mend check;

%check(libname=work ,tname=test1 ,vars= id score1 score2 score3 score4)



Xia Keshan

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 4718 views
  • 8 likes
  • 8 in conversation