Hi,
Sorry, the subject isn't very clear. I wasn't sure how to word it. So here is the scenario:
I have a bunch of datasets in two different directories on UNIX, like this:
/user/dir1/: example1.sas7bdat, example2.sas7bdat, example3.sas7bdat
/user/dir2/: example1.sas7bdat, example3.sas7bdat
dir1 datasets has ID column, VarA, VarB, etc.
dir2 datasets has ID column, ColA, ColB, etc.
My objective is to check the values of VarA against the values of ColA for each dataset that matches. So it will check only for example1.sas7bdat and example3.sas7bdat.
Question 1: How do I do that if I have dozens of directories and each directory have dozens of datasets?
Question 2: If the above doesn't work, how do I do it with 2 directories at a time? It is just no efficient to manually type in the datasets because some directories have dozens of datasets.
Thanks!
As suggested by others first make things work for you for a single use case creating some fully functional static code. Only then amend the code and make it dynamic.
In the following some sample code which hopefully will give you some ideas.
1. Create some sample data
/* create sample Master table to compare against */
data work.master;
set sashelp.class;
idVar=_n_;
colA=name;
run;
/* create sample ds for comparison agains master */
options dlcreatedir;
%let other_folder=%sysfunc(pathname(work))\other_folder;
libname sampleDs "&other_folder";
data sampleDs.ds1
sampleDs.ds2(drop=varA)
sampleDs.ds3;
set master(rename=(colA=varA));
output sampleDs.ds1;
output sampleDs.ds2;
if _n_=3 then varA='XXXX';
output sampleDs.ds3;
run;
2. Code for comparison against a single table. Make this code work for you up to the point where you get exactly what you want.
If you can't be sure that all tables will be pre-sorted by id then also already implement here the required logic to avoid any potential issues with not pre-sorted tables.
/* compare ds agains master */
proc compare
base=work.master
comp=sampleDs.ds3
novalues
briefsummary
;
var colA;
with varA;
id idVar;
run;
3. Only once you've got above 100% working start to make the code dynamic. You can see below how much this complicates the code so you don't want to have to debug issues in dynamic code which you can already resolve in the static code version.
/* compare ds agains master */
%macro comp_ds(
masterDS=work.master,
masterCol=colA,
compDir=&other_folder,
compCol=varA,
idVar=id
);
libname compdir "&other_folder" access=readonly;
%local ds_list iter;
%let ds_list=;
%let iter=0;
proc sql noprint;
select
memname,
put(count(*), 32. -l)
into
:ds_list separated by ' ',
:iter
from dictionary.columns
where
libname="COMPDIR"
and upcase(name)=%upcase("&compCol")
;
quit;
%put &=iter;
%put &=ds_list;
%local compDS;
%do i=1 %to &iter;
%let compDS=%scan(&ds_list,&i);
title1 "Base: &masterDS, Comp: compdir.&compDS";
proc compare
base=&masterDS
comp=compdir.&compDS
novalues
briefsummary
;
var &masterCol;
with &compCol;
id &idVar;
run;
title;
%end;
libname compdir clear;
%mend;
%comp_ds(
masterDS=work.master,
masterCol=colA,
compDir=&other_folder,
compCol=varA,
idVar=idVar
);
Besides of using Proc Compare there are also other ways how this could be done (i.e. using a hash lookup table). What's suitable for your case will depend on the details of what you have (data volumes included) and what you need.
My objective is to check the values of VarA against the values of ColA for each dataset that matches
How are you determining what matches?
In general, get it working for one manual step, show that code and we can help you automate it from there.
Here's some references for you, not sure what level your macro skills are at:
UCLA introductory tutorial on macro variables and macros
https://stats.idre.ucla.edu/sas/seminars/sas-macros-introduction/
Tutorial on converting a working program to a macro
This method is pretty robust and helps prevent errors and makes it much easier to debug your code. Obviously biased, because I wrote it 🙂 https://github.com/statgeek/SAS-Tutorials/blob/master/Turning%20a%20program%20into%20a%20macro.md
Examples of common macro usage
https://communities.sas.com/t5/SAS-Communities-Library/SAS-9-4-Macro-Language-Reference-Has-a-New-Ap...
1) assign a different LIBRARY to each folder or directory if you have duplicated data set names. If you "know beyond a any shadow of a doubt" that data set names are not repeated anywhere you can assign a single library name to multiple folders, check the documentation.
2) You can use DICTIONARY.Columns (proc sql) or sashelp.vcolumn (data step) to get lists of data set library and names that have the variables of interest from the libraries of interest.
Pseudo code:
Proc sql; create table variablelist as select libname, memname, name from dictionary.columns where libname in ('THISLIB' 'THATLIB') and upcase(name) in ('THISVAR' 'THATVAR' 'OTHERVAR') ; quit;
the LIBNAME value stored in the SAS metadata in the dictionary tables is stored in upper case. All of your currently assigned libraries will appear. The Name of the variables are not stored in any standard case so I upcase the variable to compare to the list of variables I may be interested in.
3) Proc Compare will compare explicit variables with others using the VAR and WITH statements.
When using proc compare it is a good idea to sort the data sets by common variables first. This is if you are comparing values by an idea. If you just need to compare the distinct values of any given variable without caring which record has the values you should say so. There are lots of ways to "compare" and the type of comparison is needed for specific suggestions.
As suggested by others first make things work for you for a single use case creating some fully functional static code. Only then amend the code and make it dynamic.
In the following some sample code which hopefully will give you some ideas.
1. Create some sample data
/* create sample Master table to compare against */
data work.master;
set sashelp.class;
idVar=_n_;
colA=name;
run;
/* create sample ds for comparison agains master */
options dlcreatedir;
%let other_folder=%sysfunc(pathname(work))\other_folder;
libname sampleDs "&other_folder";
data sampleDs.ds1
sampleDs.ds2(drop=varA)
sampleDs.ds3;
set master(rename=(colA=varA));
output sampleDs.ds1;
output sampleDs.ds2;
if _n_=3 then varA='XXXX';
output sampleDs.ds3;
run;
2. Code for comparison against a single table. Make this code work for you up to the point where you get exactly what you want.
If you can't be sure that all tables will be pre-sorted by id then also already implement here the required logic to avoid any potential issues with not pre-sorted tables.
/* compare ds agains master */
proc compare
base=work.master
comp=sampleDs.ds3
novalues
briefsummary
;
var colA;
with varA;
id idVar;
run;
3. Only once you've got above 100% working start to make the code dynamic. You can see below how much this complicates the code so you don't want to have to debug issues in dynamic code which you can already resolve in the static code version.
/* compare ds agains master */
%macro comp_ds(
masterDS=work.master,
masterCol=colA,
compDir=&other_folder,
compCol=varA,
idVar=id
);
libname compdir "&other_folder" access=readonly;
%local ds_list iter;
%let ds_list=;
%let iter=0;
proc sql noprint;
select
memname,
put(count(*), 32. -l)
into
:ds_list separated by ' ',
:iter
from dictionary.columns
where
libname="COMPDIR"
and upcase(name)=%upcase("&compCol")
;
quit;
%put &=iter;
%put &=ds_list;
%local compDS;
%do i=1 %to &iter;
%let compDS=%scan(&ds_list,&i);
title1 "Base: &masterDS, Comp: compdir.&compDS";
proc compare
base=&masterDS
comp=compdir.&compDS
novalues
briefsummary
;
var &masterCol;
with &compCol;
id &idVar;
run;
title;
%end;
libname compdir clear;
%mend;
%comp_ds(
masterDS=work.master,
masterCol=colA,
compDir=&other_folder,
compCol=varA,
idVar=idVar
);
Besides of using Proc Compare there are also other ways how this could be done (i.e. using a hash lookup table). What's suitable for your case will depend on the details of what you have (data volumes included) and what you need.
Great that this was so helpful to you.
To answer your questions:
1. put(count(*), 32. -l), this is same as count(*) except it defines it as a long integer with 32 digits?
It WRITES the numerical value as a left aligned string with as many digits as required for the number (which is certainly not 32)
2. Is there a way to use a %do loop outside a %macro definition? like just in a regular data step
No, it isn't. Very recent SAS versions allow you now to write simple %if %then %else statements in open code but not %do loops as far as I recall.
As a rule of thumb: Only use macro code if you can't do it with regular SAS code.
3. In step 3, you used a macro variable for the where statement,
Macro variables in SAS code must be in double quotes to resolve. You would use single quotes if you would want to actually pass the string &temp to the where condition.
where libname='WORK' and upcase(memname) = "&temp";
Thanks,
Patrick
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.