I have two directories. mylibA and mylibB in which some datasets names are the same and some are close but not exactly (have v_ before the dataset name). The names of the dataset looks like following:
------------------
mylibA
-----------------
want1
want2
want3
want4
-------------------
mylibB
------------------
want1
v_want2
want3
v_want4
Is there a way to use PROC COMPARE to loop through and automatically compare all these datasets both libraries: i.e, mylibA.want1 and mylibB.want1, mylibA.want2 and mylibB.v_want2, mylibA.want3 and mylibB.want3, mylibA.want4 and mylib.v_want4.
That is, v_want2 and v_want4 should actually by want2 and want4, respectively.
Thanks!
Something like this?
proc sql;
create table TABLES as
select LIBNAME, MEMNAME, left(tranwrd(MEMNAME,'V_',' ')) as KEY
from DICTIONARY.TABLES
where LIBNAME in ('MYLIB1','MYLIB2')
and MEMNAME like '%WANT%' ;
quit;
data _null_;
merge TABLES (in=A where=(LIBNAME='MYLIB1') rename=(MEMNAME=M1))
TABLES (in=B where=(LIBNAME='MYLIB2') rename=(MEMNAME=M2));
by KEY;
if A and B;
call execute('proc compare base=MYLIB1.'||M1||' data=MYLIB2.'||M2||';run;');
run
What do you really want to know from the comparison?
What is the final result of your process? "That is, v_want2 and v_want4 should actually by want2 and want4, respectively." isn't very clear. Do you intend to Rename the data sets to standard version? If that is the case then you may be contemplating a lot of extra work if the pattern is actual V_ prefixing a name. Or do you have a bunch of different prefixes that vary in no regular pattern? (Hint: regular patterns you program for relatively easily).
You can get the names of the data sets by selecting records from the view SASHELP.Vtable, or using Proc Sql Dictionary.tables. An example of getting all the table names for one library.
Proc sql; create table Libone as select libname, memname from dictionary.tables where libname='MYLIB1' ; run;
I'm not going to go any further with code as there a number of ways to interpret your request and some of them might require more thinking than I'm willing to do on a Saturday.
Once you have the text involved there are several functions to compare text variables such as SOUNDEX, COMPLEV and COMPGED
Thanks!@ballardw.
The goal is to compare datasets in both directories.
A good program should look like the following:
proc compare
base=MYLIB1.want1
compare=MYLIB2.want1
out=MYLIB3.want1
run;
proc compare
base=MYLIB1.want2
compare=MYLIB2.v_want2
out=MYLIB3.want2
run;
etc
If the dataset were the same name in both libraries, this can be done with something like this:
do i=1 to &n;
proc compare
base=MYLIB1.&ds&i
compare=MYLIB2.&ds&i
out=MYLIB3.&ds&i
run;
end;
where &ds and &n are macro variables from DATA _NULL_ that reference the different datasets.
But the challenge here is that for some datasets in the second directory, the dataset names begins with the prefix v_ (no other prefix) and some are the same name in both directories.
@SWEETSAS wrote:
Thanks!@ballardw.
If the dataset were the same name in both libraries, this can be done with something like this:
do i=1 to &n;
proc compare
base=MYLIB1.&ds&i
compare=MYLIB2.&ds&i
out=MYLIB3.&ds&i
run;
end;
You can't run a Procedure within a SAS datastep so your code would not work. You could though achieve what you want to do by having the Proc Compare in a macro and then use call execute() within a do loop.
If the potential difference between DS names is only the V_ then it wouldn't be hard to create a list of the DS to be compared by querying dictionary.tables BUT: The V_... is often a naming convention for Views. Are you 100% sure these v_have1 "tables" are not just views of the have1 tables? If so a Proc Compare wouldn't make any sense for such cases.
Here some sample code how this could look like. Amend as required for your actual use case.
options dlcreatedir;
libname lib1 "%sysfunc(pathname(work))\lib1";
libname lib2 "%sysfunc(pathname(work))\lib2";
data lib1.want1 lib1.want2 lib1.want3;
set sashelp.class;
run;
data lib2.want1;
set lib1.want1;
run;
data lib2.v_want2;
set lib1.want1;
if _n_=5 then call missing(sex);
run;
proc sql;
create view lib2.v_want3 as
select * from lib1.want3
;
quit;
proc sql;
create table comp_tables as
select
l.libname as lib1,
l.memname as ds1,
r.libname as lib2,
r.memname as ds2
from
dictionary.tables l
inner join
dictionary.tables r
on
l.libname='LIB1' and r.libname='LIB2'
and l.memtype='DATA' and r.memtype='DATA'
and
(
l.memname=r.memname
or
substrn(r.memname,1,2)='V_' and l.memname=substrn(r.memname,3)
)
;
quit;
%macro compare(lib1,ds1,lib2,ds2);
proc compare data=&lib1..&ds1 comp=&lib2..&ds2;
run;
%mend;
data _null_;
set comp_tables;
length cmd $100;
cmd=cats( '%compare(',catx(',',lib1,ds1,lib2,ds2),')' );
call execute(cmd);
run;
Thanks @Patrick !
Yes! The difference between DS names is only the V_.
@SWEETSAS Then the code I've posted should work. ...and it will exclude any views.
Thanks for this program. I have some few questions.
libname lib1 "%sysfunc(pathname(work))\lib1";
libname lib2 "%sysfunc(pathname(work))\lib2";
What is the above program doing? Creating two libraries under WORK?
What is the last SQL procedure meant to do? Place all datasets in LIB1 into ds1 and all datasets in LIB2 into a ds2? And this is performed for only datasets that exit in both libraries?
I ask because I did not see a counter that loops through the datasets in the COMPARE procedure.
Thanks!
You didn't provide sample data so I had to create it.
Yes, the libname statements are just creating two sub-folders under work. This to create a fully working self-contained sample program that mimics what you describe.
Everything up to the last SQL proc is just for creation of sample data.
The last SQL is then creating a table with a row per DS you want to compare. And the data _null_ step is then calling macro %compare() once per row in this table and though generating the code required to compare your tables.
call execute() is "stacking" up code - so first the data _null_ step processes the source table (created in the SQL, have a look how this table looks like), call execute creates one macro call per row in this source table (and there we pass in the variable in this source table as the parameters to the macro) ...and once the data step finished iterating through the source table all the generated code (the macro calls) get executed.
If you run the code I've posted you can see in the SAS log how these call execute() generated macro call syntax looks like.
Something like this?
proc sql;
create table TABLES as
select LIBNAME, MEMNAME, left(tranwrd(MEMNAME,'V_',' ')) as KEY
from DICTIONARY.TABLES
where LIBNAME in ('MYLIB1','MYLIB2')
and MEMNAME like '%WANT%' ;
quit;
data _null_;
merge TABLES (in=A where=(LIBNAME='MYLIB1') rename=(MEMNAME=M1))
TABLES (in=B where=(LIBNAME='MYLIB2') rename=(MEMNAME=M2));
by KEY;
if A and B;
call execute('proc compare base=MYLIB1.'||M1||' data=MYLIB2.'||M2||';run;');
run
thanks @ChrisNZ
Thanks! thanks !! Thanks!!!. The code works. The code is able to elegantly sieve through the two libraries and smartly pick up the correct datasets to compare in both libraries and does the desired comparison, the V_ prefix not withstanding. Amazing!!!!
Please, I want to output the results of the comparison into corresponding datasets in another library (LIB3) using OUT= in PROC COMAPRE. The dataset name in the OUT= should be the same as captured in variable KEY (i.e., the dataset names without the V_). That is, for MYLIB1.want1, the results of the comparison will be sent to MYLIB3.want1. For MYLIB1.want2, the results of comparison will be outputted into MLIB3.want2, etc. I also want to include ID statement in PROC COMPARE. Below is a macro that shows what I am trying to achieve. I sincerely thank you for your help!!
/*Data steps to assign counter "num" and data set names "d"*/
%macro compare;
do i=1 to #
proc compare base=LIB1..&&d&i compare=LIB2..&d&i out=LIB3..&&d&i;
id id;
run;
%mend compare;
Just add the required changes to the call execute. Like:
call execute('proc compare base=MYLIB1.'||M1||' data=MYLIB2.'||M2||' out=LIB3.'||KEY||'; id ID; run;');
Or
call execute(catt('proc compare base=MYLIB1.', M1, ' data=MYLIB2.', M2, ' out=LIB3.', KEY, '; id ID; run;'));
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.