Hi guys,
Suppose I have two different SAS files, A.sas and B.sas, with a number of variables. The variables are the same, i.e., they have identical names in A and B. Is there an easy way to check whether each variable differs in Type, Len and Format?
I know I can run PROC CONTENTS, but then I have to compare the output visually, and there are too many variables.
Thank you in advance
If you are only ever interested in two data sets at a time, I would look into PROC COMPARE.
proc compare base=work.a(obs=0) compare=work.b(obs=0)
             novalues listvar briefsummary;
run;
The OBS=0 data set option means no observations are read, so no values are compared; PROC COMPARE basically just compares the variables and reports the differences.
If you are interested in multiple data sets at one time, try something like the following. The two DATA steps only create sample data sets so there is something to compare.
data work.a;
  input id :$3. var1 var2;
  format var2 f8.;
  datalines;
123 34 45
;

data work.b;
  input id var1 var2;
  datalines;
123 34 45
;

proc tabulate data=sashelp.vcolumn;
  /* If all the sets are in the same library (you must specify WORK
     if they are in the WORK library):
     where libname='MYLIB' and memname in ('A' 'B'); */
  where libname='WORK' and memname in ('A' 'B');
  /* If the sets are in different libraries:
     where (libname='THISLIB' and memname='A')
        or (libname='THATLIB' and memname='B');
  */
  class memname name type length format / missing;
  format name $upcase32.;
  table name*(type length format),
        memname*n / misstext=' ';
run;
SASHELP.VCOLUMN holds information about all the variables in your currently defined libraries.
The MEMNAME and LIBNAME variables hold the member names (data sets, but also catalogs and other member types) and the library names. They are stored in uppercase, so the values in your WHERE statement must be uppercase too in order to select the proper members.
The CLASS statement lists some of the other VCOLUMN variables that describe each variable, not every attribute available. The MISSING option makes sure all observations are kept: by default PROC TABULATE excludes any observation with a missing value for a class variable, and that will happen here (for example, FORMAT is blank for a variable with no format assigned).
The $UPCASE. format is applied to NAME, which holds the variable name, because names can be stored in mixed case, which would make the report harder to read. Since PROC TABULATE groups class levels by their formatted values, it also puts, say, Id and ID on the same row.
In the TABLE statement, the part before the comma defines the rows and the part after it defines the columns (one per data set).
Each cell has a 1 where that attribute value occurs in that data set, so a variable whose attributes differ shows more than one row under the same attribute, with the 1s in different columns.
As I mentioned, the table approach is better suited to several data sets, since PROC COMPARE only compares two at a time. With that much output, it is easy to miss that ONE data set out of 5 had one variable that differs.
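If you would rather see only the variables that actually differ, a PROC SQL query against DICTIONARY.COLUMNS (the table behind the SASHELP.VCOLUMN view) can do the filtering for you. This is a minimal sketch, not a tested solution: it recreates the two sample data sets in compact form, assumes they live in WORK, and WORK.ATTR_DIFFS is just a hypothetical output name. Blank formats are replaced by '(none)' because COUNT(DISTINCT) ignores missing values, so a format present in only one data set would otherwise go unnoticed.

```sas
/* Sample data, same structure as above: ID is character in A  */
/* and numeric in B, and VAR2 has a format only in A.          */
data work.a;
  length id $3;
  id = '123'; var1 = 34; var2 = 45;
  format var2 f8.;
run;

data work.b;
  id = 123; var1 = 34; var2 = 45;
run;

/* Keep every row (one per data set) for each variable whose   */
/* type, length or format is not the same in all selected sets. */
/* SAS remerges the group counts back onto the detail rows.    */
proc sql;
  create table work.attr_diffs as
  select upcase(name) as varname,
         memname,
         type,
         length,
         format
  from dictionary.columns
  where libname = 'WORK' and memname in ('A', 'B')
  group by upcase(name)
  having count(distinct type) > 1
      or count(distinct length) > 1
      or count(distinct coalescec(format, '(none)')) > 1
  order by varname, memname;
quit;

proc print data=work.attr_diffs noobs;
run;
```

To cover more data sets, extend the IN list (or use the LIBNAME/MEMNAME combinations shown in the comments above). Note that a variable that exists in only some of the data sets is not flagged by this query.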
You can't run PROC CONTENTS on the SAS code in A.sas and B.sas; PROC CONTENTS only works on data sets. Maybe you mean that you run A.sas and B.sas, which create data sets, and then want to compare those via PROC CONTENTS. Is that what you mean? Even better would be to run PROC COMPARE.
SAS program files do not have variables. They have CODE.
Let's assume you have two datasets named A and B. You can use PROC CONTENTS and PROC COMPARE to compare their structures.
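/* Write the attributes of each variable to a data set, one row per variable */
proc contents data=A noprint out=A_cont; run;
proc contents data=B noprint out=B_cont; run;
/* The ID statement requires both data sets to be sorted by NAME */
proc sort data=A_cont; by name; run;
proc sort data=B_cont; by name; run;
/* Match variables by NAME and compare their attributes; LISTALL */
/* also lists variables found in only one of the data sets       */
proc compare data=A_cont compare=B_cont listall;
id name ;
var type length format formatl formatd label;
run;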
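To see only the mismatches side by side, the two PROC CONTENTS output data sets can also be full-joined with PROC SQL. This is a sketch under assumptions: the sample data, the table name WORK.SIDE_BY_SIDE and the choice of attributes are illustrative only. In the CONTENTS output TYPE is 1 for numeric and 2 for character, and FORMAT holds the format name without its width (the width is in FORMATL), so both are compared.

```sas
/* Hypothetical sample data: ID is character in A and numeric  */
/* in B, and VAR2 has a format only in A.                      */
data work.a;
  length id $3;
  id = '123'; var1 = 34; var2 = 45;
  format var2 f8.;
run;

data work.b;
  id = 123; var1 = 34; var2 = 45;
run;

proc contents data=work.a noprint
     out=work.a_cont(keep=name type length format formatl);
run;
proc contents data=work.b noprint
     out=work.b_cont(keep=name type length format formatl);
run;

/* Full join on the uppercased name so variables present in    */
/* only one data set are kept too; then keep mismatched rows.  */
proc sql;
  create table work.side_by_side as
  select coalescec(upcase(a.name), upcase(b.name)) as varname,
         a.type    as type_a,    b.type    as type_b,
         a.length  as length_a,  b.length  as length_b,
         a.format  as format_a,  b.format  as format_b,
         a.formatl as formatl_a, b.formatl as formatl_b
  from work.a_cont as a
       full join work.b_cont as b
       on upcase(a.name) = upcase(b.name)
  where a.name is missing or b.name is missing
     or a.type ne b.type
     or a.length ne b.length
     or a.format ne b.format
     or a.formatl ne b.formatl
  order by varname;
quit;

proc print data=work.side_by_side noobs;
run;
```

With this sample data only ID (type and length) and VAR2 (format) should be listed; VAR1 matches and is dropped.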