About comparing data set schemas

Reply
Occasional Contributor
Posts: 8

About comparing data set schemas

Dear All,

 

I found the following sas code on line for comparing data set schemas. There are always errors regarding selecting variables from data files when I revised the code to use my data. My reference data file " happy"  is saved in "c:\ab" with a libname ab, and my input data "happy1" is saved in "c:\xy" with a libname xy.

 

libname ab "c:\ab";
libname xy "c:\xy";

 

I replaced the ref_lib with ab, the ref_table with happy, in_lib with xy, and in_table with happy1. However, it did not work. My understanding is that the locations of the reference is just hte libname. Am I wrong? Thanks a lot.

 

Ling

 

/* Macro to check attributes of input data against reference   */
/* data, and create a ranked list of data integrity exceptions */
%macro check_reference_data(
	ref_lib 	/* location of reference data */,
	ref_table 	/* name of reference table for comparison */,
	in_lib 		/* location of input data to compare */,
	in_table 	/* name of input table to compare */,
	out_result 	/* new data set to hold result */
);
proc sql;

  create view WORK._TMPIN as
  select * from &in_lib..&in_table.;

  create table &out_result. as
    select 
      "&in_table." as DATASET,
	  "&ref_table." as REFERENCE,
      a.name as ref_name, b.name, 
      a.length as ref_length, b.length,
      a.type as ref_type, b.type,
      a.format as ref_format, b.format,
      case 
        when b.name is missing then 
			'SEV-1: MISSING COLUMN'
        when a.name is missing then 
			'SEV-4: EXTRA COLUMN'
        when b.type <> a.type then 
			'SEV-1: MISMATCHED TYPE'
        when b.length > a.length then 
			'SEV-2: MISMATCHED LENGTH, POSSIBLE TRUNCATION'
        when b.format <> a.format then 
			'SEV-3: MISMATCHED FORMAT, POSSIBLE MISINTERPRETATION'       
        else 
			'OK'
      end as RULE
    from 
      (select name, length, type, format from sashelp.vcolumn 
        where libname="&ref_lib" and memname="&ref_table") a FULL JOIN
      (select name, length, type, format from sashelp.vcolumn 
          where libname="WORK" and memname="_TMPIN") b on (upcase(a.name)=upcase(b.name))
   order by RULE;

   drop view work._tmpIn;
quit;
%mend;  /* check_reference_data */

 

Super User
Posts: 13,286

Re: About comparing data set schemas

You should show the code you use to call the macro. If you get one of the items in the wrong order that could cause lots of errors. Next, whoever wrote this macro didn't document that your ref_lib and ref_table values have to be in uppercase as that is how they are stored in sashelp.vcolumn (or you could modify the code to avoid this)

where libname="&ref_lib" and memname="&ref_table"
to 
where libname=upcase("&ref_lib") and memname=upcase("&ref_table"))

 

 

When you get errors it is best, when using macros, to run the macro with the options mprint symbolgen and then post the result from the log into a code box opened using the forum {i} menu icon. The code box will not reformat the text from the log as the main message window will which moves some of the indicators for error messages.

 

Also "Doesn't work" is awful vague.

Are there errors in the log?: Post the code and log in a code box opened with the {i} to maintain formatting of error messages.

No output? Post any log in a code box.

Unexpected output? Provide input data in the form of a dataset, the actual results and the expected results. Data should be in the form of a data step. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

Occasional Contributor
Posts: 8

Re: About comparing data set schemas

Dear Ballardw,

 

       Thanks a lot for your help. The following  is the log section that I got by using the code that I found on line.

       Best wishes,

 

Ling

 

481  libname yl "Z:\Yanling\Test_Data";
NOTE: Libref YL was successfully assigned as follows:
      Engine:        V9
      Physical Name: Z:\Yanling\Test_Data
482  libname ab "C:\Tasks\My tasks\13 week 9.6.17";
NOTE: Libref AB was successfully assigned as follows:
      Engine:        V9
      Physical Name: C:\Tasks\My tasks\13 week 9.6.17
483  %macro check_reference_data(
484      ab  /* location of reference data */,
485      out_adc20134    /* name of reference table for comparison */,
486      yl  /* location of input data to compare */,
487      out_adc20141    /* name of input table to compare */,
488      out_result  /* new data set to hold result */
489  );
490  proc sql;
491
492    create view WORK._TMPIN as
493    select * from &yl..&out_adc20141.;
494    create table &out_result. as
495      select
496        "&out_adc20141." as DATASET,
497        "&out_adc20134." as REFERENCE,
498        a.name as ref_name, b.name,
499        a.length as ref_length, b.length,
500        a.type as ref_type, b.type,
501        a.format as ref_format, b.format,
502        case
503          when b.name is missing then
504              'SEV-1: MISSING COLUMN'
505          when a.name is missing then
506              'SEV-4: EXTRA COLUMN'
507          when b.type <> a.type then
508              'SEV-1: MISMATCHED TYPE'
509          when b.length > a.length then
510              'SEV-2: MISMATCHED LENGTH, POSSIBLE TRUNCATION'
511          when b.format <> a.format then
512              'SEV-3: MISMATCHED FORMAT, POSSIBLE MISINTERPRETATION'
513          else
514              'OK'
515        end as RULE
516      from
517        (select name, length, type, format from sashelp.vcolumn
518  /*where libname=upcase(("&ab") and memname=upcase("&out_adc20134"))*/
519
520          where libname="&ab" and memname="&out_adc20134") a FULL JOIN
521        (select name, length, type, format from sashelp.vcolumn
522            where libname="WORK" and memname="_TMPIN") b on (upcase(a.name)=upcase(b.name))
523     order by RULE;
524
525     drop view work._tmpIn;
526  quit;
527  %mend;  /* check_reference_data */
528
529  %check_reference_data(sashelp,out_adc20134,sashelp,out_adc20141,work.check);
NOTE: SQL view WORK._TMPIN has been defined.
NOTE: The "<>" operator is interpreted as "not equals".
ERROR: File SASHELP.OUT_ADC20141.DATA does not exist.
NOTE: Table WORK.CHECK created, with 0 rows and 11 columns.

NOTE: View WORK._TMPIN has been dropped.
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.10 seconds
      cpu time            0.06 seconds


530  title "Does ADC20141 have everything that ADC20134 has?  Yes, and more";
531  proc print data=work.check;
532  run;

NOTE: No observations in data set WORK.CHECK.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


 

Occasional Contributor
Posts: 8

Re: About comparing data set schemas

Dear Ballardw,

 

             Thanks again for your help. Here is the log section that I got after revising the code based on your suggestions.

             Best wishes,

 

Ling

 

533  libname yl "Z:\Yanling\Test_Data";
NOTE: Libref YL was successfully assigned as follows:
      Engine:        V9
      Physical Name: Z:\Yanling\Test_Data
534  libname ab "C:\Tasks\My tasks\13 week 9.6.17";
NOTE: Libref AB was successfully assigned as follows:
      Engine:        V9
      Physical Name: C:\Tasks\My tasks\13 week 9.6.17
535  %macro check_reference_data(
536      ab  /* location of reference data */,
537      out_adc20134    /* name of reference table for comparison */,
538      yl  /* location of input data to compare */,
539      out_adc20141    /* name of input table to compare */,
540      out_result  /* new data set to hold result */
541  );
542  proc sql;
543
544    create view WORK._TMPIN as
545    select * from &yl..&out_adc20141.;
546    create table &out_result. as
547      select
548        "&out_adc20141." as DATASET,
549        "&out_adc20134." as REFERENCE,
550        a.name as ref_name, b.name,
551        a.length as ref_length, b.length,
552        a.type as ref_type, b.type,
553        a.format as ref_format, b.format,
554        case
555          when b.name is missing then
556              'SEV-1: MISSING COLUMN'
557          when a.name is missing then
558              'SEV-4: EXTRA COLUMN'
559          when b.type <> a.type then
560              'SEV-1: MISMATCHED TYPE'
561          when b.length > a.length then
562              'SEV-2: MISMATCHED LENGTH, POSSIBLE TRUNCATION'
563          when b.format <> a.format then
564              'SEV-3: MISMATCHED FORMAT, POSSIBLE MISINTERPRETATION'
565          else
566              'OK'
567        end as RULE
568      from
569        (select name, length, type, format from sashelp.vcolumn
570  where libname=upcase(("&ab") and memname=upcase("&out_adc20134"))
571
572         /* where libname="&ab" and memname="&out_adc20134")*/a FULL JOIN
573        (select name, length, type, format from sashelp.vcolumn
574            where libname="WORK" and memname="_TMPIN") b on (upcase(a.name)=upcase(b.name))
575     order by RULE;
576
577     drop view work._tmpIn;
578  quit;
579  %mend;  /* check_reference_data */
580
581  %check_reference_data(sashelp,out_adc20134,sashelp,out_adc20141,work.check);
NOTE: SQL view WORK._TMPIN has been defined.
NOTE: Line generated by the invoked macro "CHECK_REFERENCE_DATA".
7             a FULL JOIN       (select name, length, type, format from sashelp.vcolumn           where libname="WORK" and
              -
              79
7  ! memname="_TMPIN") b on (upcase(a.name)=upcase(b.name))    order by RULE;     drop view work._tmpIn; quit;
NOTE: The "<>" operator is interpreted as "not equals".
ERROR 79-322: Expecting a ).
NOTE: View WORK._TMPIN has been dropped.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds

582  title "Does ADC20141 have everything that ADC20134 has?  Yes, and more";


583  proc print data=work.check;
584  run;

NOTE: No observations in data set WORK.CHECK.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


Super User
Posts: 13,286

Re: About comparing data set schemas

from
/*This ( does not have a match => */ (select name, length, type, format from sashelp.vcolumn
 where libname=upcase(("&ab") and memname=upcase("&out_adc20134"))
/* insert a ) before the a*/
        ) a FULL JOIN
       (select name, length, type, format from sashelp.vcolumn
           where libname="WORK" and memname="_TMPIN") b on (upcase(a.name)=upcase(b.name))
    order by RULE;

starting at line 568

 

Occasional Contributor
Posts: 8

Re: About comparing data set schemas

 

821
822
823  /*Comparing data set schemas in SAS*/
824  /*Selecting Data from More than One Table by Using Joins*/
825
826
827  *libname ab "c:\ab";
828  *libname xy "c:\xy";
829
830
831
832  /*proc contents data=yl.out_adc20141;run;
833  proc contents data=ab.out_adc20134;run;
834  proc contents data=YL.out_adc20181;run;*/
835
836  /* Macro to check attributes of input data against reference   */
837  /* data, and create a ranked list of data integrity exceptions */
838
839  libname yl "Z:\Yanling\Test_Data";
NOTE: Libref YL was successfully assigned as follows:
      Engine:        V9
      Physical Name: Z:\Yanling\Test_Data
840  libname ab "C:\Tasks\My tasks\13 week 9.6.17";
NOTE: Libref AB was successfully assigned as follows:
      Engine:        V9
      Physical Name: C:\Tasks\My tasks\13 week 9.6.17
841  %macro check_reference_data(
842      ab  /* location of reference data */,
843      out_adc20134    /* name of reference table for comparison */,
844      yl  /* location of input data to compare */,
845      out_adc20141    /* name of input table to compare */,
846      out_result  /* new data set to hold result */
847  );
848  proc sql;
849
850    create view WORK._TMPIN as
851    select * from &yl..&out_adc20141.;
852    create table &out_result. as
853      select
854        "&out_adc20141." as DATASET,
855        "&out_adc20134." as REFERENCE,
856        a.name as ref_name, b.name,
857        a.length as ref_length, b.length,
858        a.type as ref_type, b.type,
859        a.format as ref_format, b.format,
860        case
861          when b.name is missing then
862              'SEV-1: MISSING COLUMN'
863          when a.name is missing then
864              'SEV-4: EXTRA COLUMN'
865          when b.type <> a.type then
866              'SEV-1: MISMATCHED TYPE'
867          when b.length > a.length then
868              'SEV-2: MISMATCHED LENGTH, POSSIBLE TRUNCATION'
869          when b.format <> a.format then
870              'SEV-3: MISMATCHED FORMAT, POSSIBLE MISINTERPRETATION'
871          else
872              'OK'
873        end as RULE
874      from
875        ((select name, length, type, format from sashelp.vcolumn
876  where libname=upcase("&ab") and memname=upcase("&out_adc20134"))
877
878          /*where libname="&ab" and memname="&out_adc20134"*/)a FULL JOIN
879        (select name, length, type, format from sashelp.vcolumn
880            where libname="WORK" and memname="_TMPIN") b on (upcase(a.name)=upcase(b.name))
881     order by RULE;
882
883     drop view work._tmpIn;
884  quit;
885  %mend;  /* check_reference_data */
886
887  %check_reference_data(sashelp,out_adc20134,sashelp,out_adc20141,work.check);
NOTE: SQL view WORK._TMPIN has been defined.
NOTE: The "<>" operator is interpreted as "not equals".
ERROR: File SASHELP.OUT_ADC20141.DATA does not exist.NOTE: Table WORK.CHECK created, with 0 rows and 11 columns.

NOTE: View WORK._TMPIN has been dropped.
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.04 seconds
      cpu time            0.04 seconds


888  title "Does ADC20141 have everything that ADC20134 has?  Yes, and more";
889  proc print data=work.check;
890  run;

NOTE: No observations in data set WORK.CHECK.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


 

 

Occasional Contributor
Posts: 8

Re: About comparing data set schemas

Dear Ballardw,

     Appreciate your help. I still got the above error after revising the code based on your suggestions, but I am making progress. Thanks!

     Best wishes,

 

Ling

Super User
Super User
Posts: 7,929

Re: About comparing data set schemas

The error message seems pretty straight forward.  You asked SAS to read from a dataset named 

887  %check_reference_data(sashelp,out_adc20134,sashelp,out_adc20141,work.check);
NOTE: SQL view WORK._TMPIN has been defined.
NOTE: The "<>" operator is interpreted as "not equals".
ERROR: File SASHELP.OUT_ADC20141.DATA 

I find that it is clearer if you call the macro using the parameter names rather than depending on the position of the values in the macro call.  SAS is happy to let you do that for any parameter, even those like yours that defined as postional.  You just can't try make the call using positional arguments when the macro has defined them as named parameters.  

 

So you called the macro with these values for the parameters. (It would really help if you used meaningful names for your macro parameters!)

%check_reference_data
(ab=sashelp
,out_adc20134=out_adc20134
,yl=sashelp
,out_adc20141=out_adc20141
,out_result=work.check
)

And the macro defined a view to read from this dataset

 &yl..&out_adc20141.

So based on your parameter values that would translate into:

sashelp.out_adc20141

Which the error message is saying doesn't exist.  

You need to tell the macro the name of a dataset that does exist.

 

Occasional Contributor
Posts: 8

Re: About comparing data set schemas

Dear Tom,

 

        Thanks a lot for your suggestions and your explaination. That is very helpful. However, I still got the following error after using the code that you offered. I also checked the temporary data files that should be created, but neither sashelp.out_adc20141 nor the sashelp.out_adc20134 was found. I also found that  the data file work.check has no observations. I am wondering whether the code created the file of out_result, I did not find it in the the library of sashelp. I am not positive.

        Best wishes,

 

1195
1196
1197  /*Comparing data set schemas in SAS*/
1198  /*Selecting Data from More than One Table by Using Joins*/
1199
1200
1201  *libname ab "c:\ab";
1202  *libname xy "c:\xy";
1203
1204
1205
1206  /*proc contents data=yl.out_adc20141;run;
1207  proc contents data=ab.out_adc20134;run;
1208  proc contents data=YL.out_adc20181;run;*/
1209
1210  /* Macro to check attributes of input data against reference   */
1211  /* data, and create a ranked list of data integrity exceptions */
1212
1213  libname yl "Z:\Yanling\Test_Data";
NOTE: Libref YL was successfully assigned as follows:
      Engine:        V9
      Physical Name: Z:\Yanling\Test_Data
1214  libname ab "C:\Tasks\My tasks\13 week 9.6.17";
NOTE: Libref AB was successfully assigned as follows:
      Engine:        V9
      Physical Name: C:\Tasks\My tasks\13 week 9.6.17
1215  %macro check_reference_data(
1216      ab  /* location of reference data */,
1217      out_adc20134    /* name of reference table for comparison */,
1218      yl  /* location of input data to compare */,
1219      out_adc20141    /* name of input table to compare */,
1220      out_result  /* new data set to hold result */
1221  );
1222  proc sql;
1223
1224    create view WORK._TMPIN as
1225    select * from &yl..&out_adc20141.;
1226    create table &out_result. as
1227      select
1228        "&out_adc20141." as DATASET,
1229        "&out_adc20134." as REFERENCE,
1230        a.name as ref_name, b.name,
1231        a.length as ref_length, b.length,
1232        a.type as ref_type, b.type,
1233        a.format as ref_format, b.format,
1234        case
1235          when b.name is missing then
1236              'SEV-1: MISSING COLUMN'
1237          when a.name is missing then
1238              'SEV-4: EXTRA COLUMN'
1239          when b.type <> a.type then
1240              'SEV-1: MISMATCHED TYPE'
1241          when b.length > a.length then
1242              'SEV-2: MISMATCHED LENGTH, POSSIBLE TRUNCATION'
1243          when b.format <> a.format then
1244              'SEV-3: MISMATCHED FORMAT, POSSIBLE MISINTERPRETATION'
1245          else
1246              'OK'
1247        end as RULE
1248      from
1249        ((select name, length, type, format from sashelp.vcolumn
1250  where libname=upcase("&ab") and memname=upcase("&out_adc20134"))
1251
1252          /*where libname="&ab" and memname="&out_adc20134"*/)a FULL JOIN
1253        (select name, length, type, format from sashelp.vcolumn
1254            where libname="WORK" and memname="_TMPIN") b on (upcase(a.name)=upcase(b.name))
1255     order by RULE;
1256
1257     drop view work._tmpIn;
1258  quit;
1259  %mend;  /* check_reference_data */
1260
1261  /*%check_reference_data(sashelp,out_adc20134,sashelp,out_adc20141,work.check);
1262  title "Does ADC20141 have everything that ADC20134 has?  Yes, and more";
1263  proc print data=work.check;
1264  run;*/
1265
1266
1267
1268  %check_reference_data
1269  (ab=sashelp
1270  ,out_adc20134=out_adc20134
1271  ,yl=sashelp
1272  ,out_adc20141=out_adc20141
1273  ,out_result=work.check
1274  )
NOTE: SQL view WORK._TMPIN has been defined.
NOTE: The "<>" operator is interpreted as "not equals".
ERROR: File SASHELP.OUT_ADC20141.DATA does not exist.
NOTE: Table WORK.CHECK created, with 0 rows and 11 columns.

NOTE: View WORK._TMPIN has been dropped.
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds


1275  proc print data=work.check;
1276  run;

NOTE: No observations in data set WORK.CHECK.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


Super User
Super User
Posts: 7,929

Re: About comparing data set schemas

[ Edited ]

Why do you keep asking it to look in the SASHELP library instead of the YL or AB library that made with your LIBNAME statements?

 

%check_reference_data
(ab=  AB
,out_adc20134=out_adc20134
,yl= YL
,out_adc20141=out_adc20141
,out_result=work.check
)
Respected Advisor
Posts: 4,665

Re: About comparing data set schemas

@yanlima

You're calling the macro wrongly not passing in the parameters as expected by the macro.

Not sure why you want to use this macro at all. There is an ootb SAS Procedure for such tasks (Proc Compare) which gives you a lot of options for the type of comparisons and reports you want to create.

 

/* Macro to check attributes of input data against reference   */
/* data, and create a ranked list of data integrity exceptions */
%macro check_reference_data(
	ref_lib 	/* location of reference data */,
	ref_table 	/* name of reference table for comparison */,
	in_lib 		/* location of input data to compare */,
	in_table 	/* name of input table to compare */,
	out_result 	/* new data set to hold result */
);
proc sql;

  create view WORK._TMPIN as
  select * from &in_lib..&in_table.;

  create table &out_result. as
    select 
      "&in_table." as DATASET,
	  "&ref_table." as REFERENCE,
      a.name as ref_name, b.name, 
      a.length as ref_length, b.length,
      a.type as ref_type, b.type,
      a.format as ref_format, b.format,
      case 
        when b.name is missing then 
			'SEV-1: MISSING COLUMN'
        when a.name is missing then 
			'SEV-4: EXTRA COLUMN'
        when b.type <> a.type then 
			'SEV-1: MISMATCHED TYPE'
        when b.length > a.length then 
			'SEV-2: MISMATCHED LENGTH, POSSIBLE TRUNCATION'
        when b.format <> a.format then 
			'SEV-3: MISMATCHED FORMAT, POSSIBLE MISINTERPRETATION'       
        else 
			'OK'
      end as RULE
    from 
      (select name, length, type, format from sashelp.vcolumn 
        where libname="&ref_lib" and memname="&ref_table") a FULL JOIN
      (select name, length, type, format from sashelp.vcolumn 
          where libname="WORK" and memname="_TMPIN") b on (upcase(a.name)=upcase(b.name))
   order by RULE;

   drop view work._tmpIn;
quit;
%mend;  /* check_reference_data */

/**** create sample data *****/
options dlcreatedir;
libname ab "c:\temp\f1";
libname xy "c:\temp\f2";

data ab.ds1;
  set sashelp.class(drop=sex);
  attrib otherVar length=$10 format=$10. label='ds1 other var';
  otherVar='ds1';
  otherVar2='ds1';
run;
data xy.ds2;
  set sashelp.class(drop=name);
  attrib otherVar informat=best32. format=comma10.2 label='ds2 other var';
  otherVar=_n_;
  otherVar2='ds2';
run;

/**** call macro using sample data *****/
%check_reference_data(
    ref_lib=ab
  , ref_table=ds1
  , in_lib=xy
  , in_table=ds2
  , out_result=work.check
  )

/**** use Proc Compare with same sample data *****/
proc compare base=ab.ds1 compare=xy.ds2 noobs;
run;
Super User
Super User
Posts: 7,929

Re: About comparing data set schemas

Your macro looks pretty good. You could reformat it a little make it easier to read and re-paramatize it to make it a little easier to use.

For example not really any need to pass the reference table name or the table to be check using two parameters instead of one.  

The macro already doesn't need this for the table to be checked since it makes a view and referenes the name of the view when checking the metadata.  For the reference table you could either do the same thing or just parse it yourself into libref and dsname parts.

%macro check_reference_data
/* Macro to check attributes of input data against reference   */
/* data, and create a ranked list of data integrity exceptions */
(in_table           /* name of input table to compare */
,ref_table          /* name of reference table for comparison */
,out_result         /* new data set to hold result */
);
proc sql;

%* Create VIEWs to allow use of dataset options ;
  create view work._tmpin as select * from &in_table ;
  create view work._tmpref as select * from &ref_table ;

  create table &out_result. as
    select "&in_table." as DATASET
         , "&ref_table." as REFERENCE
         , a.name as ref_name
         , b.name
         , a.length as ref_length
         , b.length
         , a.type as ref_type
         , b.type
         , a.format as ref_format
         , b.format
         , case
             when b.name is missing then
                 'SEV-1: MISSING COLUMN'
             when a.name is missing then
                 'SEV-4: EXTRA COLUMN'
             when b.type ne a.type then
                 'SEV-1: MISMATCHED TYPE'
             when b.length > a.length then
                 'SEV-2: MISMATCHED LENGTH, POSSIBLE TRUNCATION'
             when b.format ne a.format then
                 'SEV-3: MISMATCHED FORMAT, POSSIBLE MISINTERPRETATION'
             else
                 'OK'
           end as RULE
    from
      (select name, length, type, format from sashelp.vcolumn
        where libname="WORK" and memname="_TMPREF") a
    full join
      (select name, length, type, format from sashelp.vcolumn
          where libname="WORK" and memname="_TMPIN") b
    on (upcase(a.name)=upcase(b.name))
    order by RULE
   ;

   drop view work._tmpref;
   drop view work._tmpref;
quit;
%mend check_reference_data ;

You could use this call to test if it can tell when a variable is not there.

%check_reference_data
(in_table =sashelp.class(drop=age)  	/* name of input table to compare */
,ref_table=sashelp.class 	 /* name of reference table for comparison */
,out_result=test1 	/* new data set to hold result */
);
Occasional Contributor
Posts: 8

Re: About comparing data set schemas

Dear Tom,

         Thanks a lot for your help. Sorry for my late reply. I tried your code, it works for using the sample data(See the attached file, the variable names are "name, sex, age, height, and weight"). The thing is that I am not able to replace the sample data with my data. In the following code, I named the input table as "out_adc20141" and the reference table as "out_adc20134". However, no matter what I call them, the code will compare the sample data instead of my data.

         I am asking a very stupid question here, but I really want to know. In you code, I did not find a place to mention where the data comes from. How could SAS know that you want to compare the sample data? Thanks again.

 

%macro check_reference_data
/* Macro to check attributes of input data against reference   */
/* data, and create a ranked list of data integrity exceptions */
(out_adc20141           /* name of input table to compare */
,out_adc20134        /* name of reference table for comparison */
,out_result         /* new data set to hold result */
);
proc sql;

%* Create VIEWs to allow use of dataset options ;
  create view work._tmpin as select * from &out_adc20141;
  create view work._tmpref as select * from &out_adc20134;

  create table &out_result. as
    select "&out_adc20141." as DATASET
         , "&out_adc20134." as REFERENCE
         , a.name as ref_name
         , b.name
         , a.length as ref_length
         , b.length
         , a.type as ref_type
         , b.type
         , a.format as ref_format
         , b.format
         , case
             when b.name is missing then
                 'SEV-1: MISSING COLUMN'
             when a.name is missing then
                 'SEV-4: EXTRA COLUMN'
             when b.type ne a.type then
                 'SEV-1: MISMATCHED TYPE'
             when b.length > a.length then
                 'SEV-2: MISMATCHED LENGTH, POSSIBLE TRUNCATION'
             when b.format ne a.format then
                 'SEV-3: MISMATCHED FORMAT, POSSIBLE MISINTERPRETATION'
             else
                 'OK'
           end as RULE
    from
      (select name, length, type, format from sashelp.vcolumn
        where libname="WORK" and memname="_TMPREF") a
    full join
      (select name, length, type, format from sashelp.vcolumn
          where libname="WORK" and memname="_TMPIN") b
    on (upcase(a.name)=upcase(b.name))
    order by RULE
   ;

   *drop view work._tmpref;
  * drop view work._tmpref;
quit;
%mend check_reference_data ;


%check_reference_data
(out_adc20141 =sashelp.class(drop=age)  	/* name of input table to compare */
,out_adc20134=sashelp.class 	 /* name of reference table for comparison */
,out_result=test1 	/* new data set to hold result */
);
Super User
Super User
Posts: 7,929

Re: About comparing data set schemas

In the macro call at the end of the code in this most recent posting you are still calling the macro with SASHELP.CLASS  as the name of the table you want for both the input and reference table.

 

Do not change the NAME of the parameters in the macro DEFINITION.  The macro parameters are just the macro variable names that will be used as the place holders for the values that you want to use when you actually run the macro. You supply the values that you want the macro to use when you CALL or invoke the macro.  

 

Using OUT_ADC20141 and OUT_ADC20134 as the parameter names is not very meaningful. Something like INPUT_TABLE and REFERENCE_TABLE would make much more sense and make the macro code much easier to read.

Occasional Contributor
Posts: 8

Re: About comparing data set schemas

Dear Tom, 

 

         Thanks a lot for your quick reply. I got the following error in the log section if I use the code that you provide. That is, use the name of "INPUT_TABLE and REFERENCE_TABLE." It seems that I did not specify which data I want to compare. Am I right? Anyway, how could I use my data to compare?Thanks again. 

         Best wishes,

Ling

 

 

NOTE: Copyright (c) 2002-2012 by SAS Institute Inc., Cary, NC, USA.
NOTE: SAS (r) Proprietary Software 9.4 (TS1M2)
      Licensed to BD OF REGENTS OF UNIV SYST OF GEORGIA - SFA T&R, Site 70080521.
NOTE: This session is executing on the X64_7PRO  platform.



NOTE: Updated analytical products:

      SAS/STAT 13.2
      SAS/ETS 13.2
      SAS/OR 13.2
      SAS/IML 13.2
      SAS/QC 13.2

NOTE: Additional host information:

 X64_7PRO WIN 6.1.7601 Service Pack 1 Workstation

NOTE: SAS initialization used:
      real time           1.15 seconds
      cpu time            0.92 seconds

1    %macro check_reference_data
2    /* Macro to check attributes of input data against reference   */
3    /* data, and create a ranked list of data integrity exceptions */
4    (in_table           /* name of input table to compare */
5    ,ref_table          /* name of reference table for comparison */
6    ,out_result         /* new data set to hold result */
7    );
8    proc sql;
9
10   %* Create VIEWs to allow use of dataset options ;
11     create view work._tmpin as select * from &in_table ;
12     create view work._tmpref as select * from &ref_table ;
13
14     create table &out_result. as
15       select "&in_table." as DATASET
16            , "&ref_table." as REFERENCE
17            , a.name as ref_name
18            , b.name
19            , a.length as ref_length
20            , b.length
21            , a.type as ref_type
22            , b.type
23            , a.format as ref_format
24            , b.format
25            , case
26                when b.name is missing then
27                    'SEV-1: MISSING COLUMN'
28                when a.name is missing then
29                    'SEV-4: EXTRA COLUMN'
30                when b.type ne a.type then
31                    'SEV-1: MISMATCHED TYPE'
32                when b.length > a.length then
33                    'SEV-2: MISMATCHED LENGTH, POSSIBLE TRUNCATION'
34                when b.format ne a.format then
35                    'SEV-3: MISMATCHED FORMAT, POSSIBLE MISINTERPRETATION'
36                else
37                    'OK'
38              end as RULE
39       from
40         (select name, length, type, format from sashelp.vcolumn
41           where libname="WORK" and memname="_TMPREF") a
42       full join
43         (select name, length, type, format from sashelp.vcolumn
44             where libname="WORK" and memname="_TMPIN") b
45       on (upcase(a.name)=upcase(b.name))
46       order by RULE
47      ;
48
49      drop view work._tmpref;
50      drop view work._tmpref;
51   quit;
52   %mend check_reference_data ;
53
54   %check_reference_data
55   (in_table =sashelp.class(drop=age)      /* name of input table to compare */
56   ,ref_table=sashelp.class     /* name of reference table for comparison */
57   ,out_result=test1   /* new data set to hold result */
58   );
NOTE: SQL view WORK._TMPIN has been defined.
NOTE: SQL view WORK._TMPREF has been defined.
NOTE: Table WORK.TEST1 created, with 5 rows and 11 columns.

NOTE: View WORK._TMPREF has been dropped.
WARNING: File WORK._TMPREF.VIEW does not exist.
WARNING: View WORK._TMPREF has not been dropped.
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.05 seconds
      cpu time            0.04 seconds


Ask a Question
Discussion stats
  • 15 replies
  • 208 views
  • 0 likes
  • 4 in conversation