DATA Step, Macro, Functions and more

Compare the order of variables (not their values) in different datasets

Accepted Solution Solved
Reply
Contributor
Posts: 40
Accepted Solution

Compare the order of variables (not their values) in different datasets

Hello,

I'm working to compare two Excel files after importing them both in SAS.  I need to track column name changes...if the order of the column names changes, I need to know. 

 

For instance:

 

The first Excel file has 50 columns and is a current 2018 version.  The column names are EmpID, Date, Gender, Pay, Dept., etc, etc.

 

The second Excel file has 50 columns, but is an archived 2017 version.  The column names are the same.

 

But if the order of the column names (variables) change in 2018, then I want to know.  So, if instead the 2018 Excel file column name changes to EmpID, Gender, Pay, Date, etc. etc (the "Date" column moves from 2nd position, to 4th position), which is different from 2017... I want SAS to let me know.

 

I used the proc compare, but proc compare doesn't track position changes with variable names.  I can't find a solution to this anywhere.

 

Help is much appreciated...and thanks.


Accepted Solutions
Solution
‎03-16-2018 12:11 PM
Super User
Posts: 13,941

Re: Compare the order of variables (not their values) in different datasets

Here is one way to approach this with two very small data sets as a concrete example. We don't need lots of values for this exercise.

 

data work.junk1;
   input x y z;
datalines ;
1 2 3
;

data work.junk2;
   input x z $ y$ a;
datalines;
2 3 4 5
;
run;

proc sql;
   create table varcompare as
   select memname, name, varnum, type
   from dictionary.columns
   where libname='WORK' and memname in ('JUNK1' 'JUNK2')
   ;
run;

proc tabulate data=varcompare;
   class memname name varnum type;
   tables name*varnum ,
          memname=''*type=''*n=''
          /misstext=' '
          ;
run;

Note that I included the variable type, numeric or character, as well since different spreadsheets are notorious for resulting in different variable types for similar names. I also include a variable that only occurs in one set.

 

The proc tabulate output shows a 1 for the values observed. So in this output you can see that variable A only exists in JUNK2, the variables Y and Z appear in different columns and which columns are involved AND that Y and Z both changed variable types.

 

This would also allow incorporating more datasets in the analysis that are supposed to be similar by including the names in the where clause in the proc sql. The syntax for that where might cumbersome if you have to look at multiple libraries. The Libname and Memname values in the dictionary.columns are stored in all uppercase so that is what should be on the where clause.

View solution in original post


All Replies
Super User
Posts: 10,570

Re: Compare the order of variables (not their values) in different datasets

Look at varnum in dictionary.columns (proc sql).

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Contributor
Posts: 40

Re: Compare the order of variables (not their values) in different datasets

Posted in reply to KurtBremser

I used this code as a test... but it failed.  Pls help.

 

data test;
set work.ACH;

check = varnum("work.ach","EmpID");
run;

 

I thought it would give me the result of check = 1 ; which is the position of EmpID.  What am I missing?

 

My log:

26:17
NOTE: Invalid numeric data, 'work.ach' , at line 26 column 17.
NOTE: Argument 1 to function VARNUM(.,'EmpID') at line 26 column 10 is invalid.

Super User
Posts: 10,570

Re: Compare the order of variables (not their values) in different datasets

Maxim 1: Read the documentation.

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000148439.htm contains a fine example for the usage of varnum().

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Contributor
Posts: 40

Re: Compare the order of variables (not their values) in different datasets

Posted in reply to KurtBremser
great.
Super Contributor
Posts: 332

Re: Compare the order of variables (not their values) in different datasets

Assume Two set of variables are compared. What is the output you want?. Show it by example.

Contributor
Posts: 40

Re: Compare the order of variables (not their values) in different datasets

For example,

2018 file = EmpID, Gender, Pay, Dept., Date

 

2017 file = EmpID, Date, Gender, Pay, Dept., 

 

Output would something like:

 

Error - column 2 or "Date" is not in 2nd position in 2018.

 

or  

 

Error - column 2 is out of order from 2017 archive.

Solution
‎03-16-2018 12:11 PM
Super User
Posts: 13,941

Re: Compare the order of variables (not their values) in different datasets

Here is one way to approach this with two very small data sets as a concrete example. We don't need lots of values for this exercise.

 

data work.junk1;
   input x y z;
datalines ;
1 2 3
;

data work.junk2;
   input x z $ y$ a;
datalines;
2 3 4 5
;
run;

proc sql;
   create table varcompare as
   select memname, name, varnum, type
   from dictionary.columns
   where libname='WORK' and memname in ('JUNK1' 'JUNK2')
   ;
run;

proc tabulate data=varcompare;
   class memname name varnum type;
   tables name*varnum ,
          memname=''*type=''*n=''
          /misstext=' '
          ;
run;

Note that I included the variable type, numeric or character, as well since different spreadsheets are notorious for resulting in different variable types for similar names. I also include a variable that only occurs in one set.

 

The proc tabulate output shows a 1 for the values observed. So in this output you can see that variable A only exists in JUNK2, the variables Y and Z appear in different columns and which columns are involved AND that Y and Z both changed variable types.

 

This would also allow incorporating more datasets in the analysis that are supposed to be similar by including the names in the where clause in the proc sql. The syntax for that where might cumbersome if you have to look at multiple libraries. The Libname and Memname values in the dictionary.columns are stored in all uppercase so that is what should be on the where clause.

Contributor
Posts: 40

Re: Compare the order of variables (not their values) in different datasets

thanks @ballardw,

this gives me a start - thanks for taking the time to provide a clear example.

Super User
Posts: 13,941

Re: Compare the order of variables (not their values) in different datasets


@belboy wrote:

thanks @ballardw,

this gives me a start - thanks for taking the time to provide a clear example.


I have a program that does something similar for 20 data sets with changing content (a government sponsored survey and so the content WILL change so don't blame me) where I used this approach to examine variable labels because we had "same label different variable) and "same variable different label". Also as a reference for which yearly set to look for related topics. Then I could be prepared for some of the things likely to watch for when combining data.

 

One of the nice things I like about SAS is, assuming matching variable name and types, combining data sets does not depend on the order of variables within sets.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 156 views
  • 2 likes
  • 4 in conversation