BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
belboy
Obsidian | Level 7

Hello,

I'm working to compare two Excel files after importing them both in SAS.  I need to track column name changes...if the order of the column names changes, I need to know. 

 

For instance:

 

The first Excel file has 50 columns and is a current 2018 version.  The column names are EmpID, Date, Gender, Pay, Dept., etc, etc.

 

The second Excel file has 50 columns, but is an archived 2017 version.  The column names are the same.

 

But if the order of the column names (variables) change in 2018, then I want to know.  So, if instead the 2018 Excel file column name changes to EmpID, Gender, Pay, Date, etc. etc (the "Date" column moves from 2nd position, to 4th position), which is different from 2017... I want SAS to let me know.

 

I used the proc compare, but proc compare doesn't track position changes with variable names.  I can't find a solution to this anywhere.

 

Help is much appreciated...and thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Here is one way to approach this with two very small data sets as a concrete example. We don't need lots of values for this exercise.

 

data work.junk1;
   input x y z;
datalines ;
1 2 3
;

data work.junk2;
   input x z $ y$ a;
datalines;
2 3 4 5
;
run;

proc sql;
   create table varcompare as
   select memname, name, varnum, type
   from dictionary.columns
   where libname='WORK' and memname in ('JUNK1' 'JUNK2')
   ;
run;

proc tabulate data=varcompare;
   class memname name varnum type;
   tables name*varnum ,
          memname=''*type=''*n=''
          /misstext=' '
          ;
run;

Note that I included the variable type, numeric or character, as well since different spreadsheets are notorious for resulting in different variable types for similar names. I also include a variable that only occurs in one set.

 

The proc tabulate output shows a 1 for the values observed. So in this output you can see that variable A only exists in JUNK2, the variables Y and Z appear in different columns and which columns are involved AND that Y and Z both changed variable types.

 

This would also allow incorporating more datasets in the analysis that are supposed to be similar by including the names in the where clause in the proc sql. The syntax for that where might cumbersome if you have to look at multiple libraries. The Libname and Memname values in the dictionary.columns are stored in all uppercase so that is what should be on the where clause.

View solution in original post

9 REPLIES 9
belboy
Obsidian | Level 7

I used this code as a test... but it failed.  Pls help.

 

data test;
set work.ACH;

check = varnum("work.ach","EmpID");
run;

 

I thought it would give me the result of check = 1 ; which is the position of EmpID.  What am I missing?

 

My log:

26:17
NOTE: Invalid numeric data, 'work.ach' , at line 26 column 17.
NOTE: Argument 1 to function VARNUM(.,'EmpID') at line 26 column 10 is invalid.

KachiM
Rhodochrosite | Level 12

Assume Two set of variables are compared. What is the output you want?. Show it by example.

belboy
Obsidian | Level 7

For example,

2018 file = EmpID, Gender, Pay, Dept., Date

 

2017 file = EmpID, Date, Gender, Pay, Dept., 

 

Output would something like:

 

Error - column 2 or "Date" is not in 2nd position in 2018.

 

or  

 

Error - column 2 is out of order from 2017 archive.

ballardw
Super User

Here is one way to approach this with two very small data sets as a concrete example. We don't need lots of values for this exercise.

 

data work.junk1;
   input x y z;
datalines ;
1 2 3
;

data work.junk2;
   input x z $ y$ a;
datalines;
2 3 4 5
;
run;

proc sql;
   create table varcompare as
   select memname, name, varnum, type
   from dictionary.columns
   where libname='WORK' and memname in ('JUNK1' 'JUNK2')
   ;
run;

proc tabulate data=varcompare;
   class memname name varnum type;
   tables name*varnum ,
          memname=''*type=''*n=''
          /misstext=' '
          ;
run;

Note that I included the variable type, numeric or character, as well since different spreadsheets are notorious for resulting in different variable types for similar names. I also include a variable that only occurs in one set.

 

The proc tabulate output shows a 1 for the values observed. So in this output you can see that variable A only exists in JUNK2, the variables Y and Z appear in different columns and which columns are involved AND that Y and Z both changed variable types.

 

This would also allow incorporating more datasets in the analysis that are supposed to be similar by including the names in the where clause in the proc sql. The syntax for that where might cumbersome if you have to look at multiple libraries. The Libname and Memname values in the dictionary.columns are stored in all uppercase so that is what should be on the where clause.

belboy
Obsidian | Level 7

thanks @ballardw,

this gives me a start - thanks for taking the time to provide a clear example.

ballardw
Super User

@belboy wrote:

thanks @ballardw,

this gives me a start - thanks for taking the time to provide a clear example.


I have a program that does something similar for 20 data sets with changing content (a government sponsored survey and so the content WILL change so don't blame me) where I used this approach to examine variable labels because we had "same label different variable) and "same variable different label". Also as a reference for which yearly set to look for related topics. Then I could be prepared for some of the things likely to watch for when combining data.

 

One of the nice things I like about SAS is, assuming matching variable name and types, combining data sets does not depend on the order of variables within sets.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 2016 views
  • 2 likes
  • 4 in conversation