DATA Step, Macro, Functions and more

Comparing and listing varaibles from two seperate datsets

Accepted Solution Solved
Reply
Contributor
Posts: 24
Accepted Solution

Comparing and listing varaibles from two seperate datsets

Hi,

 

I have similar datasets.

 

Dataset A has 376 variables and

Dataset B has 428 variables.

 

How do I get a list of the variables which are not in both datasets?

 

Thanks


Accepted Solutions
Solution
‎12-12-2017 11:28 AM
Super User
Posts: 13,066

Re: Comparing and listing varaibles from two seperate datsets

Posted in reply to Adnan_Razaq

Proc Compare is one tool that provides that information (as well as if the same-named variables are of different types).

 

Proc compare base=datasetA compare=datasetb novalues listvar;

run;

 

View solution in original post


All Replies
Super User
Posts: 22,857

Re: Comparing and listing varaibles from two seperate datsets

[ Edited ]
Posted in reply to Adnan_Razaq

1. Get a list of variables from each data set

Use either SASHELP.VCOLUMN or PROC CONTENTS

 

2A. Use SQL to compare them

 

2B. Use PROC COMPARE to compare the lists

 

2C. Use a data step to do the comparison.

 

I have an example of the 2C approach here - note this is a fully worked example so the first few steps are generating sample data. Please make sure to read the comments.

 

*Here's a bit of a long way to generate the list of datasets with variables present.
This program looks at ALL variables, to restrict it to a select list, see the line of 
code to be modified via the comments.;


    /*Generate fake data to work with*/
	data class1;
		set sashelp.class;
		drop age sex;
	run;
	
	data class2;
		set sashelp.class;
		drop weight height;
	run;
	
	data class3;
		set sashelp.class;
		Order=1;
	run;
	
	data class4;
		set sashelp.class;
		keep name;
	run;
	
	data class5;
		set sashelp.class;
	run;
	
    /*Extract metadata from dictionary tables*/
	proc sql noprint;
		create table column_list as select memname, libname, name, type, 1 as count 
			from dictionary.columns where libname='WORK' and memname like 'CLASS%'
			/*ADD THE VARIABLE LIST HERE*/
			order by memname, name;
	quit;
	
    /*Transpose results to a more user friendly format*/
	proc transpose data=column_list out=flipped;
		by memname;
		id name;
		idlabel name;
		var count;
	run;
	
    /*Format output*/
	data want;
		retain memname;
		set flipped;
		array class(*) _NUMERIC_;
	
		do i=1 to dim(class);
	
			if class(i)=. then
				class(i)=0;
		end;
		All_Variables=sum(of _numeric_)-I;
		DROP I _NAME_;
	run;

https://gist.github.com/statgeek/3b57ae085d9f7a36a2d95c15f04e72e6

Super User
Posts: 22,857

Re: Comparing and listing varaibles from two seperate datsets

And a slightly different implementation:

https://gist.github.com/statgeek/e0b98c4627aa31a567e5

Solution
‎12-12-2017 11:28 AM
Super User
Posts: 13,066

Re: Comparing and listing varaibles from two seperate datsets

Posted in reply to Adnan_Razaq

Proc Compare is one tool that provides that information (as well as if the same-named variables are of different types).

 

Proc compare base=datasetA compare=datasetb novalues listvar;

run;

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 111 views
  • 2 likes
  • 3 in conversation