I have what's probably a simple Base SAS question: I have two datasets, with most of the same variables in both sets. (In other words, 90% of the variables in one set are in the other set.) I would like to keep only the variables shared by both datasets. How do I keep the mutual variables, and drop the others from both datasets?
You have say x+y columns in dataset A and x+z columns in dataset B. You want a new dataset C where you have x columns and the data of both A and B.
Had it been a row-problem instead of a column problem, it could have been done very easily. Actually that is what I am going to propose.
Step 1: Proc Transpose datasets A and B to say A_ and B_, now these have x+y rows (not columns) and x+z rows respectively and if i remember correctly the first variable would be the original column-names (referenced by _NAME_).
Step 2: Merge A_ and B_ By _NAME_ (assuming the varaible names in both A and B were same for common variables to start with) when (IN=A_) and (IN=B_), i.e the intersection to get final dataset C_.
Step 3: Proc transpose C_ to C so that you get the original format of datasets. You might have to do some data cleaning, formating and renaming of the variables. If you would want to have A and B separate, then you have to insert a new variable to identify from which dataset the column is originally, can do it before step 1 or in step 2 and segregate into A nd B depending on that variable now.
The solution "transpose-intensive" does not seem to be accurate if your data set is too big (= has too many observations). The first suggestion is more likely to work in any case.
The code would look like this...
PROC CONTENTS DATA = dataset1 OUT = variables1 NOPRINT ;
PROC CONTENTS DATA = dataset2 OUT = variables2 NOPRINT ;
PROC SQL NOPRINT ;
SELECT first.name INTO : commonVariables SEPARATED BY " "
FROM variables1 AS first,
variables2 AS second
/* check the variable names */
%PUT &commonVariables ;
DATA dataset1_common ;
SET dataset1 (KEEP = &commonVariables) ;
DATA dataset2_common ;
SET dataset2 (KEEP = &commonVariables) ;
And to simplify the creation of the macro variable, you could use the DICTIONARY.COLUMNS file in the SQL query:
into :commonvariables separated by " "
from dictionary.columns as a,
dictionary.columns as b
and (a.libname="WORK" and a.memname="DATASET1")
and (b.libname="WORK" and b.memname="DATASET2");
%put commonvariables are = &commonvariables;
** rest of code to create new datasets with the commonvariable macro var.;