DATA Step, Macro, Functions and more

Keeping Mutual Variables In Separate Datasets

Reply
N/A
Posts: 0

Keeping Mutual Variables In Separate Datasets

I have what's probably a simple Base SAS question: I have two datasets, with most of the same variables in both sets. (In other words, 90% of the variables in one set are in the other set.) I would like to keep only the variables shared by both datasets. How do I keep the mutual variables, and drop the others from both datasets?
Occasional Contributor
Posts: 13

Re: Keeping Mutual Variables In Separate Datasets

i think you can use the "proc contents" to get the variables name of the two datasets. Then chose the mutual variables, and keep them in the two datasets.
Not hard to implement by macro.
N/A
Posts: 0

Re: Keeping Mutual Variables In Separate Datasets

It might be a bit tricky.

You have say x+y columns in dataset A and x+z columns in dataset B. You want a new dataset C where you have x columns and the data of both A and B.

Had it been a row-problem instead of a column problem, it could have been done very easily. Actually that is what I am going to propose.

Step 1: Proc Transpose datasets A and B to say A_ and B_, now these have x+y rows (not columns) and x+z rows respectively and if i remember correctly the first variable would be the original column-names (referenced by _NAME_).

Step 2: Merge A_ and B_ By _NAME_ (assuming the varaible names in both A and B were same for common variables to start with) when (IN=A_) and (IN=B_), i.e the intersection to get final dataset C_.

Step 3: Proc transpose C_ to C so that you get the original format of datasets. You might have to do some data cleaning, formating and renaming of the variables. If you would want to have A and B separate, then you have to insert a new variable to identify from which dataset the column is originally, can do it before step 1 or in step 2 and segregate into A nd B depending on that variable now.
Super Contributor
Posts: 260

Re: Keeping Mutual Variables In Separate Datasets

Hi all.
The solution "transpose-intensive" does not seem to be accurate if your data set is too big (= has too many observations). The first suggestion is more likely to work in any case.
The code would look like this...
[pre]
PROC CONTENTS DATA = dataset1 OUT = variables1 NOPRINT ;
RUN ;
PROC CONTENTS DATA = dataset2 OUT = variables2 NOPRINT ;
RUN ;
PROC SQL NOPRINT ;
SELECT first.name INTO : commonVariables SEPARATED BY " "
FROM variables1 AS first,
variables2 AS second
WHERE UPCASE(first.name)=UPCASE(second.name)
;
QUIT ;
/* check the variable names */
%PUT &commonVariables ;
DATA dataset1_common ;
SET dataset1 (KEEP = &commonVariables) ;
RUN ;
DATA dataset2_common ;
SET dataset2 (KEEP = &commonVariables) ;
RUN ;
[/pre]
Cheers.
Olivier
SAS Super FREQ
Posts: 8,742

Re: Keeping Mutual Variables In Separate Datasets

Hi:
And to simplify the creation of the macro variable, you could use the DICTIONARY.COLUMNS file in the SQL query:
[pre]
proc sql;
select a.name
into :commonvariables separated by " "
from dictionary.columns as a,
dictionary.columns as b
where upcase(a.name)=upcase(b.name)
and (a.libname="WORK" and a.memname="DATASET1")
and (b.libname="WORK" and b.memname="DATASET2");
quit ;

%put commonvariables are = &commonvariables;

** rest of code to create new datasets with the commonvariable macro var.;
[/pre]

cynthia
Ask a Question
Discussion stats
  • 4 replies
  • 151 views
  • 0 likes
  • 4 in conversation