BookmarkSubscribeRSS Feed
deleted_user
Not applicable
I have what's probably a simple Base SAS question: I have two datasets, with most of the same variables in both sets. (In other words, 90% of the variables in one set are in the other set.) I would like to keep only the variables shared by both datasets. How do I keep the mutual variables, and drop the others from both datasets?
4 REPLIES 4
Black
Calcite | Level 5
i think you can use the "proc contents" to get the variables name of the two datasets. Then chose the mutual variables, and keep them in the two datasets.
Not hard to implement by macro.
deleted_user
Not applicable
It might be a bit tricky.

You have say x+y columns in dataset A and x+z columns in dataset B. You want a new dataset C where you have x columns and the data of both A and B.

Had it been a row-problem instead of a column problem, it could have been done very easily. Actually that is what I am going to propose.

Step 1: Proc Transpose datasets A and B to say A_ and B_, now these have x+y rows (not columns) and x+z rows respectively and if i remember correctly the first variable would be the original column-names (referenced by _NAME_).

Step 2: Merge A_ and B_ By _NAME_ (assuming the varaible names in both A and B were same for common variables to start with) when (IN=A_) and (IN=B_), i.e the intersection to get final dataset C_.

Step 3: Proc transpose C_ to C so that you get the original format of datasets. You might have to do some data cleaning, formating and renaming of the variables. If you would want to have A and B separate, then you have to insert a new variable to identify from which dataset the column is originally, can do it before step 1 or in step 2 and segregate into A nd B depending on that variable now.
Olivier
Pyrite | Level 9
Hi all.
The solution "transpose-intensive" does not seem to be accurate if your data set is too big (= has too many observations). The first suggestion is more likely to work in any case.
The code would look like this...
[pre]
PROC CONTENTS DATA = dataset1 OUT = variables1 NOPRINT ;
RUN ;
PROC CONTENTS DATA = dataset2 OUT = variables2 NOPRINT ;
RUN ;
PROC SQL NOPRINT ;
SELECT first.name INTO : commonVariables SEPARATED BY " "
FROM variables1 AS first,
variables2 AS second
WHERE UPCASE(first.name)=UPCASE(second.name)
;
QUIT ;
/* check the variable names */
%PUT &commonVariables ;
DATA dataset1_common ;
SET dataset1 (KEEP = &commonVariables) ;
RUN ;
DATA dataset2_common ;
SET dataset2 (KEEP = &commonVariables) ;
RUN ;
[/pre]
Cheers.
Olivier
Cynthia_sas
SAS Super FREQ
Hi:
And to simplify the creation of the macro variable, you could use the DICTIONARY.COLUMNS file in the SQL query:
[pre]
proc sql;
select a.name
into :commonvariables separated by " "
from dictionary.columns as a,
dictionary.columns as b
where upcase(a.name)=upcase(b.name)
and (a.libname="WORK" and a.memname="DATASET1")
and (b.libname="WORK" and b.memname="DATASET2");
quit ;

%put commonvariables are = &commonvariables;

** rest of code to create new datasets with the commonvariable macro var.;
[/pre]

cynthia

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 858 views
  • 0 likes
  • 4 in conversation