Hello.
A dataset_1 has 100 variables
A dataset_2 has 1500 variables (50 are common to dataset_1).
I want to append the observations of dataset_2 to dataset_1, but choosing only the 50 common variables, without writing 50 KEEP's, one by one.
Is this possible using some kind of KEEP?
DATA dataset_1;
KEEP "50common_variables)
SET dataset_2;
RUN;
Thank you!
This is a case where one of the special features of PROC SQL, namely the ability to access metadata via the DICTIONARY container name, is very handy:
proc sql noprint;
select a.name into :keeplist separated by ' '
from
(select name from dictionary.columns where libname="WORK" and memname="DATASET_1") as a
inner join
(select name from dictionary.columns where libname="WORK" and memname="DATASET_2") as b
on a.name=b.name;
quit;
%put &=keeplist;
data want;
set d1 (keep=&keeplist)
d2 (keep=&keeplist) ;
run;
There are a LOT of dictionaries available via PROC SQL. Dictionary COLUMNS is the one suitable for your task. Do a "select * from dictionary.dictionaries" to see them all.
Now, if your answer to @Astounding's question is that you don't want the common variables, but rather all the variables from one of the data sets (i.e. exclude vars only in the 2nd dataset), you don't even need metadata access. Instead, you can just manipulate the program-data-vector constructed by the sas compiler in a data step:
data want (drop=_pre _post);
retain _pre .;
if 0 then set dataset_1;
retain _post .;
set dataset_1 dataset_2;
keep _pre -- _post ;
run;
Edit entered to initialize the variables (to missing values) in the two RETAIN statements.
This all depends on how the pdv (program data vector) is enlarged statement-by-statement step-by-step by the sas compiler:
Just to clarify ...
Should the final result include only the 50 common variables, or should it include all 100 variables from dataset 1?
This is a case where one of the special features of PROC SQL, namely the ability to access metadata via the DICTIONARY container name, is very handy:
proc sql noprint;
select a.name into :keeplist separated by ' '
from
(select name from dictionary.columns where libname="WORK" and memname="DATASET_1") as a
inner join
(select name from dictionary.columns where libname="WORK" and memname="DATASET_2") as b
on a.name=b.name;
quit;
%put &=keeplist;
data want;
set d1 (keep=&keeplist)
d2 (keep=&keeplist) ;
run;
There are a LOT of dictionaries available via PROC SQL. Dictionary COLUMNS is the one suitable for your task. Do a "select * from dictionary.dictionaries" to see them all.
Now, if your answer to @Astounding's question is that you don't want the common variables, but rather all the variables from one of the data sets (i.e. exclude vars only in the 2nd dataset), you don't even need metadata access. Instead, you can just manipulate the program-data-vector constructed by the sas compiler in a data step:
data want (drop=_pre _post);
retain _pre .;
if 0 then set dataset_1;
retain _post .;
set dataset_1 dataset_2;
keep _pre -- _post ;
run;
Edit entered to initialize the variables (to missing values) in the two RETAIN statements.
This all depends on how the pdv (program data vector) is enlarged statement-by-statement step-by-step by the sas compiler:
Proc append will append a data set and only have common variables in the result though you need to use FORCE option. So you would only need to Keep the values for dataset_1
Proc append base=dataset_1 (keep= list to keep) data=dataset_2 Force;
run;
HOWEVER as with a data step variables of the same name will have to have the same type and if character variables in dataset_2 are longer than in dataset_1 they may be truncated.
If the variables you want from dataset_1 have "nice" names like ABC1, ABC2 (a common prefix or start of the variable name) you can use : notation for a list, keep = ABC: would keep all variables that start with the letters ABC, or you could use a range list: Keep = ABC1-ABC9 would keep all the ABC variables with suffixes between 1 and 9. Or the -- (two dashes) gets "sequential" varibles
Keep a--x would keep the variables a to x in order (if a was the 10th variable column and x was the 15th variable column you would get 10th through 15th variables. This could be modified to request either the numeric or character variables in those columns: a-numeric-x or a-character-x
or you may be able to use _numeric_ or _character_ if you some of what you want is ALL numeric or ALL character variables.
It is easy for SQL. data class1; set sashelp.class; keep name sex age; run; data class2; set sashelp.class; keep sex age weight; run; proc sql; create table want as select * from class1 union corresponding all select * from class2; quit;
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.