I have a large data frame already formatted the way I desire with several thousand columns called df1.
col1 col2 col3 .... col5000
1 1 1 .... 1
1 0 0 .... 1
1 1 0 .... 1
... .... ... .... ....
I've selected a few hundred of these columns to use as features and have the list stored in a separate dataframe, df2.
ColN
col1
col3
...
col5000
The list is approx. 1000 entries and are not sequential. How can I select the columns from df1 so that my final result is something like
col1 col3 .... col5000
1 1 .... 1
1 0 .... 1
1 0 .... 1
.... .... .... ....
Currently I've tried something like this
proc sql;
create table
df_subset as
select
A.* in(B.colN)
from
df1 as A,
df2 as B
quit;
I am currently working in SAS studio
proc sql noprint;
select distinct colN into :wanted_columns separated by ' ' from df2;
quit;
data df_subset;
set df1(keep=&wanted_columns);
run;
@rschubert1 Your code doesn't work as you try to use metadata where SQL expects data.
The method used by @PaigeMiller works.
Similarly:
proc sql noprint;
select distinct colN into :wanted_columns separated by ',' from DF2;
create table SUBSET as select &wanted_columns from DF1;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.