I have a large data frame already formatted the way I desire with several thousand columns called df1.
col1 col2 col3 .... col5000
1 1 1 .... 1
1 0 0 .... 1
1 1 0 .... 1
... .... ... .... ....
I've selected a few hundred of these columns to use as features and have the list stored in a separate dataframe, df2.
ColN
col1
col3
...
col5000
The list is approx. 1000 entries and are not sequential. How can I select the columns from df1 so that my final result is something like
col1 col3 .... col5000
1 1 .... 1
1 0 .... 1
1 0 .... 1
.... .... .... ....
Currently I've tried something like this
proc sql;
create table
df_subset as
select
A.* in(B.colN)
from
df1 as A,
df2 as B
quit;
I am currently working in SAS studio
proc sql noprint;
select distinct colN into :wanted_columns separated by ' ' from df2;
quit;
data df_subset;
set df1(keep=&wanted_columns);
run;
@rschubert1 Your code doesn't work as you try to use metadata where SQL expects data.
The method used by @PaigeMiller works.
Similarly:
proc sql noprint;
select distinct colN into :wanted_columns separated by ',' from DF2;
create table SUBSET as select &wanted_columns from DF1;
run;
Calling all data scientists and open-source enthusiasts! Want to solve real problems that impact your company or the world? Register to hack by August 31st!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.