Hello:
I have a data set that has more than 300 variables. I would like to select some variables that have the same prefix and suffix. e.g I only want to keep variables with prefix "s" and suffix "2020". Please my code for short version of my data set. The code includes something I tried but not working.
Thanks,
data have;
input var1 var2 s12020 t12020 u12020 s22020 t22020 u22020 s32020 t32020 u32020;
cards;
1 2 3 4 5 6 7 8 9 1 2
;
run;
/* I would like to keep s12020 s22020 s32020, this is an example, I have longer var list than this in real data */
/*
data want;
set have (keep= s12020-s32020);
run;
ERROR: Not all variables in the list s12020-s32020 were found.
*/
Hello @sasecn,
After all these creative suggestions have been made, I'm hesitant to add a boring PROC SQL approach:
proc sql noprint;
select name into :sxx2020 separated by ' '
from dictionary.columns
where libname='WORK' & memname='HAVE' & name like 's%2020';
quit;
data want;
set have(keep=&sxx2020);
run;
Edit: Similarly, you can use SASHELP.VCOLUMN and combine the two steps above into one DATA _NULL_ step:
data _null_;
set sashelp.vcolumn end=last;
where libname='WORK' & memname='HAVE' & name like 's%2020';
if _n_=1 then call execute('data want; set have(keep=');
call execute(name);
if last then call execute('); run;');
run;
If I may recommend the BasePlus package (here is the doc: https://github.com/yabwon/SAS_PACKAGES/blob/master/packages/baseplus.md) and the %getVars() macro.
Bart
filename packages "%sysfunc(pathname(work))"; /* setup temporary directory for packages in the WORK */
filename SPFinit url "https://raw.githubusercontent.com/yabwon/SAS_PACKAGES/master/SPF/SPFinit.sas";
%include SPFinit; /* enable the framework */
%installPackage(BasePlus) /* install the package */
%loadPackage(BasePlus) /* load the package */
data have;
input var1 var2 s12020 t12020 u12020 s22020 t22020 u22020 s32020 t32020 u32020;
cards;
1 2 3 4 5 6 7 8 9 1 2
;
run;
data want;
set have (keep= %getVars(have, pattern=^s.*2020$, varRange=_numeric_));
put _ALL_;
run;
@sasecn wrote:
Hello:
I have a data set that has more than 300 variables. I would like to select some variables that have the same prefix and suffix. e.g I only want to keep variables with prefix "s" and suffix "2020".
If you have the ability to choose the names in your data set, you really ought to name the variable differently so you can choose the variables much more easily. Make the prefix s2020 instead of prefix s and suffix 2020. Then you can select them all using the colon after s2020, like this:
data want;
set have (keep= s2020:);
run;
@yabwon Sir Bart(the genie). If I knew you few years ago, I would have done an MBA and sought a management or political career of sorts and cleverly outsourced all my work to you. Kudos! Now I see the value of the packages and routines. Bro, that's a lot of effort. Jeez well done!
Anyways, I wanted to have some fun too-
data have;
input var1 var2 s12020 t12020 u12020 s22020 t22020 u22020 s32020 t32020 u32020;
cards;
1 2 3 4 5 6 7 8 9 1 2
8 7 6 5 4 3 2 1 0 9 7
;
run;
data _null_;
set have end=z;
if _n_=1 then do;
dcl hash h(multidata:'y',ordered:'y');
h.definekey('_n_');
array t _numeric_;
do over t;
if prxmatch('/^s.*2020$/',vname(t)) then h.definedata(vname(t));
end;
h.definedone();
end;
h.add();
if z;
h.output(dataset:'want');
run;
proc print noobs;run;
Thank you for kind words! 🙂 The biggest joy is when I see that SPF is working "in the battle field"
I've just had new version released: https://communities.sas.com/t5/SAS-Programming/SAS-Packages-Framework-version-20201101/m-p/695742 You can 👍 if you want 😉
Btw. your use of hash table is also very elegant.
All the best
Bart
There is no "if i want" for such "meritorious deeds". Done! My oh my, I can't believe I missed that post. Indeed deserves heaps of praise. Thank you for reminding me. Also, I am gonna take a week long off end of this month, and I would utilize that time to read and understand your SAS packages. Once again, very well done! Great stuff!
If you fancy there will be a BASUG seminar about SPF in November (thanks go to @Quentin).
Registration is open: https://us02web.zoom.us/webinar/register/WN_znATxbA4TvactunkegVFAA
Bart
One more for the fun record-
data _null_;
set have;
length n $32 var_list $32767;
do while(1);
call vnext(n);
if n=' ' then leave;
if not prxmatch('/^s.*2020$/',strip(n)) then continue;
var_list=catx(' ',var_list,n);
end;
call symputx('var_list',var_list,'g');
stop;
run;
%put &=var_list;
data want;
set have(keep=&var_list);
run;
Hello @sasecn,
After all these creative suggestions have been made, I'm hesitant to add a boring PROC SQL approach:
proc sql noprint;
select name into :sxx2020 separated by ' '
from dictionary.columns
where libname='WORK' & memname='HAVE' & name like 's%2020';
quit;
data want;
set have(keep=&sxx2020);
run;
Edit: Similarly, you can use SASHELP.VCOLUMN and combine the two steps above into one DATA _NULL_ step:
data _null_;
set sashelp.vcolumn end=last;
where libname='WORK' & memname='HAVE' & name like 's%2020';
if _n_=1 then call execute('data want; set have(keep=');
call execute(name);
if last then call execute('); run;');
run;
Sir @FreelanceReinh Immortals are so unassuming, while mere mortals like me can only worship "kolmogorov equals" and not become one. End of the Sunday story!
If "12020" in any way relates to January 2020, then you are 1) hiding data values in variable names and 2) likely have a suboptimal data structure and might be better off transposing the data and adding a date variable so you have one record per "month" with names like S T and U.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.