Dear all,
I'm trying to subset from a dataset all columns (variables) containing a pattern (named "Q1"). I tried the following:
data myout;
set mydata;
keep Q1;
run;
but it keeps also Q11,Q12, ...
Can anyone help me please?
But Q11 and Q12 contain Q1: Q11 and Q12
Does your pattern is xxQ1xx wher "x"es mean letters or _?
So you need "abcQ1efg" but not "123Q1456"?
Bart
If it's like I wrote, you may try the basePlus' package macro %getVars():
data test;
q1=1;
abcQ1efg=2;
_123Q1456=3;
q11=4;
Q12=5;
xQ1=6;
run;
%loadPackage(basePlus)
/*
%helpPackage(baseplus, '%getVars()' )
*/
%put #%getVars(test,pattern=Q1(?!\d))#;
data want;
set test;
keep %getVars(test,pattern=Q1(?!\d));
run;
Bart
You can use dictionary tables and create a macro variable of the list of desired variable names.
proc sql noprint;
select name into :varlist separated by ' '
from dictionary.columns
where libname='WORK' and memname='MYDATA' /* must be uppercase */
and upcase(name) like "Q%";
quit;
%put &varlist;
data myout;
set mydata;
keep &varlist;
run;
proc print data=myout;
run;
@NewUsrStat wrote:
Hi Bart, thank you for your support. All columns (variables) start with Q*. No Q*in the middle or at the end of a name.
Examples will help make your question clearer.
If the variables are all defined with simple numeric suffixes perhaps just using an iterative variable list will work?
keep q1-q9 ;
would keep variables Q1, Q2, .... Q9 but not keep variable Q12 or Q21 for example.
OK, so if it suppose to be "Q1xyz...", where x is not a digit (to keep "Q1A", but not "Q17") the pattern should be:
data test;
q1=1;
abcQ1efg=2;
_123Q1456=3;
q11=4;
Q12=5;
xQ1=6;
q1abc=7;
q1_efg=8;
run;
%loadPackage(basePlus)
/*
%helpPackage(baseplus, '%getVars()' )
*/
%put #%getVars(test,pattern=^Q1(?!\d))#;
data want;
set test;
keep %getVars(test,pattern=^Q1(?!\d));
run;
It will select only: q1, q1abc, and q1_efg.
Bart
The code you posted would only keep one variable. Perhaps you instead used this keep statement?
keep q1: ;
That will keep any variable whose name starts with the letter Q followed by the digit 1.
Can you provide a simple example that demonstrates the issue? And expound on any variations that you need to support.
For your simple example you could use a combination of KEEP and DROP.
data test;
length q1 q11 q12 q1test x y z 8;
keep q1: ;
drop q11: q12: ;
run;
NOTE: The data set WORK.TEST has 1 observations and 2 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.00 seconds 16 17 data _null_; 18 set test; 19 put (_all_) (=/); 20 run; q1=. q1test=. NOTE: There were 1 observations read from the data set WORK.TEST.
But I suspect your issue is more complex than that?
But Q11 does contain Q1, so what's the issue?
@NewUsrStat wrote:
Dear all,
I'm trying to subset from a dataset all columns (variables) containing a pattern (named "Q1"). I tried the following:
data myout; set mydata; keep Q1; run;
but it keeps also Q11,Q12, ...
Can anyone help me please?
No, the keep statement selects only Q1. Nothing else.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.