Is there a way to reference a list of variables based on a common suffix?
I simply want to reference a list in a KEEP statement without having to include every single variable. The variables are not continuous with each other in the data set. Below are some sample variables, but I have to do this for many many variables so a list would be preferred. Thanks.
Example:
DATA new_dataset;
SET old_dataset;
KEEP A09CGTDY B09CGTDY C09CGTDY D09CGTDY E09CGTDY F09CGTDY G09CGTDY H09CGTDY I09CGTDY;
RUN;
proc sql noprint;
select name into :colnames separated by ' '
from dictionary.columns
where upcase(substr(name,4)) eq 'CGTDY' /* Your suffix goes here */
and libname='WORK' /* Whatever libname is needed goes here, upper case */
and memname='OLD_DATASET' /* Your data set name goes here, upper case */
;
quit;
data new_dataset;
set old_dataset;
keep &colnames;
run;
Hi @_maldini_
If you want to get a list of variables with mixed suffix, you can do this in a single query. I'm surprised no one has suggested this already!
I'm borrowing and modifying the query @PaigeMiller had supplied in her first reply
/* Simulating a data set with many variables of various suffixes */
data have;
LENGTH var1 var2 var3 $5;
LENGTH _01CGTDY _02CGTDY _03CGTDY _04CGTDY _05CGTDY $1;
LENGTH _01smoke _02smoke _03smoke _04smoke _06smoke _06smoke _07smoke 3;
LENGTH _01ALTDY _02ALTDY _03ALTDY _04ALTDY $3;
run;
proc sql noprint;
select name into :colnames separated by ' '
from dictionary.columns
where libname='WORK' /* Whatever libname is needed goes here, upper case */
and memname='HAVE' /* Your data set name goes here, upper case */
and ((upcase(name) LIKE '%CGTDY') OR (upcase(name) LIKE '%SMOKE') OR (upcase(name) LIKE '%ALTDY')) ;
quit;
/* I think using the like function with different suffixes gives you what you are looking for */
%put &=colnames;
data want;
set have(KEEP=&colnames);
run;
Hope this helps,
Ahmed
If you are looking for a macro that does this for you, then you would be interested in this blog from @SASJedi Jedi SAS Tricks: Building a Name Suffix variable list
Going forward I would suggest some planning when it comes to variable names. If they have multiple relationships in the names, i.e. you have A01XXX , A02XXX, A03XXX (where XXX is just some other string of related text) then perhaps an ideal name structure to to use for all purposes but XXXA01 XXXA02 XXXA03 would let you use lists like XXXA: to get all the XXXA variables, or XXXA01-XXXA15 (assuming you have all those ending in A02 through A14 and more such as up to A100).
However if you have A01XXX and A01YYY and A01ZZZ that you also want to use lists like A: or A01: your going to have to deal with long lists. But this might be indicative that the data structure could be improved and perhaps the text of XXX, YYY and ZZZ should be the value of a variable.
Unfortunately I have no say in the naming of the variables. It's an analysis of previously collected data w/ 4962 variables.
NOTE: No rows were selected.
One possible cause is that the data set name cannot have a dot in it in the MEMNAME column; the libname and dot is not part of the data set name for the purposes of this SQL search.
Yep, that fixed it! Thanks.
What would be the best way to do this for multiple lists? Would I do a separate proc sql for each list?
Thanks again.
@_maldini_ wrote:
What would be the best way to do this for multiple lists? Would I do a separate proc sql for each list?
Explain this further. Give details. Give examples. Don't make us guess what you mean.
Apologies. I want to subset an enormous dataset w/ 4962 variables using a KEEP statement. The data are derived from a longitudinal study w/ multiple time points. Variables that begin with "A" represent time point 1. Variables that begin with "B" represent time point 2. Etc.
The list of variables I used in my original question was 1 variable measured at multiple time points (A09CGTDY to I09CGTDY). I want to include all time points for this variable in the subset.
I want to use the KEEP statement to reference multiple lists. This list is just for one variable. I want to do this for 25+ variables.
Would I create a separate proc sql for each variable w/ its individual suffix and character count?
Sorry for not being more clear in the beginning.
I want to use the KEEP statement to reference multiple lists. This list is just for one variable. I want to do this for 25+ variables.
You said very similar things in an earlier post, and I asked for examples. I don't see examples of what you mean.
Here is what my primitive SAS brain was thinking:
&colnames is a macro variable for the list of variables w/ a suffix of SMOKE.
&colnames2 is a macro variable for the list of variables w/ a suffix of TEST, etc.
I'm sure there are better ways of doing this, but I was asking about writing a separate proc sql for each list of variables (&colnames2=age, &colnames3=physical activity, &colnames4=alcohol use, etc.).
proc sql noprint;
select name into :colnames separated by ' '
from dictionary.columns
where substr(name,4) eq 'SMOKE' /* Your suffix goes here */
and libname='CARDIA' /* Whatever libname is needed goes here, upper case */
and memname='C1260REQ06_29_2023' /* Your data set name goes here, upper case */
;
quit;
proc sql noprint;
select name into :colnames2 separated by ' '
from dictionary.columns
where substr(name,4) eq 'TEST' /* Your suffix goes here */
and libname='CARDIA' /* Whatever libname is needed goes here, upper case */
and memname='C1260REQ06_29_2023' /* Your data set name goes here, upper case */
;
quit;
data new_dataset;
set CARDIA.C1260REQ06_29_2023;
keep &colnames &colnames2 ID;
run;
@_maldini_ wrote:
I'm sure there are better ways of doing this, but I was asking about writing a separate proc sql for each list of variables (&colnames2=age, &colnames3=physical activity, &colnames4=alcohol use, etc.).
proc sql noprint; select name into :colnames separated by ' ' from dictionary.columns where substr(name,4) eq 'SMOKE' /* Your suffix goes here */ and libname='CARDIA' /* Whatever libname is needed goes here, upper case */ and memname='C1260REQ06_29_2023' /* Your data set name goes here, upper case */ ; quit; proc sql noprint; select name into :colnames2 separated by ' ' from dictionary.columns where substr(name,4) eq 'TEST' /* Your suffix goes here */ and libname='CARDIA' /* Whatever libname is needed goes here, upper case */ and memname='C1260REQ06_29_2023' /* Your data set name goes here, upper case */ ; quit; data new_dataset; set CARDIA.C1260REQ06_29_2023; keep &colnames &colnames2 ID; run;
If you just want to generate one variable list, you wouldn't need to write multiple PROC SQL steps, you can user an OR operator on the WHERE clause, like:
proc sql noprint;
select name into :colnames separated by ' '
from dictionary.columns
where substr(name,4) eq 'SMOKE' or substr(name,4) eq 'ALCOHOL'
and libname='CARDIA' /* Whatever libname is needed goes here, upper case */
and memname='C1260REQ06_29_2023' /* Your data set name goes here, upper case */
;
quit;
data new_dataset;
set CARDIA.C1260REQ06_29_2023;
keep &colnames ID;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.