BookmarkSubscribeRSS Feed
_maldini_
Barite | Level 11

Is there a way to reference a list of variables based on a common suffix?

 

I simply want to reference a list in a KEEP statement without having to include every single variable. The variables are not continuous with each other in the data set. Below are some sample variables, but I have to do this for many many variables so a list would be preferred. Thanks.

 

Example: 

A09CGTDY 
B09CGTDY 
C09CGTDY 
D09CGTDY 
E09CGTDY
F09CGTDY 
G09CGTDY 
H09CGTDY 
I09CGTDY 
DATA new_dataset;
   SET old_dataset;
   KEEP A09CGTDY B09CGTDY C09CGTDY D09CGTDY E09CGTDY F09CGTDY G09CGTDY H09CGTDY I09CGTDY;
RUN;
26 REPLIES 26
PaigeMiller
Diamond | Level 26
proc sql noprint;
    select name into :colnames separated by ' ' 
    from dictionary.columns
    where upcase(substr(name,4)) eq 'CGTDY' /* Your suffix goes here */
        and libname='WORK' /* Whatever libname is needed goes here, upper case */
        and memname='OLD_DATASET' /* Your data set name goes here, upper case */
;
quit;

data new_dataset;
    set old_dataset;
    keep &colnames;
run;
    
--
Paige Miller
_maldini_
Barite | Level 11
Got 2 errors (switched the variable list on you...A02SMOKE, B02SMOKE, C02SMOKE).
 
Any advice?
 
See log below: 
 
 
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
72
73 proc sql noprint;
74 select name into :colnames separated by ' '
75 from dictionary.columns
76 where substr(name,4) eq 'SMOKE' /* Your suffix goes here */
77 and libname='CARDIA' /* Whatever libname is needed goes here, upper case */
78 and memname='CARDIA.C1260REQ06_29_2023' /* Your data set name goes here, upper case */
79 ;
NOTE: No rows were selected.
80 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.00 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 5342.34k
OS Memory 35752.00k
Timestamp 07/27/2023 05:20:00 PM
Step Count 74 Switch Count 0
Page Faults 0
Page Reclaims 56
Page Swaps 0
Voluntary Context Switches 5
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 0
 
 
81
82 data new_dataset;
83 set CARDIA.C1260REQ06_29_2023;
NOTE: Data file CARDIA.C1260REQ06_29_2023.DATA is in a format that is native to another host, or the file encoding does not match
the session encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might
reduce performance.
84 keep &colnames ID;
_
22
200
WARNING: Apparent symbolic reference COLNAMES not resolved.
ERROR 22-322: Syntax error, expecting one of the following: a name, ;, _ALL_, _CHARACTER_, _CHAR_, _NUMERIC_.
 
ERROR 200-322: The symbol is not recognized and will be ignored.
 
85 run;
 
WARNING: The variable colnames in the DROP, KEEP, or RENAME list has never been referenced.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.NEW_DATASET may be incomplete. When this step was stopped there were 0 observations and 1 variables.
WARNING: Data set WORK.NEW_DATASET was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.06 seconds
user cpu time 0.06 seconds
system cpu time 0.01 seconds
memory 7987.37k
OS Memory 37984.00k
Timestamp 07/27/2023 05:20:00 PM
Step Count 75 Switch Count 0
Page Faults 0
Page Reclaims 1950
Page Swaps 0
Voluntary Context Switches 5
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 8
 
 
86
87 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
99
AhmedAl_Attar
Rhodochrosite | Level 12

Hi @_maldini_ 

If you want to get a list of variables with mixed suffix, you can do this in a single query. I'm surprised no one has suggested this already!

I'm borrowing and modifying the query @PaigeMiller had supplied in her first reply

/* Simulating a data set with many variables of various suffixes */
data have;
	LENGTH var1 var2 var3 $5;
	LENGTH _01CGTDY _02CGTDY _03CGTDY _04CGTDY _05CGTDY $1;
	LENGTH _01smoke _02smoke _03smoke _04smoke _06smoke _06smoke _07smoke  3;
	LENGTH _01ALTDY _02ALTDY _03ALTDY _04ALTDY $3;
run;

proc sql noprint;
	select name into :colnames separated by ' ' 
	from dictionary.columns
	where libname='WORK' /* Whatever libname is needed goes here, upper case */
	and memname='HAVE'   /* Your data set name goes here, upper case */
	and ((upcase(name) LIKE '%CGTDY') OR  (upcase(name) LIKE '%SMOKE') OR (upcase(name) LIKE '%ALTDY')) ;
quit; /* I think using the like function with different suffixes gives you what you are looking for */
%put &=colnames; data want; set have(KEEP=&colnames); run;

Hope this helps,

Ahmed

AhmedAl_Attar
Rhodochrosite | Level 12

@_maldini_ 

If you are looking for a macro that does this for you, then you would be interested in this blog from @SASJedi Jedi SAS Tricks: Building a Name Suffix variable list  

 

ballardw
Super User

Going forward I would suggest some planning when it comes to variable names. If they have multiple relationships in the names, i.e. you have A01XXX , A02XXX, A03XXX (where XXX is just some other string of related text) then perhaps an ideal name structure to to use for all purposes but XXXA01 XXXA02 XXXA03 would let you use lists like XXXA: to get all the XXXA variables, or XXXA01-XXXA15 (assuming you have all those ending in A02 through A14 and more such as up to A100).

 

However if you have A01XXX and A01YYY and A01ZZZ that you also want to use lists like A: or A01: your going to have to deal with long lists. But this might be indicative that the data structure could be improved and perhaps the text of XXX, YYY and ZZZ should be the value of a variable.

_maldini_
Barite | Level 11

Unfortunately I have no say in the naming of the variables. It's an analysis of previously collected data w/ 4962 variables.

PaigeMiller
Diamond | Level 26
NOTE: No rows were selected.

One possible cause is that the data set name cannot have a dot in it in the MEMNAME column; the libname and dot is not part of the data set name for the purposes of this SQL search.

--
Paige Miller
_maldini_
Barite | Level 11

Yep, that fixed it! Thanks.

 

What would be the best way to do this for multiple lists? Would I do a separate proc sql for each list?

 

Thanks again.

PaigeMiller
Diamond | Level 26

@_maldini_ wrote:

 

What would be the best way to do this for multiple lists? Would I do a separate proc sql for each list?


Explain this further. Give details. Give examples. Don't make us guess what you mean.

--
Paige Miller
_maldini_
Barite | Level 11

Apologies. I want to subset an enormous dataset w/ 4962 variables using a KEEP statement. The data are derived from a longitudinal study w/ multiple time points. Variables that begin with "A" represent time point 1. Variables that begin with "B" represent time point 2. Etc.

 

The list of variables I used in my original question was 1 variable measured at multiple time points (A09CGTDY to I09CGTDY). I want to include all time points for this variable in the subset.

 

I want to use the KEEP statement to reference multiple lists. This list is just for one variable. I want to do this for 25+ variables.

 

Would I create a separate proc sql for each variable w/ its individual suffix and character count? 

 

Sorry for not being more clear in the beginning. 

PaigeMiller
Diamond | Level 26

I want to use the KEEP statement to reference multiple lists. This list is just for one variable. I want to do this for 25+ variables.

 

You said very similar things in an earlier post, and I asked for examples. I don't see examples of what you mean.

--
Paige Miller
_maldini_
Barite | Level 11

Here is what my primitive SAS brain was thinking: 

&colnames is a macro variable for the list of variables w/ a suffix of SMOKE. 

&colnames2 is a macro variable for the list of variables w/ a suffix of TEST, etc.

 

I'm sure there are better ways of doing this, but I was asking about writing a separate proc sql for each list of variables (&colnames2=age, &colnames3=physical activity, &colnames4=alcohol use, etc.).

proc sql noprint;
    select name into :colnames separated by ' ' 
    from dictionary.columns
    where substr(name,4) eq 'SMOKE' /* Your suffix goes here */
        and libname='CARDIA' /* Whatever libname is needed goes here, upper case */
        and memname='C1260REQ06_29_2023' /* Your data set name goes here, upper case */
;
quit;


proc sql noprint;
    select name into :colnames2 separated by ' ' 
    from dictionary.columns
    where substr(name,4) eq 'TEST' /* Your suffix goes here */
        and libname='CARDIA' /* Whatever libname is needed goes here, upper case */
        and memname='C1260REQ06_29_2023' /* Your data set name goes here, upper case */
;
quit;

data new_dataset;
    set CARDIA.C1260REQ06_29_2023;
    keep &colnames &colnames2 ID;
run;

 

 

Reeza
Super User
This method will work, just make sure that the substr() that limits to the characters after 4 is consistent. There's probably more efficient methods, but if this is one you understand and can edit/update yourself it's the easiest IMO.
Quentin
Super User

@_maldini_ wrote:

 

I'm sure there are better ways of doing this, but I was asking about writing a separate proc sql for each list of variables (&colnames2=age, &colnames3=physical activity, &colnames4=alcohol use, etc.).

proc sql noprint;
    select name into :colnames separated by ' ' 
    from dictionary.columns
    where substr(name,4) eq 'SMOKE' /* Your suffix goes here */
        and libname='CARDIA' /* Whatever libname is needed goes here, upper case */
        and memname='C1260REQ06_29_2023' /* Your data set name goes here, upper case */
;
quit;


proc sql noprint;
    select name into :colnames2 separated by ' ' 
    from dictionary.columns
    where substr(name,4) eq 'TEST' /* Your suffix goes here */
        and libname='CARDIA' /* Whatever libname is needed goes here, upper case */
        and memname='C1260REQ06_29_2023' /* Your data set name goes here, upper case */
;
quit;

data new_dataset;
    set CARDIA.C1260REQ06_29_2023;
    keep &colnames &colnames2 ID;
run;

If you just want to generate one variable list, you wouldn't need to write multiple PROC SQL steps, you can user an OR operator on the WHERE clause, like:

 

proc sql noprint;
    select name into :colnames separated by ' ' 
    from dictionary.columns
    where substr(name,4) eq 'SMOKE' or substr(name,4) eq 'ALCOHOL'
        and libname='CARDIA' /* Whatever libname is needed goes here, upper case */
        and memname='C1260REQ06_29_2023' /* Your data set name goes here, upper case */
;
quit;

data new_dataset;
    set CARDIA.C1260REQ06_29_2023;
    keep &colnames ID;
run;
BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 26 replies
  • 2773 views
  • 24 likes
  • 9 in conversation