I have a very large dataset with over 1000 columns, with column names formatted like this:
WORLDDATA.table2_usa_2017_population WORLDDATA.table2_japan_2017_gnp
I only need to keep a subset of these parameters for a select few countries. I specify the custom lists:
%let list1 = usa canada uk japan southafrica; %let list2 = population crimerate gnp;
How do I do a double for loop like so:
param_list = []
for (i in list1) {
for (j in list2) {
param_name = WORLDDATA.table2_{list1[i]}_2017_{list2[j]}
param_list.append(param_name)
}
}such that I can use param_list in
data final_dataset;
set WORLDDATA.table2;
keep {param_list};
run;Thank you!
If you know the variables and the naming structure is pretty fixed you have a few options that I see:
1. Generate a dynamic list using a macro approach
2. Select the names using pattern matching from the list, assumed fixed pattern of table2_COUNTRY_YEAR_STATISTIC. Use SCAN() to extract those portion of the name.
3. Reformat the data such that it's in a long/longer format and then filter/manipulate it.
For the long run, I suspect #3 is better because the countries will be rows rather than in the variable name.
Approach #2 - note the index (2/4) may be wrong, you should test this.
proc sql noprint;
select name into :name_list separated by " "
from sashelp.vcolumn where memname = 'WORLDDATA'
and lower(scan(name, 2, "_.")) in ('use', 'canada', 'japan')
and lower(scan(name, 4, "_.")) in ('population', 'crimerate', 'gap');
quit;
%put &name_list.;
data want;
set worldData;
keep &name_list;
run;
For an example of #3 I'd need more information on the sample structure.
@RogerJones wrote:
I have a very large dataset with over 1000 columns, with column names formatted like this:
WORLDDATA.table2_usa_2017_population WORLDDATA.table2_japan_2017_gnpI only need to keep a subset of these parameters for a select few countries. I specify the custom lists:
%let list1 = usa canada uk japan southafrica; %let list2 = population crimerate gnp;How do I do a double for loop like so:
param_list = [] for (i in list1) { for (j in list2) { param_name = WORLDDATA.table2_{list1[i]}_2017_{list2[j]} param_list.append(param_name) } }such that I can use param_list in
data final_dataset; set WORLDDATA.table2; keep {param_list}; run;Thank you!
I like Approach #2 best, and would write it:
proc sql noprint;
select name into :name_list separated by " "
from sashelp.vcolumn
where memname = 'WORLDDATA' and prxmatch("/_(usa|canada|uk|japan|southafrica)_2017_(population|crimerate|gnp)/i", name);
quit;
%put &name_list.;
data want;
set worldData(keep=&name_list.);
run;
I think you are seeking something more generic, along the lines of a macro:
%macro vlist (list1=, list2=) ;
%local j k ;
%do j=1 %to %sysfunc(countw(&list1) ) ;
%do k=1 %to %sysfunc(countw(&list2) ) ;
table2_%scan(&list1, &j)_2017_%scan(&list2, &k)
%end;
%end;
%mend vlist;
Then you can use the macro where you want the list of names such as:
drop %vlist (list1= usa canada uk japan southafrica,
list2 = population crime rate gnp) ;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.