BookmarkSubscribeRSS Feed
RogerJones
Calcite | Level 5

I have a very large dataset with over 1000 columns, with column names formatted like this:

WORLDDATA.table2_usa_2017_population
WORLDDATA.table2_japan_2017_gnp

I only need to keep a subset of these parameters for a select few countries. I specify the custom lists:

%let list1 = usa canada uk japan southafrica;
%let list2 = population crimerate gnp;

How do I do a double for loop like so:

  param_list = []
  for (i in list1) {
    for (j in list2) {
      param_name = WORLDDATA.table2_{list1[i]}_2017_{list2[j]}
      param_list.append(param_name)
    }
  }

such that I can use param_list in

data final_dataset;
  set WORLDDATA.table2;
  keep {param_list};
run;

Thank you!

3 REPLIES 3
Reeza
Super User

If you know the variables and the naming structure is pretty fixed you have a few options that I see:

 

1. Generate a dynamic list using a macro approach

2. Select the names using pattern matching from the list, assumed fixed pattern of table2_COUNTRY_YEAR_STATISTIC. Use SCAN() to extract those portion of the name. 

3. Reformat the data such that it's in a long/longer format and then filter/manipulate it. 

 

For the long run, I suspect #3 is better because the countries will be rows rather than in the variable name. 

 

Approach #2 - note the index (2/4) may be wrong, you should test this.

 

proc sql noprint;
select name into :name_list separated by " "
from sashelp.vcolumn where memname = 'WORLDDATA'
and lower(scan(name, 2, "_.")) in ('use', 'canada', 'japan')
and lower(scan(name, 4, "_.")) in ('population', 'crimerate', 'gap');
quit;

%put &name_list.;

data want;
set worldData;
keep &name_list;
run;

 

 

 

For an example of #3 I'd need more information on the sample structure. 


@RogerJones wrote:

I have a very large dataset with over 1000 columns, with column names formatted like this:

WORLDDATA.table2_usa_2017_population
WORLDDATA.table2_japan_2017_gnp

I only need to keep a subset of these parameters for a select few countries. I specify the custom lists:

%let list1 = usa canada uk japan southafrica;
%let list2 = population crimerate gnp;

How do I do a double for loop like so:

  param_list = []
  for (i in list1) {
    for (j in list2) {
      param_name = WORLDDATA.table2_{list1[i]}_2017_{list2[j]}
      param_list.append(param_name)
    }
  }

such that I can use param_list in

data final_dataset;
  set WORLDDATA.table2;
  keep {param_list};
run;

Thank you!


 

 

 

PGStats
Opal | Level 21

I like Approach #2 best, and would write it:

 

proc sql noprint;
select name into :name_list separated by " "
from sashelp.vcolumn 
where memname = 'WORLDDATA' and prxmatch("/_(usa|canada|uk|japan|southafrica)_2017_(population|crimerate|gnp)/i", name);
quit;

%put &name_list.;

data want;
set worldData(keep=&name_list.);
run;
PG
Astounding
PROC Star

I think you are seeking something more generic, along the lines of a macro:

%macro vlist (list1=, list2=) ;
%local j k ;
%do j=1 %to %sysfunc(countw(&list1) ) ;
%do k=1 %to %sysfunc(countw(&list2) ) ;
table2_%scan(&list1, &j)_2017_%scan(&list2, &k)
%end;
%end;

%mend vlist;

Then you can use the macro where you want the list of names such as:

drop %vlist (list1= usa canada uk japan southafrica,
list2 = population crime rate gnp) ;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 397 views
  • 0 likes
  • 4 in conversation