DATA Step, Macro, Functions and more

Creating new variables based on part of a variable name

Accepted Solution Solved
Reply
Contributor
Posts: 29
Accepted Solution

Creating new variables based on part of a variable name

Hi all,

 

I want to search all of the variables in my dataset for a particular substring at the end of my variable name, and then make calculations to these variables that satisfy this condition and then create a new variable. In my dataset, I have a few hundred variables overall, and many that end in _population. For example, if I have vars di_population and xi_population and ci_population, I want each to sum each of these by the amount of observations I have in my dataset. 

I have been trying to use:

data new;
set old;
if findc('_population', _all_) then sum;
run;


but this doesn't seem to the correct way to find the variables that end in _population OR to add these together. Does anyone have any ideas?

 

Thanks.

 


Accepted Solutions
Solution
‎05-26-2016 07:30 PM
Super User
Posts: 9,681

Re: Creating new variables based on part of a variable name

data region1;
input name ob_population ob_est x_population x_est c_population c_est;
cards;
1          50       2           70       5         80       7
2         100       3           150      4         100      10
3          55       5           50       10        75       50
;
run;
proc sql noprint;
select name into : var_list separated by ", "
from sashelp.vcolumn
where libname='WORK' and memname='HAVE' and
 lowcase(name) like '%~_population' escape '~';
 
select catx(' ','sum(',name,') as ',cats(name,'_sum')) into : var_list1 separated by ", "
from sashelp.vcolumn
where libname='WORK' and memname='HAVE' and
 lowcase(name) like '%~_population' escape '~';

	create table region1_3 as 
	select &var_list1 ,
	sum(sum(&var_list)) as x_total
	from region1;
quit;

View solution in original post


All Replies
Super User
Posts: 17,829

Re: Creating new variables based on part of a variable name

SAS doesn't provide a way to work with variables that have suffixes only prefixes. 

 

Given that it's not 100% clear what you're asking, I'm going to suggest posting sample data and expected output. It's easier than guessing what you're after. 

Contributor
Posts: 29

Re: Creating new variables based on part of a variable name

Hi Reeza,

 

I see that my attachment did not attach.

 

Here is what I have:

 

name ob_population ob_est x_population x_est c_population c_est
1          50       2           70       5         80       7
2         100       3           150      4         100      10
3          55       5           50       10        75       50

what I want is to create a new row that pools this info. So, I want all variables that end in _population to sum. For example, ob_population would now add 50 + 100 + 55 = 205, and I would get a new 'observation' which would be name 4, that contains this. x_population and c_population would also do the same thing.

 

Does this make sense?

Super User
Posts: 10,500

Re: Creating new variables based on part of a variable name

What will you do with the resulting dataset? The behavior you are describing is more typical for a report as summary values inside a dataset with the base values has the potential of being used in such a way the total is more than intended.

See:

proc print data=have noobs;
   sum ob_population x_population c_population;
run;

Depending on what you have done before it may even be possible that the data would make more sense as

 

Type (with values of Ob X C) Population and Est.

Contributor
Posts: 29

Re: Creating new variables based on part of a variable name

Ballard,

 

I have a many population variables - and I want to somewhat automate this process. I know that I could go through and do that, but I would like a much more efficient way to do this. In addition to the _population vars, I have other vars that I will be conducting other calcuations on.

 

Super User
Posts: 17,829

Re: Creating new variables based on part of a variable name

List all vars in an array, chech if name contains population using a combination of find and vname function. Then add up values.

Or parse names based on suffix - was a question in last 24 hours on extracting the variable names.
Contributor
Posts: 29

Re: Creating new variables based on part of a variable name

Thanks, Reeza. Can you expound on how to use the combination of find and vname to do this?

 

 

Contributor
Posts: 29

Re: Creating new variables based on part of a variable name

So far, I have this:

 

data region1_2;
	set region1;
	array test [4:513]  y2--y1000;
	do i=1 to dim(test); 
	if find(i,'_population') then ;
end;
run;

But I'm stuck on how to incorporate vname function and to sum these. Can you point me in the right direction?

Super User
Posts: 17,829

Re: Creating new variables based on part of a variable name

[ Edited ]

You could do it that way but it's inefficient. 

It would be something like:

 

 

EDITtED:

data region1_2;
	set region1;
	array test [4:513]  y2--y1000;
        sum=0;

	do i=1 to dim(test); 
	     if find(vname(test(i)),'_population') then sum= sum+test(i);
        end;
run;

Like I mentioned, it's better to get the names once from the SASHELP.VCOLUMN or the Dictionary table.

 

proc sql noprint;
select name into :var_list separated by ", "
from sashelp.vcolumn
where upper(libname)='WORK' and and upper(me name) ='HAVE' and upper(name) like '%_POPULATION';
quit;

data region1_2;
set region1;
sum_want = sum(&var_list);
run;
Super User
Posts: 9,681

Re: Creating new variables based on part of a variable name

Add some salt into Rezza's code:



data have;
input name ob_population ob_est x_population x_est c_population c_est;
cards;
1          50       2           70       5         80       7
2         100       3           150      4         100      10
3          55       5           50       10        75       50
;
run;
proc sql noprint;
select name into : var_list separated by ", "
from sashelp.vcolumn
where libname='WORK' and memname='HAVE' and
 lowcase(name) like '%~_population' escape '~';
quit;
%put &var_list ;
data region1_2;
set have;
sum_want = sum(&var_list);
run;


Contributor
Posts: 29

Re: Creating new variables based on part of a variable name

Thanks Ksharp and Reeza,

 

This has been very helpful  - especially using the var_list macro. However, I am trying to do something slightly different than the solution you proposed, and I think it is because I did not explain my issue well enough. I am attempting to sum down the columns for each var that ends in _population, and not across all rows that end in _population for each observation. For example, I want to sum down ob_population to get (50+100+55)=205, and create a new var such as ob_population_sum, and continue to do this for x_population, c_population, and so on. 

 

Essentially, I am trying to create a new row that corresponds to the total of a column summed above it. I've tried using something as:

proc sql;
	create table region1_3 as select *,
	sum(&var_list) as x_total
	from region1;
quit;

but I'm still getting a new column that corresponds to summing across rows, not a new row that corresponds to summing down the _population columns. How do I sum down these columns, using the var_list macro? 

 

Thanks.

Super User
Posts: 17,829

Re: Creating new variables based on part of a variable name

Use a PROC MEANS to summarize your data.

 

proc means data=have noprint;
var &var_list;
output out=want sum= /autoname;
run;
Solution
‎05-26-2016 07:30 PM
Super User
Posts: 9,681

Re: Creating new variables based on part of a variable name

data region1;
input name ob_population ob_est x_population x_est c_population c_est;
cards;
1          50       2           70       5         80       7
2         100       3           150      4         100      10
3          55       5           50       10        75       50
;
run;
proc sql noprint;
select name into : var_list separated by ", "
from sashelp.vcolumn
where libname='WORK' and memname='HAVE' and
 lowcase(name) like '%~_population' escape '~';
 
select catx(' ','sum(',name,') as ',cats(name,'_sum')) into : var_list1 separated by ", "
from sashelp.vcolumn
where libname='WORK' and memname='HAVE' and
 lowcase(name) like '%~_population' escape '~';

	create table region1_3 as 
	select &var_list1 ,
	sum(sum(&var_list)) as x_total
	from region1;
quit;
Contributor
Posts: 29

Re: Creating new variables based on part of a variable name

Thanks to both of you!

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 13 replies
  • 497 views
  • 3 likes
  • 4 in conversation