Hi all,
I want to search all of the variables in my dataset for a particular substring at the end of my variable name, and then make calculations to these variables that satisfy this condition and then create a new variable. In my dataset, I have a few hundred variables overall, and many that end in _population. For example, if I have vars di_population and xi_population and ci_population, I want each to sum each of these by the amount of observations I have in my dataset.
I have been trying to use:
data new;
set old;
if findc('_population', _all_) then sum;
run;
but this doesn't seem to the correct way to find the variables that end in _population OR to add these together. Does anyone have any ideas?
Thanks.
data region1;
input name ob_population ob_est x_population x_est c_population c_est;
cards;
1 50 2 70 5 80 7
2 100 3 150 4 100 10
3 55 5 50 10 75 50
;
run;
proc sql noprint;
select name into : var_list separated by ", "
from sashelp.vcolumn
where libname='WORK' and memname='HAVE' and
lowcase(name) like '%~_population' escape '~';
select catx(' ','sum(',name,') as ',cats(name,'_sum')) into : var_list1 separated by ", "
from sashelp.vcolumn
where libname='WORK' and memname='HAVE' and
lowcase(name) like '%~_population' escape '~';
create table region1_3 as
select &var_list1 ,
sum(sum(&var_list)) as x_total
from region1;
quit;
SAS doesn't provide a way to work with variables that have suffixes only prefixes.
Given that it's not 100% clear what you're asking, I'm going to suggest posting sample data and expected output. It's easier than guessing what you're after.
Hi Reeza,
I see that my attachment did not attach.
Here is what I have:
name ob_population ob_est x_population x_est c_population c_est
1 50 2 70 5 80 7
2 100 3 150 4 100 10
3 55 5 50 10 75 50
what I want is to create a new row that pools this info. So, I want all variables that end in _population to sum. For example, ob_population would now add 50 + 100 + 55 = 205, and I would get a new 'observation' which would be name 4, that contains this. x_population and c_population would also do the same thing.
Does this make sense?
What will you do with the resulting dataset? The behavior you are describing is more typical for a report as summary values inside a dataset with the base values has the potential of being used in such a way the total is more than intended.
See:
proc print data=have noobs;
sum ob_population x_population c_population;
run;
Depending on what you have done before it may even be possible that the data would make more sense as
Type (with values of Ob X C) Population and Est.
Ballard,
I have a many population variables - and I want to somewhat automate this process. I know that I could go through and do that, but I would like a much more efficient way to do this. In addition to the _population vars, I have other vars that I will be conducting other calcuations on.
Thanks, Reeza. Can you expound on how to use the combination of find and vname to do this?
So far, I have this:
data region1_2;
set region1;
array test [4:513] y2--y1000;
do i=1 to dim(test);
if find(i,'_population') then ;
end;
run;
But I'm stuck on how to incorporate vname function and to sum these. Can you point me in the right direction?
You could do it that way but it's inefficient.
It would be something like:
EDITtED:
data region1_2;
set region1;
array test [4:513] y2--y1000;
sum=0;
do i=1 to dim(test);
if find(vname(test(i)),'_population') then sum= sum+test(i);
end;
run;
Like I mentioned, it's better to get the names once from the SASHELP.VCOLUMN or the Dictionary table.
proc sql noprint;
select name into :var_list separated by ", "
from sashelp.vcolumn
where upper(libname)='WORK' and and upper(me name) ='HAVE' and upper(name) like '%_POPULATION';
quit;
data region1_2;
set region1;
sum_want = sum(&var_list);
run;
Add some salt into Rezza's code: data have; input name ob_population ob_est x_population x_est c_population c_est; cards; 1 50 2 70 5 80 7 2 100 3 150 4 100 10 3 55 5 50 10 75 50 ; run; proc sql noprint; select name into : var_list separated by ", " from sashelp.vcolumn where libname='WORK' and memname='HAVE' and lowcase(name) like '%~_population' escape '~'; quit; %put &var_list ; data region1_2; set have; sum_want = sum(&var_list); run;
Thanks Ksharp and Reeza,
This has been very helpful - especially using the var_list macro. However, I am trying to do something slightly different than the solution you proposed, and I think it is because I did not explain my issue well enough. I am attempting to sum down the columns for each var that ends in _population, and not across all rows that end in _population for each observation. For example, I want to sum down ob_population to get (50+100+55)=205, and create a new var such as ob_population_sum, and continue to do this for x_population, c_population, and so on.
Essentially, I am trying to create a new row that corresponds to the total of a column summed above it. I've tried using something as:
proc sql;
create table region1_3 as select *,
sum(&var_list) as x_total
from region1;
quit;
but I'm still getting a new column that corresponds to summing across rows, not a new row that corresponds to summing down the _population columns. How do I sum down these columns, using the var_list macro?
Thanks.
Use a PROC MEANS to summarize your data.
proc means data=have noprint;
var &var_list;
output out=want sum= /autoname;
run;
data region1;
input name ob_population ob_est x_population x_est c_population c_est;
cards;
1 50 2 70 5 80 7
2 100 3 150 4 100 10
3 55 5 50 10 75 50
;
run;
proc sql noprint;
select name into : var_list separated by ", "
from sashelp.vcolumn
where libname='WORK' and memname='HAVE' and
lowcase(name) like '%~_population' escape '~';
select catx(' ','sum(',name,') as ',cats(name,'_sum')) into : var_list1 separated by ", "
from sashelp.vcolumn
where libname='WORK' and memname='HAVE' and
lowcase(name) like '%~_population' escape '~';
create table region1_3 as
select &var_list1 ,
sum(sum(&var_list)) as x_total
from region1;
quit;
Thanks to both of you!
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.