BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kevinmc87
Obsidian | Level 7

Hi all,

 

I want to search all of the variables in my dataset for a particular substring at the end of my variable name, and then make calculations to these variables that satisfy this condition and then create a new variable. In my dataset, I have a few hundred variables overall, and many that end in _population. For example, if I have vars di_population and xi_population and ci_population, I want each to sum each of these by the amount of observations I have in my dataset. 

I have been trying to use:

data new;
set old;
if findc('_population', _all_) then sum;
run;


but this doesn't seem to the correct way to find the variables that end in _population OR to add these together. Does anyone have any ideas?

 

Thanks.

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data region1;
input name ob_population ob_est x_population x_est c_population c_est;
cards;
1          50       2           70       5         80       7
2         100       3           150      4         100      10
3          55       5           50       10        75       50
;
run;
proc sql noprint;
select name into : var_list separated by ", "
from sashelp.vcolumn
where libname='WORK' and memname='HAVE' and
 lowcase(name) like '%~_population' escape '~';
 
select catx(' ','sum(',name,') as ',cats(name,'_sum')) into : var_list1 separated by ", "
from sashelp.vcolumn
where libname='WORK' and memname='HAVE' and
 lowcase(name) like '%~_population' escape '~';

	create table region1_3 as 
	select &var_list1 ,
	sum(sum(&var_list)) as x_total
	from region1;
quit;

View solution in original post

13 REPLIES 13
Reeza
Super User

SAS doesn't provide a way to work with variables that have suffixes only prefixes. 

 

Given that it's not 100% clear what you're asking, I'm going to suggest posting sample data and expected output. It's easier than guessing what you're after. 

kevinmc87
Obsidian | Level 7

Hi Reeza,

 

I see that my attachment did not attach.

 

Here is what I have:

 

name ob_population ob_est x_population x_est c_population c_est
1          50       2           70       5         80       7
2         100       3           150      4         100      10
3          55       5           50       10        75       50

what I want is to create a new row that pools this info. So, I want all variables that end in _population to sum. For example, ob_population would now add 50 + 100 + 55 = 205, and I would get a new 'observation' which would be name 4, that contains this. x_population and c_population would also do the same thing.

 

Does this make sense?

ballardw
Super User

What will you do with the resulting dataset? The behavior you are describing is more typical for a report as summary values inside a dataset with the base values has the potential of being used in such a way the total is more than intended.

See:

proc print data=have noobs;
   sum ob_population x_population c_population;
run;

Depending on what you have done before it may even be possible that the data would make more sense as

 

Type (with values of Ob X C) Population and Est.

kevinmc87
Obsidian | Level 7

Ballard,

 

I have a many population variables - and I want to somewhat automate this process. I know that I could go through and do that, but I would like a much more efficient way to do this. In addition to the _population vars, I have other vars that I will be conducting other calcuations on.

 

Reeza
Super User
List all vars in an array, chech if name contains population using a combination of find and vname function. Then add up values.

Or parse names based on suffix - was a question in last 24 hours on extracting the variable names.
kevinmc87
Obsidian | Level 7

Thanks, Reeza. Can you expound on how to use the combination of find and vname to do this?

 

 

kevinmc87
Obsidian | Level 7

So far, I have this:

 

data region1_2;
	set region1;
	array test [4:513]  y2--y1000;
	do i=1 to dim(test); 
	if find(i,'_population') then ;
end;
run;

But I'm stuck on how to incorporate vname function and to sum these. Can you point me in the right direction?

Reeza
Super User

You could do it that way but it's inefficient. 

It would be something like:

 

 

EDITtED:

data region1_2;
	set region1;
	array test [4:513]  y2--y1000;
        sum=0;

	do i=1 to dim(test); 
	     if find(vname(test(i)),'_population') then sum= sum+test(i);
        end;
run;

Like I mentioned, it's better to get the names once from the SASHELP.VCOLUMN or the Dictionary table.

 

proc sql noprint;
select name into :var_list separated by ", "
from sashelp.vcolumn
where upper(libname)='WORK' and and upper(me name) ='HAVE' and upper(name) like '%_POPULATION';
quit;

data region1_2;
set region1;
sum_want = sum(&var_list);
run;
Ksharp
Super User
Add some salt into Rezza's code:



data have;
input name ob_population ob_est x_population x_est c_population c_est;
cards;
1          50       2           70       5         80       7
2         100       3           150      4         100      10
3          55       5           50       10        75       50
;
run;
proc sql noprint;
select name into : var_list separated by ", "
from sashelp.vcolumn
where libname='WORK' and memname='HAVE' and
 lowcase(name) like '%~_population' escape '~';
quit;
%put &var_list ;
data region1_2;
set have;
sum_want = sum(&var_list);
run;


kevinmc87
Obsidian | Level 7

Thanks Ksharp and Reeza,

 

This has been very helpful  - especially using the var_list macro. However, I am trying to do something slightly different than the solution you proposed, and I think it is because I did not explain my issue well enough. I am attempting to sum down the columns for each var that ends in _population, and not across all rows that end in _population for each observation. For example, I want to sum down ob_population to get (50+100+55)=205, and create a new var such as ob_population_sum, and continue to do this for x_population, c_population, and so on. 

 

Essentially, I am trying to create a new row that corresponds to the total of a column summed above it. I've tried using something as:

proc sql;
	create table region1_3 as select *,
	sum(&var_list) as x_total
	from region1;
quit;

but I'm still getting a new column that corresponds to summing across rows, not a new row that corresponds to summing down the _population columns. How do I sum down these columns, using the var_list macro? 

 

Thanks.

Reeza
Super User

Use a PROC MEANS to summarize your data.

 

proc means data=have noprint;
var &var_list;
output out=want sum= /autoname;
run;
Ksharp
Super User
data region1;
input name ob_population ob_est x_population x_est c_population c_est;
cards;
1          50       2           70       5         80       7
2         100       3           150      4         100      10
3          55       5           50       10        75       50
;
run;
proc sql noprint;
select name into : var_list separated by ", "
from sashelp.vcolumn
where libname='WORK' and memname='HAVE' and
 lowcase(name) like '%~_population' escape '~';
 
select catx(' ','sum(',name,') as ',cats(name,'_sum')) into : var_list1 separated by ", "
from sashelp.vcolumn
where libname='WORK' and memname='HAVE' and
 lowcase(name) like '%~_population' escape '~';

	create table region1_3 as 
	select &var_list1 ,
	sum(sum(&var_list)) as x_total
	from region1;
quit;
kevinmc87
Obsidian | Level 7

Thanks to both of you!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 3284 views
  • 3 likes
  • 4 in conversation