For example I have:
Country | Town | Population | Percentage_of_population_each_town |
UK | A | 10 | |
UK | A | 20 | |
UK | B | 20 | |
US | C | 10 | |
US | C | 20 | |
US | D | 5 | |
Greece | E | 20 | |
Greece | K | 10 |
I want to calculate last column 'Percentage of population each group" - which is population of a town as a percentage of a total population of that country.
I want this:
Country | Town | Population | Percentage_of_population_each_town |
UK | A | 30 | 60% |
UK | B | 20 | 40% |
Total |
| 50 | |
US | C | 30 |
|
US | D | 5 |
|
35 | 35 | ||
Greece | E | 20 | 66% |
Greece | K | 10 | 33% |
Simple query (with remerging) :
data have;
length country town $12;
input country town population;
datalines;
UK A 10
UK A 20
UK B 20
US C 10
US C 20
US D 5
Greece E 20
Greece K 10
;
proc sql;
create table want as
select country, town, population, population/sum(population) as pctPop format=percent8.1
from (select country, town, sum(population) as population from have group by country, town)
group by country;
select * from want;
quit;
PG
Corrected by PG to account for multiple town records.
Simple query (with remerging) :
data have;
length country town $12;
input country town population;
datalines;
UK A 10
UK A 20
UK B 20
US C 10
US C 20
US D 5
Greece E 20
Greece K 10
;
proc sql;
create table want as
select country, town, population, population/sum(population) as pctPop format=percent8.1
from (select country, town, sum(population) as population from have group by country, town)
group by country;
select * from want;
quit;
PG
Corrected by PG to account for multiple town records.
not as short as PG's code:
data have;
input Country $ Town $ Population;
cards;
UK A 10
UK A 20
UK B 20
US C 10
US C 20
US D 5
Greece E 20
Greece K 10
proc sql;
create table temp1 as
select country,sum(population) as t
from have
group by country;
create table temp2 as select country,town,sum(population) as population
from have
group by country,town;
create table want as
select a.country,town,population,population/t as percentage format=percent8.2
from temp2 as a,temp1
where a.country=temp1.country;
quit;
proc print;run;
Linlin
hi ... another idea (as long as all you want are totals and percentages, not output that looks exactly like what you posted) ...
assuming that the country data are grouped already in the data set ...
proc freq data=have;
table town;
by country notsorted;
weight population;
run;
country=UK
Cumulative Cumulative
town Frequency Percent Frequency Percent
A 30 60.00 30 60.00
B 20 40.00 50 100.00
country=US
Cumulative Cumulative
town Frequency Percent Frequency Percent
C 30 85.71 30 85.71
D 5 14.29 35 100.00
country=Greece
Cumulative Cumulative
town Frequency Percent Frequency Percent
E 20 66.67 20 66.67
K 10 33.33 30 100.00
Or,
proc freq data=have noprint;
by country notsorted;
table town / sparse OUT=WANT;
weight population;
run;
proc print; run;
PG
If it's a report output you want you can utilize proc report.
data x;
input Country $ Town $ Population;
datalines;
UK A 10
UK A 20
UK B 20
US C 10
US C 20
US D 5
Greece E 20
Greece K 10
;
run;
proc format;
value newpctfmt . = ' '
other = [percent9.2];
run;
proc report data=x nowd missing out=y;
column Country Town Population Pct;
define Country / group order=data ;
define Town / group ;
define Population / analysis;
define Pct / computed format=newpctfmt.;
compute before Country;
town_total = Population.Sum;
endcomp;
Compute Country;
if Country ne '' then hold_Country = Country;
else Country = hold_Country;
endcomp;
compute after Country;
Country = 'Total';
Pct = .;
endcomp;
break after Country / summarize style=[font_style=Roman];
compute Pct;
Pct = Population.Sum / Town_Total;
endcomp;
run;
If needed, one can get this into a table from proc report's output data set y.
data z;
set y(drop=_BREAK_);
by Country notsorted;
if not first.Country or Country = 'Total';
format Pct newpctfmt.;
run;
Of course Proc SQL, freq or even proc report have the native edge to tackle this problem, while good old data step can still provide an option:
data have;
input country$ town$ population;
cards;
UK A 10
UK A 20
UK B 20
US C 10
US C 20
US D 5
Greece E 20
Greece K 10
;
data want;
do until (last.country);
set have;
by country town notsorted;
_c+population;
end;
do until (last.country);
do until (last.town);
set have;
by country town notsorted;
_t+population;
end;
percent=_t/_c;
output;
call missing(_t);
end;
call missing(_c);
format percent percent 8.2;
drop _c population;
rename _t=population;
run;
proc print;run;
Haikuo
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.