Hi forum,
I want to summarize the Account1-AccountN over all regions and per year (2019, 2020).
The population I want to summarize per year, but only one value per Region (for example 2019: 82320980+50879514+43025905). Because otherwise the population size would be too high (the population size of the different regions can be found multiply in the dataset).
Data I have:
year | Region | population | Account1 | Account2 | AccountN |
2019 | RegionA | 82320980 | . | 8 | 7 |
2019 | RegionA | 82320980 | 4 | . | 8 |
2019 | RegionB | 50879514 | . | 6 | . |
2019 | RegionB | 50879514 | 2 | . | . |
2019 | RegionC | 43025905 | 4 | . | 5 |
2019 | RegionC | 43025905 | . | 2 | . |
2020 | RegionA | 82320520 | 5 | 4 | . |
2020 | RegionA | 82320520 | 1 | . | 8 |
2020 | RegionB | 50879320 | . | 3 | 5 |
2020 | RegionB | 50879320 | 4 | 4 | . |
2020 | RegionC | 43025782 | 7 | . | 6 |
2020 | RegionC | 43025782 | 5 | 9 | . |
This is the code for the dataset:
data have;
input year $ Region $ population Account1 Account2 AccountN;
datalines;
2019 RegionA 82320980 . 8 7
2019 RegionA 82320980 4 . 8
2019 RegionB 50879514 . 6 .
2019 RegionB 50879514 2 . .
2019 RegionC 43025905 4 . 5
2019 RegionC 43025905 . 2 .
2020 RegionA 82320520 5 4 .
2020 RegionA 82320520 1 . 8
2020 RegionB 50879320 . 3 5
2020 RegionB 50879320 4 4 .
2020 RegionC 43025782 7 . 6
2020 RegionC 43025782 5 9 .
;This is what I want:
year | population | Account1 | Account2 | AccountN |
2019 | 176226399 | 10 | 16 | 20 |
2020 | 176225622 | 22 | 20 | 19 |
I tried it with this code...
proc summary data=have nway;
class year;
id population;
var account1 account2 accountN;
output
out=want (drop=_freq_ _type_)
sum()=
;...but It doesnt work. It summarizes the account-values per year, but not the population. Where is the problem of my attemption? How can I modify the summary-command? Thank you!
That's the problems you get with redundant values.
You need a two-step approach: first summarize by year and region, and take the max of population, then summarize by year:
proc summary data=have nway;
class year region;
var population account:;
output
out=inter
max(population)=
sum(acc:)=
;
run;
proc summary data=inter nway;
class year;
var population acc:;
output
out=want (drop=_:)
sum()=
;
run;
That's the problems you get with redundant values.
You need a two-step approach: first summarize by year and region, and take the max of population, then summarize by year:
proc summary data=have nway;
class year region;
var population account:;
output
out=inter
max(population)=
sum(acc:)=
;
run;
proc summary data=inter nway;
class year;
var population acc:;
output
out=want (drop=_:)
sum()=
;
run;
Thank you, its working. may I ask:
What does that mean? Why an equalsign with no value behind it?
max(population)=
sum(acc:)=
Look under
here:
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.