Hi forum,
I want to summarize the Account1-AccountN over all regions and per year (2019, 2020).
The population I want to summarize per year, but only one value per Region (for example 2019: 82320980+50879514+43025905). Because otherwise the population size would be too high (the population size of the different regions can be found multiply in the dataset).
Data I have:
year | Region | population | Account1 | Account2 | AccountN |
2019 | RegionA | 82320980 | . | 8 | 7 |
2019 | RegionA | 82320980 | 4 | . | 8 |
2019 | RegionB | 50879514 | . | 6 | . |
2019 | RegionB | 50879514 | 2 | . | . |
2019 | RegionC | 43025905 | 4 | . | 5 |
2019 | RegionC | 43025905 | . | 2 | . |
2020 | RegionA | 82320520 | 5 | 4 | . |
2020 | RegionA | 82320520 | 1 | . | 8 |
2020 | RegionB | 50879320 | . | 3 | 5 |
2020 | RegionB | 50879320 | 4 | 4 | . |
2020 | RegionC | 43025782 | 7 | . | 6 |
2020 | RegionC | 43025782 | 5 | 9 | . |
This is the code for the dataset:
data have;
input year $ Region $ population Account1 Account2 AccountN;
datalines;
2019 RegionA 82320980 . 8 7
2019 RegionA 82320980 4 . 8
2019 RegionB 50879514 . 6 .
2019 RegionB 50879514 2 . .
2019 RegionC 43025905 4 . 5
2019 RegionC 43025905 . 2 .
2020 RegionA 82320520 5 4 .
2020 RegionA 82320520 1 . 8
2020 RegionB 50879320 . 3 5
2020 RegionB 50879320 4 4 .
2020 RegionC 43025782 7 . 6
2020 RegionC 43025782 5 9 .
;
This is what I want:
year | population | Account1 | Account2 | AccountN |
2019 | 176226399 | 10 | 16 | 20 |
2020 | 176225622 | 22 | 20 | 19 |
I tried it with this code...
proc summary data=have nway;
class year;
id population;
var account1 account2 accountN;
output
out=want (drop=_freq_ _type_)
sum()=
;
...but It doesnt work. It summarizes the account-values per year, but not the population. Where is the problem of my attemption? How can I modify the summary-command? Thank you!
That's the problems you get with redundant values.
You need a two-step approach: first summarize by year and region, and take the max of population, then summarize by year:
proc summary data=have nway;
class year region;
var population account:;
output
out=inter
max(population)=
sum(acc:)=
;
run;
proc summary data=inter nway;
class year;
var population acc:;
output
out=want (drop=_:)
sum()=
;
run;
That's the problems you get with redundant values.
You need a two-step approach: first summarize by year and region, and take the max of population, then summarize by year:
proc summary data=have nway;
class year region;
var population account:;
output
out=inter
max(population)=
sum(acc:)=
;
run;
proc summary data=inter nway;
class year;
var population acc:;
output
out=want (drop=_:)
sum()=
;
run;
Thank you, its working. may I ask:
What does that mean? Why an equalsign with no value behind it?
max(population)=
sum(acc:)=
Look under
here:
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.