BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Konkordanz
Pyrite | Level 9

Hi forum,

 

I want to summarize the Account1-AccountN over all regions and per year (2019, 2020).

The population I want to summarize per year, but only one value per Region (for example 2019: 82320980+50879514+43025905). Because otherwise the population size would be too high (the population size of the different regions can be found multiply in the dataset).

 

Data I have:

 

year

Region

population

Account1

Account2

AccountN

2019

RegionA

82320980

.

8

7

2019

RegionA

82320980

4

.

8

2019

RegionB

50879514

.

6

.

2019

RegionB

50879514

2

.

.

2019

RegionC

43025905

4

.

5

2019

RegionC

43025905

.

2

.

2020

RegionA

82320520

5

4

.

2020

RegionA

82320520

1

.

8

2020

RegionB

50879320

.

3

5

2020

RegionB

50879320

4

4

.

2020

RegionC

43025782

7

.

6

2020

RegionC

43025782

5

9

.

 

This is the code for the dataset:

data have;
input year $ Region $ population Account1 Account2 AccountN;
datalines;
2019 RegionA 82320980 . 8 7
2019 RegionA 82320980 4 . 8
2019 RegionB 50879514 . 6 .
2019 RegionB 50879514 2 . .
2019 RegionC 43025905 4 . 5
2019 RegionC 43025905 . 2 .
2020 RegionA 82320520 5 4 .
2020 RegionA 82320520 1 . 8
2020 RegionB 50879320 . 3 5
2020 RegionB 50879320 4 4 .
2020 RegionC 43025782 7 . 6
2020 RegionC 43025782 5 9 .
;

This is what I want:

 

year

population

Account1

Account2

AccountN

2019

176226399

10

16

20

2020

176225622

22

20

19

 

I tried it with this code...

proc summary data=have nway;
class year;
id population;
var account1 account2 accountN;
output
out=want (drop=_freq_ _type_)
sum()=
;

...but It doesnt work. It summarizes the account-values per year, but not the population. Where is the problem of my attemption? How can I modify the summary-command? Thank you!

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

That's the problems you get with redundant values.

You need a two-step approach: first summarize by year and region, and take the max of population, then summarize by year:

proc summary data=have nway;
class year region;
var population account:;
output
  out=inter
  max(population)=
  sum(acc:)=
;
run;

proc summary data=inter nway;
class year;
var population acc:;
output
  out=want (drop=_:)
  sum()=
;
run;

View solution in original post

3 REPLIES 3
Kurt_Bremser
Super User

That's the problems you get with redundant values.

You need a two-step approach: first summarize by year and region, and take the max of population, then summarize by year:

proc summary data=have nway;
class year region;
var population account:;
output
  out=inter
  max(population)=
  sum(acc:)=
;
run;

proc summary data=inter nway;
class year;
var population acc:;
output
  out=want (drop=_:)
  sum()=
;
run;
Konkordanz
Pyrite | Level 9

Thank you, its working. may I ask:

What does that mean? Why an equalsign with no value behind it?

max(population)=
sum(acc:)=

 

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 978 views
  • 2 likes
  • 2 in conversation