BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Konkordanz
Pyrite | Level 9

Hi forum,

 

I want to summarize the Account1-AccountN over all regions and per year (2019, 2020).

The population I want to summarize per year, but only one value per Region (for example 2019: 82320980+50879514+43025905). Because otherwise the population size would be too high (the population size of the different regions can be found multiply in the dataset).

 

Data I have:

 

year

Region

population

Account1

Account2

AccountN

2019

RegionA

82320980

.

8

7

2019

RegionA

82320980

4

.

8

2019

RegionB

50879514

.

6

.

2019

RegionB

50879514

2

.

.

2019

RegionC

43025905

4

.

5

2019

RegionC

43025905

.

2

.

2020

RegionA

82320520

5

4

.

2020

RegionA

82320520

1

.

8

2020

RegionB

50879320

.

3

5

2020

RegionB

50879320

4

4

.

2020

RegionC

43025782

7

.

6

2020

RegionC

43025782

5

9

.

 

This is the code for the dataset:

data have;
input year $ Region $ population Account1 Account2 AccountN;
datalines;
2019 RegionA 82320980 . 8 7
2019 RegionA 82320980 4 . 8
2019 RegionB 50879514 . 6 .
2019 RegionB 50879514 2 . .
2019 RegionC 43025905 4 . 5
2019 RegionC 43025905 . 2 .
2020 RegionA 82320520 5 4 .
2020 RegionA 82320520 1 . 8
2020 RegionB 50879320 . 3 5
2020 RegionB 50879320 4 4 .
2020 RegionC 43025782 7 . 6
2020 RegionC 43025782 5 9 .
;

This is what I want:

 

year

population

Account1

Account2

AccountN

2019

176226399

10

16

20

2020

176225622

22

20

19

 

I tried it with this code...

proc summary data=have nway;
class year;
id population;
var account1 account2 accountN;
output
out=want (drop=_freq_ _type_)
sum()=
;

...but It doesnt work. It summarizes the account-values per year, but not the population. Where is the problem of my attemption? How can I modify the summary-command? Thank you!

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

That's the problems you get with redundant values.

You need a two-step approach: first summarize by year and region, and take the max of population, then summarize by year:

proc summary data=have nway;
class year region;
var population account:;
output
  out=inter
  max(population)=
  sum(acc:)=
;
run;

proc summary data=inter nway;
class year;
var population acc:;
output
  out=want (drop=_:)
  sum()=
;
run;

View solution in original post

3 REPLIES 3
Kurt_Bremser
Super User

That's the problems you get with redundant values.

You need a two-step approach: first summarize by year and region, and take the max of population, then summarize by year:

proc summary data=have nway;
class year region;
var population account:;
output
  out=inter
  max(population)=
  sum(acc:)=
;
run;

proc summary data=inter nway;
class year;
var population acc:;
output
  out=want (drop=_:)
  sum()=
;
run;
Konkordanz
Pyrite | Level 9

Thank you, its working. may I ask:

What does that mean? Why an equalsign with no value behind it?

max(population)=
sum(acc:)=

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 911 views
  • 2 likes
  • 2 in conversation