BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Konkordanz
Pyrite | Level 9

Hi,

 

in my dataset I´ve some data of several regions of the world. Each region is found multiple times in my dataset. Column A is the region, column B contents the values I want to summarize and column C includes the population of the respective region. The population also occur several times, but its always the same. This one should not be summarized.

 

Data i Have:

 

 ValuePopulation
Region A582 321 556
Region B650 879 514
Region C79 054 876
Region B450 879 514
Region C69 054 876
Region A882 321 556
Region B450 879 514
Region C19 054 876
Region B350 879 514
Region A982 321 556
Region C49 054 876
Region A282 321 556

 

Data I want:

 ValuePopulation
Region A2482 321 556
Region B1750 879 514
Region C189 054 876

 

Do you know a shot way to do this? Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

If population is reliably the same for all observations within a region, then you can do this in PROC SUMMARY:

proc summary data=have nway;
class region population;
var value;
output
  out=want (drop=_freq_ _type_)
  sum()=
;
run;

But why do you have a redundant value in your dataset in the first place?

View solution in original post

4 REPLIES 4
andreas_lds
Jade | Level 19

Some options:

  • sort by region, then use a data step to calculate the sum. first /  last will be necessary, "first" to reset the sum, "last" to output the observation.
  • use proc summary to calculate the sum, and maybe id-statement to preserve "population"

If you want code, post data in usable form.

Kurt_Bremser
Super User

If population is reliably the same for all observations within a region, then you can do this in PROC SUMMARY:

proc summary data=have nway;
class region population;
var value;
output
  out=want (drop=_freq_ _type_)
  sum()=
;
run;

But why do you have a redundant value in your dataset in the first place?

Konkordanz
Pyrite | Level 9

Thanks for your answers. Sorry, but I should specify my concern:

Each year (2019, 2020) has several regions with specific population-size. The population should be summarized per year, but only one time per region (transfer 1x into the line of the year, otherwise the population in general would be too big).

The values of Account1-AccountN should be summarized per year over all regions.

 

Data I have:

YearRegionPopulationAccount1Account2AccountN
2019RegionA82320980.87
2019RegionA823209804.8
2019RegionB50879514.6.
2019RegionB508795142..
2019RegionC430259054.5
2019RegionC43025905.2.
2020RegionA8232052054.
2020RegionA823205201.8
2020RegionB50879320.35
2020RegionB5087932044.
2020RegionC430257827.6
2020RegionC4302578259.

 

Here is the Code for the data:

 

data have;
input year $ Region $ population Account1 Account2 AccountN;
datalines;
2019 RegionA 82320980 . 8 7
2019 RegionA 82320980 4 . 8
2019 RegionB 50879514 . 6 .
2019 RegionB 50879514 2 . .
2019 RegionC 43025905 4 . 5
2019 RegionC 43025905 . 2 .
2020 RegionA 82320520 5 4 .
2020 RegionA 82320520 1 . 8
2020 RegionB 50879320 . 3 5
2020 RegionB 50879320 4 4 .
2020 RegionC 43025782 7 . 6
2020 RegionC 43025782 5 9 .
;
run;

 

 

 

What I want is that:

YearPopulationAccount1Account2AccountN
2019176 226 399101620
2020176 225 622222019

 

Sorry for beeing concrete that late :)! Thank you for helping me.

GraphGuy
Meteorite | Level 14

Or, using SQL ...

 

data foo;
input region $ 1-8 value population;
datalines;
Region A	5	82321556
Region B	6	50879514
Region C	7	9054876
Region B	4	50879514
Region C	6	9054876
Region A	8	82321556
Region B	4	50879514
Region C	1	9054876
Region B	3	50879514
Region A	9	82321556
Region C	4	9054876
Region A	2	82321556
;
run;

proc sql;
create table summarized_data as
select unique region, population, sum(value) as sum_value
from foo
group by region, population;
quit; run;

proc print data=summarized_data; run;

sum_table.png

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1239 views
  • 0 likes
  • 4 in conversation